HTS Bioinf - Using the vcpipe staging environment

Scope

This document describes how to set up, activate and monitor the staging environments (NSC and TSD), and how to define the analyses that should be run. We also give guidelines on appropriate use of the staging environments. Details of the executor and webui are found in the documentation for the vcpipe repository.

Responsibility

Bioinformaticians involved in the development of the variant-calling pipeline.

Platforms

There are two staging systems.

  • NSC is accessible from your local computer when connected to the NSC network by a network cable.
  • TSD is accessible via VMware Horizon or a web browser logged in to the server view.tsd.usit.no.

All staging paths are given relative to the base path {staging}, which is system dependent as shown in the following table:

System Alias Base path
TSD {staging} /cluster/projects/p22/staging
NSC {staging} /boston/diag/staging

In order to access {staging}, log in to the appropriate server:

ssh {server}

where {server} is

System Alias Server
TSD {server} p22-submit-dev
NSC {server} beta.ous.nsc.local
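
For example, to reach the TSD staging environment:

ssh p22-submit-dev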

On both systems there is a persistent staging database that is used to store the status, logs and quality information for all analyses.

System setup

The staging environments are set up to mirror the production environment. The most typical usage is to test new releases of the production environment.

In order to set up the staging environment, follow the deployment procedures for vcpipe, anno and the corresponding genepanels, reference data and annotation data, but use the {staging} base path instead of the {production} base path.

All default paths and ports that are used in the staging environment are defined in the following config files:

System Alias Config
TSD {config} {staging}/sw/vcpipe/vcpipe/config/settings-tsd_staging.json
NSC {config} {staging}/sw/vcpipe/vcpipe/config/settings-nsc_staging.json

In order to get access to the aliases that will start and stop an instance of the staging environment, add the following to your ~/.bashrc file:

[[ -d {staging} ]] && source {staging}/sw/staging-screen-commands.txt || true
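
On TSD, for example, where {staging} is /cluster/projects/p22/staging, this expands to:

[[ -d /cluster/projects/p22/staging ]] && source /cluster/projects/p22/staging/sw/staging-screen-commands.txt || true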

System start and stop

The staging system is started by running the following commands on {server}:

  • Start the executor in a screen session. The executor is the process that picks up and runs new analyses (see the note on screen sessions after this list)

    staging-start-executor
    
  • Start the webui in a screen session. The webui allows inspection of and interaction with the executor database

    staging-start-webui
    
  • After using the staging environment, always stop the executor and webui

    staging-stop-webui
    staging-stop-executor
    
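The executor and webui run inside screen sessions, so a quick way to check that they are up is to list the sessions. This is a minimal sketch; the session name below is a placeholder, since the actual names are defined by the aliases in staging-screen-commands.txt:

screen -ls                       # list running screen sessions
screen -r <executor-session>     # attach to a session; detach again with Ctrl-a d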

Monitor the staging environment

The locations of log files and the port to the webui are defined in the {config} files.
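
A quick way to look up the webui port and the log locations is to search the JSON config directly; the key names matched here are only a guess at the pattern:

grep -inE 'port|log' {config}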

On NSC, in order to access the webui from a web browser at the address localhost:{webui_port}, the port has to be forwarded to localhost by logging in to {server} with

ssh -L {webui_port}:localhost:{webui_port} {server}

On TSD, the webui is accessible from the web-browser at the address {server}:{webui_port}.
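
For example, if the webui port defined in {config} were 5000 (a placeholder value), the forwarding command on NSC would be:

ssh -L 5000:localhost:5000 beta.ous.nsc.local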

Run analyses

An analysis is picked up by the executor as soon as all required files are present. The file dependencies differ between analysis types. The easiest way to create the metadata is to copy it from production and then make whatever edits are needed (if any). Make sure to skip the READY file when copying, and create the READY file only after you have finished your edits. Also make sure the value of the 'platform' tag in the .analysis and .sample files matches your platform.

Files can be put into four different directory categories, depending on the analysis type. The directories can be populated by following the Reanalysis of samples section in the procedure for Execution and monitoring of pipeline, using {staging} instead of {production}:

Alias Path Contents
{samples} {staging}/samples directories containing sample raw files
{analyses} {staging}/analyses directories containing an .analysis definition file
{singles} {staging}/preprocessed/singles directories containing the output of a basepipe analysis
{trios} {staging}/preprocessed/trios directories containing the output of a triopipe analysis
{ella-staging} /tsd/p22/data/durable/production/ella/ella-staging/data/analyses/imported directories containing the output of annopipe

There are three different analysis types, and each uses files from at least two of these directories:

Alias Requirements
{basepipe} one sample directory in {samples}, one basepipe analysis directory in {analyses}
{triopipe} {singles} results for each family member, one trio analysis directory in {analyses}
{annopipe} one {singles} result or one {trios} result, all {samples} and one annopipe single or trio analysis directory in {analyses}

No analyses will be run unless there is a file named READY in the {samples} and {analyses} directories. As long as there is no READY file, the analysis will not be picked up by the executor and the user is free to manipulate the .analysis files.
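
As a concrete sketch of this workflow for a {basepipe} analysis, assuming the directory name is a placeholder and that the READY file sits at the top of the analysis directory:

# Copy the analysis definition from production, but skip the READY file.
rsync -a --exclude READY {production}/analyses/Diag-example-123/ {staging}/analyses/Diag-example-123/
# Edit the .analysis file as needed (e.g. the 'platform' tag).
# Only when the edits are done, signal that the analysis can be picked up:
touch {staging}/analyses/Diag-example-123/READY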

All final and temporary files and processes that have been triggered by the executor can be found in the result directory within the {analyses} directories.

The final results of {annopipe} are automatically transferred to ELLA staging ({ella-staging}); the destination can be changed in {config}.
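
To verify that the transfer took place on TSD, you can list the ELLA staging import directory; the exact subdirectory layout may differ:

ls {ella-staging}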

Relevant testing as part of release

Pre-release tests

Depending on the type of changes that went into the release, run one or more of the following analyses and compare the result files (names, sizes and lists of variants in ELLA staging) to similar analyses in production (ELLA production or durableN/production/preprocessed); a comparison sketch follows the list below.

  • Diag-*-NA?
  • Diag-wgs??-X*
  • Diag-EKG?
  • Diag-excap?

*For the full sample numbers (which are sensitive), see the file in durable2/investigations.
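
A simple way to compare result file names and sizes between a staging run and the corresponding production run is sketched below; both paths are placeholders for the result directories being compared:

# Placeholders for the two result directories being compared.
STAGING_RESULT={staging}/analyses/Diag-example-123/result
PROD_RESULT=/path/to/production/preprocessed/result
diff <(cd "$STAGING_RESULT" && find . -type f -printf '%P %s\n' | sort) \
     <(cd "$PROD_RESULT" && find . -type f -printf '%P %s\n' | sort)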

Post-release tests

Before deployment of the (final) tagged release, run a few quick tests to make sure that the release artifact is valid:

  • update the staging environment with the release
  • run sw/show-version.sh to make sure the correct versions are shown (see the sketch after this list)
  • run a quick analysis (a targeted basepipe and an annopipe)
  • check that the analyses appear in ELLA staging's incoming directory
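
A minimal sketch of the version check, assuming show-version.sh lives under {staging}/sw:

cd {staging}/sw
./show-version.sh    # the reported versions should match the tagged release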

Best practice

  • Every staging run for which the results are important should have a JIRA/Gitlab issue and a corresponding directory in /tsd/p22/data/durable2/investigations.
  • Set up an analysis by copying directories that are needed for {basepipe}, {triopipe} or {annopipe} from {production}.
  • The only file that needs to be manually modified is the .analysis file in the {analyses} directory.
  • Always skip the READY file when copying the {analyses} directory, so that any manual changes to the .analysis file can be made before the analysis is picked up by the executor.
  • Instead of setting up an {annopipe} it can be easier to do a reanalysis in {ella-staging} first and then check that the result is identical when running {annopipe} in {staging}.
  • Use {staging} on TSD by default, and if needed check that NSC gives the same result.
  • In general it is not necessary to run the same analyses on both NSC and TSD. Run one or a few tests on the other platform for sanity checking.
  • When finishing work in the staging environment, remove the result directories in {analyses} and then delete the analysis from the admin tab in the staging-webui.
  • Data that has been transferred to {staging} and is intended for repeated use should not be deleted.
  • After testing in the staging environment, move important files out of the staging environment and clean up.
  • Before a release with significant changes, run all analysis types.
  • Compare the analyses as seen in ELLA staging with the same analyses in ELLA production.

Cleaning

Clean up by removing

  • your analysis directories in {analyses} (sketched below)
  • the analysis entries in the database, using the webui (gears icon -> Admin -> Delete analysis)
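
A minimal cleanup sketch, using a placeholder analysis name; sample and preprocessed data are left in place:

# Remove the analysis directory (including its result directory) from {analyses}.
rm -rf {staging}/analyses/Diag-example-123
# Then delete the corresponding entry via the webui: gears icon -> Admin -> Delete analysis.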

The sample and preprocessed data can be kept in the staging environment unless disk usage is critical. These data might be used in future testing (for example, running annopipe on data already present in preprocessed from a previous run).