
HTS Bioinf - Using the vcpipe staging environment

Scope

This document describes how to set up, activate and monitor the staging environments (NSC and TSD), and how to define the analyses that should be run. We also describe some guidelines on the appropriate use of the staging environments. More details on the executor and webui can be found in the vcpipe repository's documentation.

Responsibility

Bioinformaticians involved in development of the variant calling pipeline.

Platforms

There are two staging systems.

  • NSC is accessible from your local computer when attached to the NSC network by a network cable.
  • TSD is accessible via VMware Horizon or a web-browser logged in to the server view.tsd.usit.no.

All staging paths are given relative to the base path {staging}, which is system dependent as shown in the following table:

System Alias Base path
TSD {staging} /ess/p22/data/durable/staging
NSC {staging} /boston/diag/staging

Log in to the appropriate server to access the desired staging environment.

ssh {server}

where {server} is

System Alias Server
TSD {server} p22-hpc-03
NSC {server} gdx-login.ous.nsc.local
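
For example, logging in to the TSD staging environment:

ssh p22-hpc-03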

On both systems there is a persistent staging database that is used to store the status, logs and quality information for all analyses.

System setup

The staging environments are set up to mirror the production environment. Their most typical usage is to test new releases of the production environment.

In order to set up the staging environment, follow the deployment procedures for vcpipe, anno and the corresponding gene panels, reference data and annotation data, but use the {staging} base path instead of the {production} base path described in those procedures.

All default paths and ports used in the staging environment are defined in the following config files:

System Alias Config
TSD {config} {staging}/sw/variantcalling/vcpipe/config/settings-tsd_staging.json
NSC {config} {staging}/sw/variantcalling/vcpipe/config/settings-nsc_staging.json

In order to get access to the aliases that will start and stop an instance of the staging environment, add the following to your ~/.bashrc file:

[[ -d {staging} ]] && source {staging}/sw/staging-screen-commands.txt || true
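
On TSD, for example, this corresponds to (with {staging} substituted from the table above):

[[ -d /ess/p22/data/durable/staging ]] && source /ess/p22/data/durable/staging/sw/staging-screen-commands.txt || true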

System start and stop

The staging system is started by running the following commands on {server}:

  • Start the executor in a screen session. The executor is the process that will pick up and run new analyses.

    staging-start-executor
    
  • Start the webui in a screen session. The webui allows inspection of and interaction with the executor database. Both screen sessions can be checked with the commands sketched after this list.

    staging-start-webui
    
  • After using the staging environment, always stop the executor and webui.

    staging-stop-webui
    staging-stop-executor
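
The aliases run the executor and webui inside screen sessions, so the standard screen commands can be used to check what is running and to inspect output. The session names depend on how the aliases are defined, so treat this as a sketch:

screen -ls                   # list running screen sessions
screen -r <session-name>     # re-attach to a session; detach again with Ctrl-a d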
    

Monitor the staging environment

The locations of the log files and the port to the webui are defined in the {config} files.

On NSC, in order to access the webui from a web browser at the address localhost:{webui_port}, the webui port on {server} must be forwarded to localhost by issuing:

ssh -N -L {webui_port}:localhost:{webui_port} {server} &
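
For example, assuming a purely illustrative webui port of 5000 (the actual port is defined in {config}):

# 5000 is a placeholder; use the webui port from {config}
ssh -N -L 5000:localhost:5000 gdx-login.ous.nsc.local &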

On TSD, the webui is accessible from the web-browser at the address {server}:{webui_port}.

Run analyses

An analysis is picked up by the executor as soon as all required files are present. File dependencies differ between the analysis types. The easiest way to create the metadata is to copy it from production and then make whatever edits are needed (if any). Make sure to skip the READY file when copying, and create the READY file only after your edits are done, as sketched below. Also make sure that the values of the platform tags in the .analysis and .sample files match your platform.
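
A minimal sketch of this workflow for a hypothetical basepipe analysis directory named Diag-Example, assuming the production side uses the same data/analyses-work layout as {staging}:

# Copy the analysis definition from production, but skip the READY file
rsync -a --exclude READY {production}/data/analyses-work/Diag-Example/ {staging}/data/analyses-work/Diag-Example/
# Edit the .analysis file (e.g. the platform tag), then signal that the analysis may be picked up
touch {staging}/data/analyses-work/Diag-Example/READY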

Files can be put into the following directory categories, depending on the analysis type. The directories can be populated by following the Reanalysis of samples section in the procedure for Execution and monitoring of pipeline, using {staging} instead of {production}:

Alias Path Contents
{samples} {staging}/data/samples directories containing sample raw files
{analyses} {staging}/data/analyses-work directories containing an .analysis definition file
{singles} {staging}/data/analyses-results/singles directories containing the output of a basepipe analysis
{trios} {staging}/data/analyses-results/trios directories containing the output of a triopipe analysis
{ella-staging} /ess/p22/data/durable/production/ella/ella-staging/data/analyses/imported directories containing the output of annopipe

There are three different analysis types, and each uses files from at least two of these directories:

Alias Requirements
basepipe one sample directory in {samples}, one basepipe analysis directory in {analyses}
triopipe {singles} results for each family member, one trio analysis directory in {analyses}
annopipe one {singles} result or one {trios} result, all {samples} and one annopipe single or trio analysis directory in {analyses}

No analyses will be run unless there is a file named READY in the {samples} and {analyses} directories. As long as any READY files are missing, the analysis will not be picked up by the executor and the user is free to manipulate the .analysis file.
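
As a sketch, the analysis directories that are still held back (no READY file yet) can be listed with:

# List {analyses} subdirectories that do not yet contain a READY file
for d in {staging}/data/analyses-work/*/; do
    [[ -e "${d}READY" ]] || echo "$d"
done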

All final and temporary files from the processes triggered by the executor can be found in the result directory within each subdirectory of {analyses}.

The final annopipe results are automatically transferred to ELLA staging {ella-staging} (can be changed in {config}).
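
To verify that results have arrived, the {ella-staging} directory can simply be listed, for example:

ls -lt /ess/p22/data/durable/production/ella/ella-staging/data/analyses/imported | head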

Relevant testing as part of release

Pre-release tests

Depending on the type of changes that went into the release, run one or more of the following analyses and compare the result files (names, sizes and lists of variants in ELLA staging) to similar analyses in production (ELLA production or durable/production/data/analyses-results/{singles,trios}); a comparison sketch follows the list below.

  • Diag-*-NA?
  • Diag-wgs??-X*
  • Diag-EKG?
  • Diag-excap?

  • For full sample numbers (which are sensitive), see the file in /ess/p22/data/durable/production/investigations.
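
A sketch of such a comparison for a hypothetical single-sample analysis Diag-Example, checking result file names and sizes on TSD:

# Compare file names and sizes between staging and production results (Diag-Example is a placeholder)
diff <(cd {staging}/data/analyses-results/singles/Diag-Example && find . -type f -printf '%p\t%s\n' | sort) \
     <(cd /ess/p22/data/durable/production/data/analyses-results/singles/Diag-Example && find . -type f -printf '%p\t%s\n' | sort)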

Post-release tests

Before deployment of the (final) tagged release, run a few quick tests to make sure that the release artifact is valid:

  • update the staging environment with the new release
  • run sw/show-versions.sh to make sure the correct versions are shown (see the sketch after this list)
  • run a quick analysis (a targeted basepipe and an annopipe)
  • check that the analyses appear in ELLA staging's incoming directory
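
For the version check, a minimal sketch assuming show-versions.sh lives under {staging}/sw like the other staging scripts:

bash {staging}/sw/show-versions.sh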

Best practices

  • Every staging run whose results can be of general interest should have a Gitlab issue and a corresponding directory in /ess/p22/data/durable2/investigations.
  • Use the {staging} environment on TSD as a default, and if needed check that NSC gives the same results.
  • In general, it is not necessary to run the same analyses on both NSC and TSD. Run one or a few tests on the other platform for sanity checking.
  • Instead of setting up an annopipe analysis it can be easier to run a re-analysis in {ella-staging} first and then check that the result is identical when running annopipe in {staging}.
  • Set up an analysis by copying directories that are needed for basepipe, triopipe or annopipe from production.
  • The only file that needs to be manually modified is the .analysis file from the relevant subdirectory of {analyses}.
  • Always skip copying the READY file to the {analyses} directory, so that any manual changes to the .analysis file can be made before the analysis is picked up by the executor.
  • When the staging work is done, remove the result directories under {analyses} and then delete the analysis from the admin tab in the staging-webui.
  • Data transferred to {staging} and intended for repeated use should not be deleted.
  • Keep important files outside of the staging environment and clean up.
  • Before a release with significant changes, run all analysis types.
  • Compare the analyses as seen in ELLA staging with the same analyses in the ELLA production instance.

Cleaning

Clean up by removing

  • your {analyses} directories (see the sketch after this list)
  • the corresponding analysis entries in the database, using the webui (gears icon -> Admin -> Delete analysis)
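
A removal sketch for the first point, again using the hypothetical analysis name Diag-Example:

rm -rf {staging}/data/analyses-work/Diag-Example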

The samples and analyses-results data can be kept in the staging environment unless disk usage is critical. These data might be used in future testing (like running annopipe using data already present in analyses-results from a previous run).