HTS Bioinf - Using the vcpipe staging environment
Scope
This document describes how to set up, activate and monitor the staging environments (NSC and TSD), and how to define the analyses that should be run.
We also describe some guidelines on the appropriate use of the staging environments. More details on executor and webui are found in the vcpipe repository's documentation.
Responsibility
Bioinformaticians involved in development of the variant calling pipeline.
Platforms
There are two staging systems.
- NSC is accessible from your local computer when attached to the NSC network by a network cable.
- TSD is accessible via VMware Horizon or a web browser logged in to the server view.tsd.usit.no.
All staging paths are given relative to the base path {staging}, which is system dependent as shown in the following table:
System | Alias | Base path |
---|---|---|
TSD | {staging} | /ess/p22/data/durable/staging |
NSC | {staging} | /boston/diag/staging |
Log in to the appropriate server, {server}, to access the desired staging environment. The server name is system dependent:
System | Alias | Server |
---|---|---|
TSD | {server} | p22-hpc-03 |
NSC | {server} | gdx-login.ous.nsc.local |
On both systems there is a persistent staging database that is used to store the status, logs and quality information for all analyses.
System setup
The staging environments are set up to mirror the production environment. Their most typical usage is to test new releases of the production environment.
To set up the staging environment, follow the deployment procedures for vcpipe, anno and the corresponding gene panels, reference data and annotation data, but use the {staging} base path instead of the {production} base path given in those procedures.
All default paths and ports used in the staging environment are defined in the following config files:
System | Alias | Config |
---|---|---|
TSD | {config} | {staging}/sw/variantcalling/vcpipe/config/settings-tsd_staging.json |
NSC | {config} | {staging}/sw/variantcalling/vcpipe/config/settings-nsc_staging.json |
In order to get access to the aliases that will start and stop an instance of the staging environment, add the following to your ~/.bashrc file:
System start and stop
The staging system is started by running the following commands on {server}:
- Start the executor in a screen session. The executor is the process that will pick up and run new analyses.
- Start the webui in a screen session. The webui allows inspection of, and interaction with, the executor database.
- After using the staging environment, always stop the executor and webui.
Monitor the staging environment
The locations of the log files and the port to the webui are defined in the {config} files.
On NSC, in order to access the webui from the web browser at the address localhost:{webui_port}, the webui port on {server} must first be forwarded to localhost.
On TSD, the webui is accessible from the web browser at the address {server}:{webui_port}.
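As a minimal sketch of the NSC port forwarding, the snippet below builds a standard OpenSSH local forward (ssh -N -L). It assumes that the webui listens on {webui_port} on {server} itself. The function name and the port number 8080 are placeholders for illustration, not values taken from the {config} files.

```python
import shlex


def port_forward_command(server, webui_port):
    """Build an ssh command that forwards webui_port on the server to the
    same port on localhost (standard OpenSSH -L local forwarding)."""
    return ["ssh", "-N", "-L", f"{webui_port}:localhost:{webui_port}", server]


if __name__ == "__main__":
    # 8080 is a placeholder; look up the real port in your {config} file.
    command = port_forward_command("gdx-login.ous.nsc.local", 8080)
    print(shlex.join(command))
```

Running the printed command in a terminal on your local computer keeps the tunnel open until it is interrupted. While it runs, the webui should be reachable at localhost:{webui_port}.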
Run analyses
An analysis is picked up by the executor as soon as all required files are present. File dependencies are different for the different analysis types.
The easiest way to create the metadata is to copy it from production and then make whatever edits are needed (if any).
Make sure to skip the READY file when copying, and create the READY file after having done your edits.
Also make sure the values of the platform tags in the .analysis and .sample files match your platform.
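The copy-then-edit workflow above can be sketched with the Python standard library. This is an illustration only: the helper names are hypothetical, and it assumes that an empty file named READY is enough to signal readiness.

```python
import shutil
from pathlib import Path


def copy_without_ready(src, dst):
    """Copy a directory tree from src to dst, leaving out anything named READY.

    dst must not exist yet (default behaviour of shutil.copytree).
    """
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("READY"))
    return Path(dst)


def mark_ready(directory):
    """Create the READY file once all manual edits are done.

    Assumption: an empty READY file is sufficient.
    """
    ready = Path(directory) / "READY"
    ready.touch()
    return ready
```

A typical sequence would be to call copy_without_ready for the production directory, edit the .analysis and .sample files in the copy, and call mark_ready last.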
Files can be put into four different directory categories, depending on the analysis type. The directories can be populated by following the Reanalysis of samples section in the procedure for Execution and monitoring of pipeline, using {staging} instead of {production}:
Alias | Path | Contents |
---|---|---|
{samples} | {staging}/data/samples | directories containing sample raw files |
{analyses} | {staging}/data/analyses-work | directories containing an .analysis definition file |
{singles} | {staging}/data/analyses-results/singles | directories containing the output of a basepipe analysis |
{trios} | {staging}/data/analyses-results/trios | directories containing the output of a triopipe analysis |
{ella-staging} | /ess/p22/data/durable/production/ella/ella-staging/data/analyses/imported | directories containing the output of annopipe |
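To make the aliases concrete, the following sketch expands the four {staging}-relative aliases for a given system. The base paths and subdirectories are taken from the tables in this document. The function and dictionary names are hypothetical and not part of vcpipe.

```python
from pathlib import PurePosixPath

# Base paths as listed in the Platforms section.
BASE_PATHS = {
    "TSD": PurePosixPath("/ess/p22/data/durable/staging"),
    "NSC": PurePosixPath("/boston/diag/staging"),
}


def staging_dirs(system):
    """Return the directories behind the {samples}, {analyses}, {singles}
    and {trios} aliases for the given system ("TSD" or "NSC")."""
    base = BASE_PATHS[system]
    return {
        "samples": base / "data/samples",
        "analyses": base / "data/analyses-work",
        "singles": base / "data/analyses-results/singles",
        "trios": base / "data/analyses-results/trios",
    }
```

The {ella-staging} alias is left out because it is an absolute path outside {staging}.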
There are three different analysis types, and each uses files from at least two of these directories:
Analysis type | Requirements |
---|---|
basepipe | one sample directory in {samples}, one basepipe analysis directory in {analyses} |
triopipe | {singles} results for each family member, one trio analysis directory in {analyses} |
annopipe | one {singles} result or one {trios} result, all {samples} and one annopipe single or trio analysis directory in {analyses} |
No analyses will be run unless there is a file named READY in the {samples} and {analyses} directories. As long as any READY files are missing, the analysis will not be picked up by the executor and the user is free to manipulate the .analysis file.
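The READY rule can be checked before starting the executor with a small helper like the one below. It is a sketch of the rule as stated here, not of the executor's actual logic, which also depends on the other files required by each analysis type. The function names are hypothetical.

```python
from pathlib import Path


def missing_ready_files(directories):
    """Return the directories that do not yet contain a file named READY."""
    return [Path(d) for d in directories if not (Path(d) / "READY").is_file()]


def has_all_ready_files(sample_dirs, analysis_dir):
    """True when every sample directory and the analysis directory
    contain a READY file."""
    return not missing_ready_files([*sample_dirs, analysis_dir])
```

Printing the result of missing_ready_files shows which directories still hold an analysis back.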
All final and temporary files and processes that have been triggered by the executor can be found in the result directory within each subdirectory of {analyses}.
The final annopipe results are automatically transferred to ELLA staging, {ella-staging} (can be changed in {config}).
Relevant testing as part of release
Pre-release tests
Depending on the type of changes that went into the release, run one or more of the following analyses and compare the result files (names, sizes and lists of variants in ELLA staging) to similar analyses in production (ELLA production or durable/production/data/analyses-results/{singles,trios}).
- Diag-*-NA?
- Diag-wgs??-X*
- Diag-EKG?
- Diag-excap?

For full sample numbers (which are sensitive), see the file in /ess/p22/data/durable/production/investigations.
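The comparison of file names and sizes mentioned above can be partly automated. The sketch below compares two result directories by relative file name and size only. It does not compare variant lists, and it assumes that the staging and production results use the same file names. The function names are hypothetical.

```python
from pathlib import Path


def file_sizes(root):
    """Map each file below root (by relative POSIX path) to its size in bytes."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): path.stat().st_size
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_results(staging_dir, production_dir):
    """Report files that exist on one side only or differ in size."""
    staging = file_sizes(staging_dir)
    production = file_sizes(production_dir)
    common = staging.keys() & production.keys()
    return {
        "only_in_staging": sorted(staging.keys() - production.keys()),
        "only_in_production": sorted(production.keys() - staging.keys()),
        "size_differs": sorted(n for n in common if staging[n] != production[n]),
    }
```

Size differences are only a hint. Files that embed timestamps or version strings may differ in size without any change in the called variants.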
Post-release tests
Before deployment of the (final) tagged release, run a few quick tests to make sure that the release artifact is valid:
- update the staging environment with the new release
- run sw/show-versions.sh to make sure the correct versions are shown
- run a quick analysis (a targeted basepipe and an annopipe)
- check that the analyses appear in ELLA staging's incoming directory
Best practices
- Every staging run whose results can be of general interest should have a GitLab issue and a corresponding directory in /ess/p22/data/durable2/investigations.
- Use the {staging} environment on TSD as a default, and if needed check that NSC gives the same results.
- In general, it is not necessary to run the same analyses on both NSC and TSD. Run one or a few tests on the other platform for sanity checking.
- Instead of setting up an annopipe analysis it can be easier to run a re-analysis in {ella-staging} first and then check that the result is identical when running annopipe in {staging}.
- Set up an analysis by copying directories that are needed for basepipe, triopipe or annopipe from production.
- The only file that needs to be manually modified is the .analysis file from the relevant subdirectory of {analyses}.
- Always skip the READY file from the {analyses} directory, so that any manual changes to the .analysis file can be performed before it is picked up by the executor.
- When the staging work is done, remove the result directories under {analyses} and then delete the analysis from the admin tab in the staging-webui.
- Data transferred to {staging} and intended for repeated use should not be deleted.
- Keep important files outside of the staging environment and clean up.
- Before a release with significant changes, run all analysis types.
- Compare the analyses as seen in ELLA staging with the same analyses in the ELLA production instance.
Cleaning
Clean up by removing:
- your {analyses} directories
- analysis entries in the database using the webui (gears icon -> Admin -> Delete analysis)

The samples and analyses-results data can be kept in the staging environment unless disk usage is critical.
These data might be used in future testing (like running annopipe using data already present in analyses-results from a previous run).
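Removing your own analysis directories can be sketched as below. The helper is hypothetical and defaults to a dry run that only lists what would be deleted. The name pattern used to select your directories is an assumption, since this document does not define a naming scheme for test analyses.

```python
import shutil
from pathlib import Path


def clean_analyses(analyses_dir, pattern, dry_run=True):
    """List, and with dry_run=False delete, the directories directly
    under analyses_dir whose name matches the glob pattern."""
    matches = sorted(p for p in Path(analyses_dir).glob(pattern) if p.is_dir())
    if not dry_run:
        for path in matches:
            shutil.rmtree(path)
    return matches
```

Review the list from the dry run before calling the function again with dry_run=False. The database entries are not touched by this helper and still have to be deleted through the webui.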