HTS Bioinf - Using the vcpipe staging environment
Scope
This document describes how to set up, activate and monitor the staging environments (NSC and TSD), and how to define the analyses that should be run.
We also give some guidelines on appropriate use of the staging environments. Details of the executor and webui are found in the documentation for the vcpipe repository.
Responsibility
Bioinformaticians involved in development of the variant calling pipeline.
Platforms
There are two staging systems.
- NSC is accessible from your local computer when attached to the NSC network by a network cable.
- TSD is accessible via VMware Horizon or a web browser logged in to the server view.tsd.usit.no.
All staging paths are given relative to the base path {staging}, which is system dependent, as shown in the following table:
System | Alias | Base path
---|---|---
TSD | {staging} | /cluster/projects/p22/staging
NSC | {staging} | /boston/diag/staging
In order to access {staging}, we need to log in to the appropriate server, as sketched below the table, where {server} is:
System | Alias | Server
---|---|---
TSD | {server} | p22-submit-dev
NSC | {server} | beta.ous.nsc.local
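A minimal sketch of the login step, assuming standard SSH access with your own user account (the exact login command may differ on your system):

```bash
# Log in to the staging server; {server} is the value from the table above.
ssh {server}
```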
On both systems there is a persistent staging database that is used to store the status, logs and quality information for all analyses.
System setup
The staging environments are set up to mirror the production environment. The most typical usage is to test new releases of the production environment.
In order to set up the staging environment, follow the deployment procedures for vcpipe, anno and the corresponding genepanels, reference data and annotation data, but use the {staging} base path instead of the {production} base path.
All default paths and ports that are used in the staging environment are defined in the following config files:
System | Alias | Config
---|---|---
TSD | {config} | {staging}/sw/vcpipe/vcpipe/config/settings-tsd_staging.json
NSC | {config} | {staging}/sw/vcpipe/vcpipe/config/settings-nsc_staging.json
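To see which paths and ports a given environment actually uses, the JSON config can simply be pretty-printed; a minimal sketch (any JSON viewer will do):

```bash
# Pretty-print the staging config to inspect the defined paths and ports;
# {config} is the alias from the table above.
python -m json.tool {config} | less
```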
In order to get access to the aliases that will start and stop an instance of the staging environment, add the corresponding alias definitions to your ~/.bashrc file.
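The real alias definitions are part of the vcpipe deployment; the snippet below is only a hypothetical illustration of the pattern, with made-up alias names and script paths:

```bash
# HYPOTHETICAL sketch -- use the alias definitions from your vcpipe deployment.
# Each alias simply runs the executor or webui against the staging config
# ({config} from the table above); run_executor.sh/run_webui.sh are placeholders.
alias staging-executor='{staging}/sw/vcpipe/run_executor.sh {config}'
alias staging-webui='{staging}/sw/vcpipe/run_webui.sh {config}'
```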
System start and stop
The staging system is started by running the following commands on {server} (a sketch follows the list):
- Start the executor in a screen session. The executor is the process that will pick up and run new analyses.
- Start the webui in a screen session. The webui allows inspection of and interaction with the executor database.
- After using the staging environment, always stop the executor and webui.
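A sketch of the start/stop pattern using screen, assuming the aliases from the ~/.bashrc snippet above (the alias names are placeholders):

```bash
# Start the executor in a named screen session (detach with Ctrl-a d).
screen -S staging-executor
staging-executor

# Start the webui in a second named screen session (detach with Ctrl-a d).
screen -S staging-webui
staging-webui

# When finished: re-attach to each session, stop the process with Ctrl-c,
# then close the session with 'exit'.
screen -r staging-executor
screen -r staging-webui
```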
Monitor the staging environment
The locations of the log files and the webui port are defined in the {config} files.
On NSC, in order to access the webui from the web browser at the address localhost:{webui_port}, the port has to be forwarded to localhost when logging in to {server}, as sketched below.
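A minimal sketch of the port forwarding, assuming plain SSH local forwarding; {webui_port} is the port defined in {config}:

```bash
# Forward the webui port on {server} to localhost (NSC); the webui is then
# reachable in the browser at localhost:{webui_port}.
ssh -L {webui_port}:localhost:{webui_port} {server}
```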
On TSD, the webui is accessible from the web browser at the address {server}:{webui_port}.
Run analyses
An analysis is picked up by the executor as soon as all required files are present. File dependencies are different for the different analysis types.
The easiest way to create the metadata is to copy it from production and then make whatever edits are needed (if any), as sketched below. Make sure to skip the READY file when copying, and create READY only after you have done your edits. Also make sure the value of the 'platform' tag in the .analysis and .sample files matches your platform.
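A sketch of copying analysis metadata from production, assuming rsync is available; <analysis-dir> is a placeholder for the directory you want to reuse:

```bash
# Copy an analysis directory from production to staging, excluding READY so
# the executor does not pick it up before the metadata has been edited.
rsync -av --exclude=READY {production}/analyses/<analysis-dir>/ {staging}/analyses/<analysis-dir>/

# Check that the 'platform' tag matches your platform before creating READY.
grep -H "platform" {staging}/analyses/<analysis-dir>/*.analysis
```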
Files can be put into four different directory categories, depending on the analysis type. The directories can be populated by following the Reanalysis of samples section in the procedure for Execution and monitoring of pipeline, using {staging} instead of {production}:
Alias | Path | Contents
---|---|---
{samples} | {staging}/samples | directories containing sample raw files
{analyses} | {staging}/analyses | directories containing an .analysis definition file
{singles} | {staging}/preprocessed/singles | directories containing the output of a basepipe analysis
{trios} | {staging}/preprocessed/trios | directories containing the output of a triopipe analysis
{ella-staging} | /tsd/p22/data/durable/production/ella/ella-staging/data/analyses/imported | directories containing the output of annopipe
There are three different analysis types, and each uses files from at least two of these directories:

Alias | Requirements
---|---
{basepipe} | one sample directory in {samples}, one basepipe analysis directory in {analyses}
{triopipe} | {singles} results for each family member, one trio analysis directory in {analyses}
{annopipe} | one {singles} result or one {trios} result, all {samples}, and one annopipe single or trio analysis directory in {analyses}
No analyses will be run unless there is a file named READY in the {samples} and {analyses} directories. As long as there is no READY file, the analysis will not be picked up by the executor and the user is free to manipulate the .analysis files.
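When the metadata is final, the analysis is released to the executor by creating the READY files, for example (directory names are placeholders):

```bash
# Create the READY marker files; the executor will then pick up the analysis.
touch {staging}/samples/<sample-dir>/READY
touch {staging}/analyses/<analysis-dir>/READY
```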
All final and temporary files from the processes that have been triggered by the executor can be found in the result directory within the {analyses} directories.
The final results of {annopipe} are automatically transferred to ELLA staging ({ella-staging}; the destination can be changed in {config}).
Relevant testing as part of release
Pre-release tests
Depending on the type of changes that went into the release, run one or more of the following analyses and compare the result files (names, sizes and lists of variants in ELLA staging) to similar analyses in production (ELLA production or durableN/production/preprocessed).
- Diag-*-NA?
- Diag-wgs??-X*
- Diag-EKG?
- Diag-excap?
*For the full sample numbers (which are sensitive), see the file in durable2/investigations.
Post-release tests
Before deployment of the (final) tagged release, run a few quick tests to make sure that the release artifact is valid:
- update the staging environment with the release
- run sw/show-version.sh to make sure the correct versions are shown (see the sketch after this list)
- run a quick analysis (a targeted basepipe and an annopipe)
- check that the analyses appear in ELLA staging's incoming directory
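A minimal sketch of the version check, assuming show-version.sh sits under the staging base path as referenced in the list above:

```bash
# Print the deployed component versions of the staging environment.
bash {staging}/sw/show-version.sh
```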
Best practice
- Every staging run for which the results are important should have a JIRA/Gitlab issue and a corresponding directory in /tsd/p22/data/durable2/investigations.
- Set up an analysis by copying the directories that are needed for {basepipe}, {triopipe} or {annopipe} from {production}.
- The only file that needs to be manually modified is the .analysis file from the directory in {analyses}.
- Always skip the READY file from the {analyses} directory, so that any manual changes to the .analysis file can be performed before it is picked up by the executor.
- Instead of setting up an {annopipe}, it can be easier to do a reanalysis in {ella-staging} first and then check that the result is identical when running {annopipe} in {staging}.
- Use the {staging} on TSD as the default, and if needed check that NSC gives the same result.
- In general it is not necessary to run the same analyses on both NSC and TSD. Run one or a few tests on the other platform for sanity checking.
- When finishing work in the staging environment, remove the result directories in {analyses} and then delete the analysis from the admin tab in the staging webui.
- Data that has been transferred to {staging} and is intended for repeated use should not be deleted.
- After testing in the staging environment, keep important files outside of the staging environment and clean up.
- Before a release with significant changes, run all analysis types.
- Compare the analyses as seen in ELLA staging with the same analyses in ELLA production.
Cleaning
Clean up by:
- removing your analyses directories (see the sketch after this list)
- removing the analysis entries in the database using the webui (gears icon -> Admin -> Delete analysis)
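A minimal cleanup sketch for the file-system part; the database entries still have to be deleted through the webui as described above (<analysis-dir> is a placeholder):

```bash
# Remove your analysis directory from staging once any important results
# have been copied elsewhere.
rm -rf {staging}/analyses/<analysis-dir>
```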
The sample and preprocessed data can be kept in the staging environment unless disk usage is critical. These data might be used in future testing (like running annopipe using data already present in preprocessed from a previous run).