HTS Bioinf - Release and deployment of the Anno system
Terms and definitions

- Anno system: the system built on the `ella-anno` and `anno-targets` Gitlab repositories and their deployed releases.
- Release: by design, a release is defined as an `anno-targets` tagged image, which automatically includes the latest `ella-anno` tagged image.
- DO: Digital Ocean cloud spaces.
Deployment checklist for the reviewer

- Check that `anno.sif` symlinks to the right image for both `prod` and `staging` on TSD and NSC
- Check that `anno.sif` has the right permissions (should be at least `774`)
- Make sure the `vcfanno_config.toml` file points to the right data source (if updated, check the release notes) and has the right permissions (should be at least `774`)
- For the anno-service on TSD, check that the anno process was restarted (if not, do restart it)
- Finally, check that the minimal test (a re-analysis in ELLA staging) was run successfully
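The symlink and permission checks above can be scripted. A minimal sketch, run against a scratch directory (the paths and image name are placeholders, not the real TSD/NSC locations):

```shell
# Sketch of the reviewer checks; ANNO_ROOT and the .sif name are
# illustrative stand-ins for the real deployment paths.
ANNO_ROOT=$(mktemp -d)
touch "${ANNO_ROOT}/anno-v1.2.3.sif"
chmod 774 "${ANNO_ROOT}/anno-v1.2.3.sif"
ln -s "${ANNO_ROOT}/anno-v1.2.3.sif" "${ANNO_ROOT}/anno.sif"

# 1) where does anno.sif point?
target=$(readlink "${ANNO_ROOT}/anno.sif")

# 2) are the permissions at least 774? (-L follows the symlink to the image)
perms=$(stat -Lc '%a' "${ANNO_ROOT}/anno.sif")
echo "target=$target perms=$perms"
```

On the real systems, point `ANNO_ROOT` at the instance directory from the deployment table below and compare `target` against the release issue.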
Components of the anno system

A release includes three components for each repository (see figure below):

- Source code (main and helper scripts and utilities)
- Data sources (annotation data in the `datasets.json` file)
- Docker/Singularity image (built automatically by a CI job)
Anno release steps overview
The release steps can be grouped into the following stages:
I. Prepare

- Create release issue from template
- Write "Endringskontroll" (change control; major/minor code changes)
- Verify credentials (DO, HGMD, NCBI ClinVar API keys)
- Generate and upload data to DO (via `ella-anno` or `anno-targets`)
- `ella-anno` and `anno-targets` `dev`: MRs merged and all tests PASS
- `ella-anno` and `anno-targets` `dev`: merge to `master`
II. Fetch

- Tag `anno-targets` and/or `ella-anno` (or re-use the latest existing tag for the latter)
- Fetch the `anno-targets` Singularity image (`.sif`) built by the CI job
- Download data from DO and prepare the data `.tar` archive
III. Deploy

- Upload release artifacts (data `.tar`, release `.sif`) to TSD/NSC
- Make sure no analyses are running
- Deploy first in `staging`, then in `prod`
- Stop Supervisor of the respective anno-service on TSD
- Unpack the data `.tar` in the respective instances (`ops/update_data.sh`)
- Place the new `.sif` into the archive and symlink it to `anno.sif`
- Check `anno.sif` and unpacked data permissions (at least `774`)
- Re-start Supervisor on TSD
- Update tracks in ELLA if needed
IV. Test
V. Announce
Anno release preparation
Create Gitlab release issue (except for data-only releases)

The release responsible (typically a system owner) creates a release issue from the template in `anno-targets` to track and document all steps of release preparation and deployment. This step is skipped for routine data-only and bug-fix releases.
Generate annotation data
Update data sources following the respective procedures:
Source code changes

Merge all feature branches associated with the release into `ella-anno`'s and `anno-targets`'s respective `dev` branches (in this order!). Make sure that all CI tests pass before the merge. If any of the changes in `anno-targets` depend on those in `ella-anno`, transient tagging (release candidate `*-rc`) of `ella-anno`'s `dev` branch is advisable. If deemed necessary, transiently tag (release candidate `*-rc`) `anno-targets`'s `dev` branch and deploy the release candidate artifact to staging (see below) for field-testing.

NOTE: Tags can be created in Gitlab's webUI or locally by issuing `git tag -a <tag> && git push origin <tag>`.
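The local tagging flow can be rehearsed in a throwaway repository. A sketch (the tag name is illustrative; the `git push origin <tag>` step is omitted here since the scratch repo has no remote):

```shell
# Sketch: create and inspect an annotated release tag in a scratch repo.
# In a real release, run this in ella-anno/anno-targets and follow up
# with `git push origin <tag>`.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "release prep"
git -c user.name=demo -c user.email=demo@example.org \
    tag -a v1.2.3 -m "anno release v1.2.3"
# confirm HEAD carries the new annotated tag
tag=$(git describe --tags)
echo "$tag"
```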
Anno release artifact creation

- Merge `ella-anno`'s and `anno-targets`'s `dev` branches into the corresponding `master` branches and tag them with a new version (in this order: merge >>> tag >>> merge >>> tag). The last tag will trigger a CI job in Gitlab that builds the release image to be uploaded to DO.
- Unless the `ella-anno` release is data-only, create a new `ella-anno` Gitlab release, using the tag just created. Include release notes from any data-only releases since the last Gitlab release.
- Download the release Singularity image from DO for source code updates via `anno-targets`'s `master` branch: `make fetch-singularity-release DB_CREDS=$HOME/.db_creds`. Alternatively, build the Docker image and then the Singularity image locally; use `make help` to get building instructions.
- Create the data archive:
  - Clone the `anno-targets` repository
  - `make build-annobuilder` - this is to make sure the correct image is used
  - `make download-[anno|amg]-package DB_CREDS=$HOME/.db_creds PKG_NAME=<package name>` (according to the data source table: `anno` for `ella-anno` sources, `amg` for `anno-targets` sources)
    - alternatively, `make download-data` to download all data
  - `make tar-data PKG_NAMES=<package name 1>,<package name 2> TAR_OUTPUT=/anno/data/<name of tar-file>` (recommended format `<package name 1>_<version>[-<package name 2>_<version>...].tar`). This will generate a tar file in the data directory. If `PKG_NAMES` is not specified, this will tar all the data in the data directory.
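The `tar-data` step conceptually bundles per-package data directories into one archive with the recommended name format. A stand-alone sketch of that idea (the package names, versions, and file contents are hypothetical; the real step runs inside the annobuilder image):

```shell
# Conceptual sketch of the tar-data step: bundle per-package data
# directories into one archive named <pkg1>_<ver>[-<pkg2>_<ver>].tar.
data=$(mktemp -d)
mkdir -p "$data/clinvar_20240101" "$data/hgmd_2024.1"
echo dummy > "$data/clinvar_20240101/clinvar.vcf.gz"
echo dummy > "$data/hgmd_2024.1/hgmd.vcf.gz"
out="$data/clinvar_20240101-hgmd_2024.1.tar"
# -C keeps archive paths relative to the data directory
tar -C "$data" -cf "$out" clinvar_20240101 hgmd_2024.1
tar -tf "$out"
```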
Anno deployment
The anno system is deployed in two locations and must be updated in each accordingly:

- TSD: `ANNO_ROOT=/ess/p22/data/durable/production/anno/anno-{prod,staging}`
- NSC: `ANNO_ROOT=/boston/diag/{production,staging}/sw/annotation/anno`
Below is an overview of the deployment procedure. For the details, read on.
- Check that no analyses are running
- Stop the anno-service from Supervisor on TSD
- Change to the service user
- Place the data tarball in `DATA_ARCHIVE` and the Singularity image in `SIF_ARCHIVE`
- Run `${BASE}/ops/update_data.sh -t ${DATA_ARCHIVE}/<data tar> -i ${INSTANCE} -p ${PLATFORM}`
- Link the image: `ln -fs ${SIF_ARCHIVE}/<anno sif> ${ANNO_ROOT}/anno.sif`
- Verify the image: `singularity exec ${ANNO_ROOT}/anno.sif cat /anno{-targets,}/version`
- Verify the data (files `sources.json` and `vcfanno_config.toml` in `ANNO_DATA`)
- Once done, restart the anno-service from Supervisor on TSD
|                | TSD                                     | NSC                                      |
|----------------|-----------------------------------------|------------------------------------------|
| `BASE`         | `/ess/p22/data/durable/production/anno` | `/boston/diag`                           |
| `DATA_ARCHIVE` | `${BASE}/archive/data`                  | N/A (archived on TSD)                    |
| `SIF_ARCHIVE`  | `${BASE}/archive/releases`              | `${BASE}/production/sw/archive/releases` |
| **PRODUCTION** |                                         |                                          |
| `ANNO_ROOT`    | `${BASE}/anno-prod`                     | `${BASE}/production/sw/annotation/anno`  |
| `INSTANCE`     | `prod`                                  | `prod`                                   |
| `PLATFORM`     | `tsd`                                   | `nsc`                                    |
| **STAGING**    |                                         |                                          |
| `ANNO_ROOT`    | `${BASE}/anno-staging`                  | `${BASE}/staging/sw/annotation/anno`     |
| `INSTANCE`     | `staging`                               | `staging`                                |
| `PLATFORM`     | `tsd`                                   | `nsc`                                    |
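The (`PLATFORM`, `INSTANCE`) to `ANNO_ROOT` mapping in the table can be encoded as a small helper to avoid typos when switching instances. A sketch (the function name is ours; the paths come straight from the table):

```shell
# Sketch: resolve ANNO_ROOT from PLATFORM and INSTANCE per the table above.
anno_root() {
  platform=$1
  instance=$2
  case "$platform:$instance" in
    tsd:prod)    echo "/ess/p22/data/durable/production/anno/anno-prod" ;;
    tsd:staging) echo "/ess/p22/data/durable/production/anno/anno-staging" ;;
    nsc:prod)    echo "/boston/diag/production/sw/annotation/anno" ;;
    nsc:staging) echo "/boston/diag/staging/sw/annotation/anno" ;;
    *)           echo "unknown platform/instance" >&2; return 1 ;;
  esac
}

anno_root tsd staging
```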
The anno system should be deployed in staging on TSD/NSC first. A re-analysis in ELLA staging should be run after deployment on TSD (always do this) to start an anno process and validate the new image. Once it is established that everything runs as expected, the anno system can be deployed in production, too.
Steps of Anno's deployment
- Upload the data `.tar` and release `.sif` files to TSD/NSC (check our Wiki for instructions on using `tacl` for data upload).
- `ssh` as the service user into the production VM.
- Move the release `.sif` to `SIF_ARCHIVE` (production only).
- Check that there are no analyses running in the pipeline (ask the person responsible for operations).
- [TSD-only] Check that no anno-service analyses are running (use `$BASE/ops/num_active_jobs.sh`).
- [TSD-only] Move the data `.tar` to `$BASE/archive/data` (on NSC, just place it somewhere accessible and remember to delete it after the update).
- [TSD-only] Stop the respective anno-service (production or staging) using the webUI accessible at `p22-anno-01:9000` (you may need to authenticate). To verify that the environment processes have in fact stopped, one can (after `ssh`ing into `p22-anno-01`) compare the output of the following command before and after:

  ```
  # show info for all supervisord processes
  ps uww $(pgrep supervisord)
  # USER     PID   %CPU %MEM VSZ   RSS   TTY STAT START TIME COMMAND
  # 1200     11084 0.0  0.0  28108 22880 ?   S    10:23 0:04 /usr/bin/python3 /usr/bin/supervisord --configuration /etc/supervisord.conf
  # tsnowlan 11106 0.0  0.0  68244 23148 ?   S    10:23 0:07 /dist/ella-python/bin/python3.7 /dist/ella-python/bin/supervisord -c /ella/ops/dev/supervisor.cfg
  ```
- Update the `anno.sif` symlink's target to the new release `.sif` file:

  ```
  # to test the command
  ln -s path-to-new-image-vX.X.X.sif ${ANNO_ROOT}/anno.sif
  # to override the current link
  ln -sf path-to-new-image-vX.X.X.sif ${ANNO_ROOT}/anno.sif
  ```
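The difference between the two `ln` invocations can be seen in a scratch directory: plain `ln -s` refuses to overwrite an existing `anno.sif` link (a useful dry-run), while `ln -sf` replaces it. The image names below are placeholders:

```shell
# Demo: ln -s fails when the link exists; ln -sf replaces it.
root=$(mktemp -d)
touch "$root/anno-v1.0.0.sif" "$root/anno-v1.1.0.sif"
ln -s "$root/anno-v1.0.0.sif" "$root/anno.sif"

# without -f, linking over an existing name is refused
ln -s "$root/anno-v1.1.0.sif" "$root/anno.sif" 2>/dev/null \
  && echo "replaced" || echo "refused"

# with -f, the symlink is repointed at the new image
ln -sf "$root/anno-v1.1.0.sif" "$root/anno.sif"
readlink "$root/anno.sif"
```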
- Check the symlink's permissions (must be at least `774`).
- Verify that the symlinked file is not corrupted and that the versions of the software are correct (see the Gitlab release issue): `singularity exec ${ANNO_ROOT}/anno.sif cat /anno{-targets,}/version`
- Extract the new data: for each combination (PLATFORM, INSTANCE) in `[(tsd, staging), (tsd, prod)]` on TSD and `[(nsc, staging), (nsc, prod)]` on NSC, you'll have to run the following:

  ```
  # check options
  ${BASE}/ops/update_data.sh -h
  # run the script
  ${BASE}/ops/update_data.sh -t <path/to/tar-file> -i <INSTANCE> -p <PLATFORM>
  ```

  Warning: DO NOT UNTAR THE FILE DIRECTLY IN THE ANNO DATA DIRECTORY.

  You may encounter issues with permissions during the data update. These are typically because the script cannot move the previous version (for example the `data/variantDB/clinvar` directory) to the `.tmp` directory, or cannot remove the `.tmp` directory itself.
- Verify that the data are untarred correctly. Check the respective instance's `${ANNO_ROOT}/data` directory for the:
  - Contents of `sources.json`
  - Contents of `vcfanno_config.toml`
  - Untarred data permissions inside the respective data source directories (should be `774`)
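The permission check over the unpacked data can be done with a single `find`: list every file whose mode is weaker than `774`. A sketch against a scratch tree (point `data` at `${ANNO_ROOT}/data` in a real check):

```shell
# Sketch: report files with permissions weaker than 774.
# `-perm -774` matches files that have AT LEAST the rwxrwxr-- bits set;
# negating it lists the offenders.
data=$(mktemp -d)
touch "$data/good.vcf.gz" "$data/bad.vcf.gz"
chmod 774 "$data/good.vcf.gz"
chmod 644 "$data/bad.vcf.gz"
bad=$(find "$data" -type f ! -perm -774)
echo "$bad"
```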
- [TSD-only] Restart the respective anno-service using the webUI accessible at `p22-anno-01:9000`.
Updates to IGV tracks in ELLA
If any of the sources mentioned in the table below have been updated, the corresponding tracks in ELLA will need to be updated as well:

| Data source    | Filename                        |
|----------------|---------------------------------|
| ClinVar        | `clinvar-current.vcf.gz`        |
| HGMD           | `hgmd-current.vcf.gz`           |
| gnomAD exomes  | `gnomad.exomes-current.vcf.gz`  |
| gnomAD genomes | `gnomad.genomes-current.vcf.gz` |
To update one of these tracks, e.g. the ClinVar track, do the following:
- Copy the files (`.vcf.gz` and `.tbi`) from the fresh anno update to the ELLA tracks directory (those files are identical in prod and staging, so you can use either).
- Update the symlinks for `clinvar-current.vcf.gz` and `clinvar-current.vcf.gz.tbi` to point to the respective new files `clinvar_yyyymmdd.vcf.gz[.tbi]`.
- Update the version of the track in ELLA using ELLA's track config webUI: refer to "HTS Bioinf - Using ELLA track config manager" for instructions.
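Since the `.vcf.gz` and `.tbi` links must always be repointed together, a small loop keeps them in sync. A sketch in a scratch directory (the dated filename is illustrative):

```shell
# Sketch: repoint the clinvar-current symlink pair at a new release file.
tracks=$(mktemp -d)
touch "$tracks/clinvar_20240101.vcf.gz" "$tracks/clinvar_20240101.vcf.gz.tbi"

# update both the data file link and its index link in one pass
for ext in "" ".tbi"; do
  ln -sf "$tracks/clinvar_20240101.vcf.gz$ext" \
         "$tracks/clinvar-current.vcf.gz$ext"
done

readlink "$tracks/clinvar-current.vcf.gz"
readlink "$tracks/clinvar-current.vcf.gz.tbi"
```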
Minimal test
Make sure that you have done this test: as a minimal sanity check, use ELLA staging and try manually importing an analysis.

To check that the right files are in fact used, one can grep for `'^##'` in the log at `/ess/p22/data/durable/production/anno/anno-staging/work/*/VCFANNO/output.vcf` and then grep for the updated database. See picture:
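The header check can be rehearsed on a synthetic VCF. A sketch (the `##` lines below, including the `CLINVARJSON` INFO field, are made-up stand-ins for the real `output.vcf` header):

```shell
# Sketch: inspect a VCF's '##' header lines and grep for the updated
# database, mimicking the check on .../work/*/VCFANNO/output.vcf.
vcf=$(mktemp)
cat > "$vcf" <<'EOF'
##fileformat=VCFv4.2
##INFO=<ID=CLINVARJSON,Number=1,Type=String,Description="ClinVar annotation">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
1	12345	.	A	G	.	.	CLINVARJSON=...
EOF
headers=$(grep '^##' "$vcf")
printf '%s\n' "$headers"
```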
User announcement
Send a notification to the users about the contents of the update in Teams: GDx driftsmeldinger.