HTS Bioinf - Release and deployment of anno system
Terms and definitions
anno system: system built on the ella-anno and anno-targets Gitlab repositories and their deployed releases.
release: by design, a release is defined as a tagged anno-targets image, which automatically includes the latest tagged ella-anno image.
DO: DigitalOcean cloud Spaces
Deployment checklist for the reviewer
- Check that the anno.sif symlink points to the right image on TSD for annoservice (/ess/..) and on NSC (both prod and staging)
- Check anno.sif permissions (at least 774)
- Check the data (if updated) and the vcfanno_config.toml file against the release notes, as well as their permissions (should be at least 774)
- For annoservice on TSD, check that the anno process was restarted (if not, restart it)
- Finally, check that the minimal test (a re-analysis in ELLA staging) was run successfully
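The symlink and permission checks above can also be scripted. Below is a minimal POSIX shell sketch. It is not an existing anno script, and the function names sif_target, missing_774 and check_774 are hypothetical. It reads "at least 774" as "all of the rwxrwxr-- bits are set", which is what find -perm -774 tests. It also assumes GNU readlink -f and GNU find -L are available.

```shell
# Hypothetical reviewer helpers (not part of ella-anno or anno-targets).

# Print the file that a symlink such as anno.sif ultimately resolves to.
sif_target() {
    readlink -f -- "$1"
}

# Print every path under the given files/directories (following symlinks)
# that is missing at least one of the 774 (rwxrwxr--) permission bits.
missing_774() {
    find -L "$@" ! -perm -774 -print
}

# Succeed only if every path under the arguments has at least 774 permissions.
check_774() {
    local bad
    bad=$(missing_774 "$@") || return 1
    if [ -n "$bad" ]; then
        printf 'permissions below 774:\n%s\n' "$bad" >&2
        return 1
    fi
}

# Example, from the root of the anno directory:
#   sif_target anno.sif
#   check_774 anno.sif data
```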
Components of anno system
A release includes three components for each repository (see figure below):
- Source code (main and helper scripts and utilities)
- Data sources (annotation data in the datasets.json file)
- Docker/Singularity image (built automatically by a CI job)
Anno release steps overview
The release steps can be grouped into the following stages:
I. Prepare
- Create a release issue from the template
- Change control ("Endringskontroll") for major/minor code changes
- Credentials (DO, HGMD, NCBI ClinVar API key)
- Generate and upload data to DO (via ella-anno or anno-targets)
- ella-anno and anno-targets dev: MRs merged and all tests PASS
- ella-anno and anno-targets dev: merge to master
II. Create release artifacts
- Tag anno-targets and/or ella-anno (or re-use the latest existing tag for the latter)
- Fetch the anno-targets Singularity image (.sif) built by the CI job
- Download data from DO and prepare the data .tar archive
III. Deploy
- Upload release artifacts (data.tar, *.sif) to TSD/NSC
- Make sure no analyses are running on TSD/NSC
- Deploy first in staging, and then in prod
- Stop Supervisor of the respective anno-service on TSD
- Unpack the data .tar in the respective instances (ops/update_data.sh)
- Place the new .sif into the archive and symlink it to anno.sif
- Check anno.sif and unpacked data permissions (at least 774)
- Restart Supervisor on TSD
- Update tracks in ELLA if needed
IV. Test
V. Announce
Anno release preparation
Create Gitlab release issue
The release responsible (typically a system owner) creates a release issue from the template in anno-targets to track and document all steps of release preparation and deployment.
Generate data sources
Follow procedures to update data sources:
Source code changes
- Merge all MRs in ella-anno and anno-targets planned for the release into the respective dev branches. Make sure that all CI tests PASS before the merge.
- In each of ella-anno and anno-targets, merge dev into master.
Anno release artifact creation
- Tag the master branch in ella-anno and anno-targets. This can be done in the Gitlab web UI or locally with git tag -a vX.X.X && git push origin vX.X.X. This triggers a CI job in Gitlab that builds the release image to be uploaded to DO.
- Unless the ella-anno release is data-only, create a new Gitlab release using the tag just created. Include release notes from any data-only releases since the last Gitlab release.
- For source code updates, download the release Singularity image from DO by running make fetch-singularity-release on the anno-targets master branch.
- Create the data archive:
  a. Clone the anno-targets repository.
  b. Run make build-annobuilder. This makes sure the correct image is used.
  c. Run make download-[anno|amg]-package PKG_NAME=<package name> (according to the data source table: anno for ella-anno sources, amg for anno-targets sources).
  d. Alternatively, run make download-data to download all data.
  e. Run make tar-data PKG_NAMES=<package name 1>,<package name 2> TAR_OUTPUT=/anno/data/<name of tar-file> (recommended format: source1_version-source2_version.tar). This generates a tar file in the data directory. If PKG_NAMES is not specified, all data in the data directory is included in the tar file.
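The tagging and tar-data conventions above can be expressed as small shell helpers. This sketch rests on a few assumptions. is_release_tag assumes tags follow the vX.X.X pattern with three numeric parts. tar_name builds the recommended source1_version-source2_version.tar name. pkg_names joins package names with commas for PKG_NAMES. All three function names are hypothetical, as are the package names and versions in the example.

```shell
# Hypothetical helpers for release artifact creation (not part of anno-targets).

# Succeed if $1 looks like vMAJOR.MINOR.PATCH with numeric parts (assumed tag format).
is_release_tag() {
    local v major rest minor patch part
    v=${1#v}
    [ "$v" != "$1" ] || return 1            # must start with "v"
    major=${v%%.*}; rest=${v#*.}
    minor=${rest%%.*}; patch=${rest#*.}
    for part in "$major" "$minor" "$patch"; do
        case $part in ''|*[!0-9]*) return 1 ;; esac
    done
    [ "$major.$minor.$patch" = "$v" ]       # catches fewer than three parts, e.g. v1.2
}

# Build the recommended archive name: tar_name SRC1 VER1 [SRC2 VER2 ...]
tar_name() {
    local name=
    if [ $# -eq 0 ] || [ $(($# % 2)) -ne 0 ]; then
        echo "usage: tar_name SOURCE VERSION [SOURCE VERSION ...]" >&2
        return 1
    fi
    while [ $# -gt 0 ]; do
        name="${name:+$name-}${1}_${2}"
        shift 2
    done
    printf '%s.tar\n' "$name"
}

# Join package names with commas for PKG_NAMES (subshell keeps IFS unchanged).
pkg_names() {
    (IFS=,; printf '%s\n' "$*")
}

# Example (hypothetical package names and versions):
#   is_release_tag v1.4.0 && git tag -a v1.4.0 && git push origin v1.4.0
#   make tar-data PKG_NAMES="$(pkg_names clinvar hgmd)" \
#       TAR_OUTPUT="/anno/data/$(tar_name clinvar 20240101 hgmd 2024.1)"
```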
Anno deployment
An overview of the deployment procedure is summarized in the figure below; for details, read on.
As illustrated in the figure above, the anno system is deployed in three locations and needs to be updated in each of them:
- anno as anno-service, which runs as part of ELLA on TSD and is deployed in /ess/p22/data/durable/production/anno
- anno as annopipe, which runs as part of the production pipeline on NSC and is deployed in /boston/diag/production(staging)/sw/anno (i.e. /boston/diag/production/sw/anno or /boston/diag/staging/sw/anno)
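When scripting steps that differ per location, the deployment roots listed above can be kept in one place. This is a minimal sketch. It assumes that /boston/diag/production(staging)/sw/anno means one directory for production and one for staging on NSC. The function name anno_root and the location keys are hypothetical.

```shell
# Hypothetical lookup of the anno root directory per deployment location.
anno_root() {
    case $1 in
        tsd)         echo /ess/p22/data/durable/production/anno ;;
        nsc-prod)    echo /boston/diag/production/sw/anno ;;
        nsc-staging) echo /boston/diag/staging/sw/anno ;;
        *)
            echo "unknown location: $1 (expected tsd, nsc-prod or nsc-staging)" >&2
            return 1
            ;;
    esac
}

# Example: only cd if the lookup succeeded.
#   root=$(anno_root tsd) && cd "$root"
```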
The anno system should be deployed on TSD in this order:
- anno-service in staging, then run a re-analysis in ELLA staging to validate it (always do this step). This starts an anno process and validates that the new image is running as expected.
- anno-service in prod.
Steps of anno deployment
- Upload the data.tar and release .sif files to TSD/NSC (check our Wiki for instructions on using tacl for data upload).
- ssh as {serviceuser} into p22-submit-dev on TSD or into boston on NSC.
- cd into the root directory of anno, which is specific to each of the two locations (you have to repeat the following steps for each of the two locations):
  - ROOT_ANNOSERVICE=/ess/p22/data/durable/production/anno for anno-service on TSD
  - ROOT_NSC=/boston/diag/production(staging)/sw/anno for annopipe on NSC
- Data archive. On TSD, the archive and utility scripts are stored in ROOT_ANNOSERVICE for both anno-service and annopipe. To archive:
  - move the tar file to $ROOT_ANNOSERVICE/archive/data (there is no special location on NSC, as archiving is already done on TSD)
  - move the .sif file to $ROOT_ANNOSERVICE/archive/releases for anno-service on TSD ($ROOT_NSC/archive/releases on NSC)
  - move the .sif file to $ROOT_ANNOPIPE/../../archive on both production and staging for annopipe on TSD
- Check that there are no running jobs in the anno-service (run the script $ROOT_ANNOSERVICE/ops/num_active_jobs.sh), and that there are no samples running in the pipeline (ask the production responsible).
- Stop the respective anno-service on TSD (production, staging or validation) using the web UI located at p22-anno-01:9000. You will need to authenticate.
The supervisord processes running on the host can be listed with:
# show info for all supervisord processes
ps uww $(pgrep supervisord)
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
# 1200 11084 0.0 0.0 28108 22880 ? S 10:23 0:04 /usr/bin/python3 /usr/bin/supervisord --configuration /etc/supervisord.conf
# tsnowlan 11106 0.0 0.0 68244 23148 ? S 10:23 0:07 /dist/ella-python/bin/python3.7 /dist/ella-python/bin/supervisord -c /ella/ops/dev/supervisor.cfg
# 1200 11657 0.0 0.0 33504 23104 ? S 10:23 0:05 /usr/bin/python3 /usr/bin/supervisord --configuration /etc/supervisord.conf
# 1200 11808 0.0 0.0 33392 23268 ? S 10:23 0:05 /usr/bin/python3 /usr/bin/supervisord --configuration /etc/supervisord.conf
# tsnowlan 14791 0.0 0.0 68244 23032 ? S 10:23 0:06 /dist/ella-python/bin/python3.7 /dist/ella-python/bin/supervisord -c /ella/ops/dev/supervisor.cfg
- Update the link target of anno/anno.sif to the new release .sif file, e.g. from the root of the anno directory:
# to test the command
ln -s path-to-new-image-vX.X.X.sif anno.sif
# to override the link
ln -sf path-to-new-image-vX.X.X.sif anno.sif
- Update the data: for each (<ROOT_FS>, <ENVIRONMENT>) pair in [(tsd, staging), (tsd, prod)] on TSD and (boston, prod) on NSC, run the following from the ROOT_ANNOSERVICE location:
# check options
ops/update_data.sh -h
# run the script
ops/update_data.sh -t <path to tar-file> -r <ROOT_FS> -e <ENVIRONMENT>
You may encounter permission issues during the data update. These typically occur because the script cannot move the previous version (for example, the data/variantDBs/clinvar directory) to the .tmp directory, or cannot remove the .tmp directory itself.
- Verify that the data was untarred correctly. Check the following in the data directory:
  - Untarred data permissions (might need a change to 774)
  - Contents of data/sources.json at the root of anno
  - Contents of data/vcfanno_config.toml at the root of anno
- Start the anno-service on TSD using the web UI located at p22-anno-01:9000.
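Several of the manual checks in the steps above can be wrapped in guards so that a step aborts instead of silently continuing. The following POSIX shell sketch is not an existing anno tool, and each function rests on a stated assumption:
- assert_no_active_jobs assumes ops/num_active_jobs.sh prints a single integer.
- verify_anno_sif compares resolved paths with GNU readlink -f.
- unwritable_dirs uses GNU find -writable to list directories that could block moving old data to .tmp or removing .tmp.
- update_data_all loops over ROOT_FS:ENVIRONMENT pairs and calls ops/update_data.sh with the -t, -r and -e options shown above.

All function names and the UPDATE_DATA override are hypothetical.

```shell
# Hypothetical deployment guards (not part of ella-anno or anno-targets).

# Fail unless the job-count script reports zero active jobs.
# Assumption: the script prints a single integer.
assert_no_active_jobs() {
    local script="${1:-ops/num_active_jobs.sh}" n
    n=$("$script") || return 1
    n=$(printf '%s' "$n" | tr -d '[:space:]')
    if [ "$n" != "0" ]; then
        echo "refusing to continue: '$n' active job(s)" >&2
        return 1
    fi
}

# Succeed only if $1 is a symlink resolving to the same file as $2.
verify_anno_sif() {
    [ -L "$1" ] || { echo "$1 is not a symlink" >&2; return 1; }
    [ "$(readlink -f -- "$1")" = "$(readlink -f -- "$2")" ]
}

# List directories under $1 that the current user cannot write to (GNU find);
# such directories can block moving old data to .tmp or removing .tmp.
unwritable_dirs() {
    find "$1" -type d ! -writable -print
}

# Run ops/update_data.sh once per ROOT_FS:ENVIRONMENT pair, stopping at the
# first failure. Set UPDATE_DATA to another command for a dry run.
update_data_all() {
    local tar="$1" pair
    shift
    for pair in "$@"; do
        "${UPDATE_DATA:-ops/update_data.sh}" -t "$tar" -r "${pair%%:*}" -e "${pair#*:}" || return 1
    done
}

# Example, from the root of anno on TSD:
#   assert_no_active_jobs ops/num_active_jobs.sh || exit 1
#   verify_anno_sif anno.sif path-to-new-image-vX.X.X.sif
#   unwritable_dirs data
#   update_data_all <path to tar-file> tsd:staging tsd:prod
```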
Updates to IGV tracks in ELLA
If any of the sources mentioned in the table below have been updated, the corresponding tracks in ELLA will need to be updated as well:
Data source | Filename |
---|---|
ClinVar | clinvar-current.vcf.gz |
HGMD | hgmd-current.vcf.gz |
gnomAD exomes | gnomad.exomes-current.vcf.gz |
gnomAD genomes | gnomad.genomes-current.vcf.gz |
To update one of these tracks, do the following (using the ClinVar track as an example):
- Copy the files from the fresh anno update to the ELLA tracks directory (ella-(prod|staging) stands for ella-prod or ella-staging; do this for both):
cp /ess/p22/data/durable/production/anno/anno-prod/data/variantDBs/clinvar/clinvar_yyyymmdd.vcf.gz* /ess/p22/data/durable/production/ella/ella-(prod|staging)/data/igv-data/tracks
- Update the symlinks clinvar-current.vcf.gz and clinvar-current.vcf.gz.tbi to point to the respective newer files clinvar_yyyymmdd.vcf.gz and clinvar_yyyymmdd.vcf.gz.tbi
- Update the version of the track in ELLA using the track config web UI:
  - Navigate to p22-app-01:9274 in a browser on TSD (e.g. Chromium)
  - Select an instance (staging or prod); this needs to be done for both
  - Click the Edit Config button at the top right of the view and choose Code in View
  - Scroll down to find the track in question and update the version in the description field
  - Tick Write updated config and click the button to save the new config
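The copy and symlink steps for a track can be combined into one helper. This is a minimal sketch with two assumptions: the dated VCF and its .tbi index sit next to each other in the anno data directory, and relative symlinks in the tracks directory are acceptable. update_igv_track is a hypothetical name, and the web UI step above still has to be done by hand.

```shell
# Hypothetical helper: copy a dated VCF and its .tbi index into an ELLA tracks
# directory, then repoint the "-current" symlinks at the copied files.
update_igv_track() {
    local src="$1" tracks="$2" current="$3" base
    base=$(basename -- "$src")
    cp -- "$src" "$src.tbi" "$tracks/" || return 1
    # Relative targets: the links resolve inside the tracks directory itself.
    ln -sf -- "$base" "$tracks/$current" || return 1
    ln -sf -- "$base.tbi" "$tracks/$current.tbi"
}

# Example for ClinVar, once per ELLA instance:
#   for env in prod staging; do
#       update_igv_track \
#           /ess/p22/data/durable/production/anno/anno-prod/data/variantDBs/clinvar/clinvar_yyyymmdd.vcf.gz \
#           "/ess/p22/data/durable/production/ella/ella-$env/data/igv-data/tracks" \
#           clinvar-current.vcf.gz
#   done
```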
Minimal test
Make sure that you have done this test: as a minimal sanity check, use ELLA staging and try creating a re-analysis.
User announcement
Send a notification to the users about the contents of the update.