HTS Bioinf - Release and deployment of anno system
Terms and definitions
- Anno system: the system built on the `ella-anno` and `anno-targets` Gitlab repositories and their deployed releases.
- Release: by design, a release is defined as a tagged `anno-targets` image, which automatically includes the latest tagged `ella-anno` image.
- DO: Digital Ocean cloud spaces.
Deployment checklist for the reviewer
- Check that `anno.sif` symlinks to the right image for both `prod` and `staging` on TSD and NSC (a command-line sketch follows this list)
- Check that `anno.sif` has the right permissions (should be at least `774`)
- Make sure the `vcfanno_config.toml` file points to the right data source (if updated, check the release notes) and has the right permissions (should be at least `774`)
- For the anno-service on TSD, check that the anno process was restarted (if not, restart it)
- Finally, check that the minimal test (a re-analysis in ELLA staging) ran successfully
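A minimal sketch of how the symlink and permission checks might look on the command line. Paths follow the TSD production layout described under "Anno deployment"; adjust for staging and for NSC:

```bash
# verify where the anno.sif symlink points and that the target exists
ls -l /ess/p22/data/durable/production/anno/anno-prod/anno.sif
# verify the permissions of the resolved image (should be at least 774)
stat -c '%a %U %n' "$(readlink -f /ess/p22/data/durable/production/anno/anno-prod/anno.sif)"
# confirm vcfanno_config.toml references the expected data source files
grep -n 'file' /ess/p22/data/durable/production/anno/anno-prod/data/vcfanno_config.toml
```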
Components of the anno system
A release includes three components for each repository (see figure below):
- Source code (main and helper scripts and utilities)
- Data sources (annotation data listed in the `datasets.json` file)
- Docker/Singularity image (built automatically by a CI job)
Anno release steps overview
The release steps can be grouped into the following stages:
I. Prepare
- Create release issue from template
- Write "Endringskontroll" (change control) for major/minor code changes
- Verify credentials (DO, HGMD, NCBI ClinVar API key)
- Generate and upload data to DO (via `ella-anno` or `anno-targets`)
- `ella-anno` and `anno-targets` `dev` - MRs merged and all tests PASS
- `ella-anno` and `anno-targets` `dev` - merge to `master`
II. Fetch
- Tag `anno-targets` and/or `ella-anno` (or re-use the latest existing tag for the latter)
- Fetch the `anno-targets` Singularity image (`.sif`) built by the CI job
- Download data from DO and prepare the data `.tar` archive
III. Deploy
- Upload release artifacts (data `.tar`, release `.sif`) to TSD/NSC
- Make sure no analyses are running
- Deploy first in `staging`, then in `prod`
- Stop Supervisor of the respective anno-service on TSD
- Unpack data `.tar` in the respective instances (`ops/update_data.sh`)
- Place the new `.sif` into the archive and symlink it to `anno.sif`
- Check `anno.sif` and unpacked data permissions (at least `774`)
- Re-start Supervisor on TSD
- Update tracks in ELLA if needed
IV. Test
V. Announce
Anno release preparation
Create Gitlab release issue (except for data-only release)
The release responsible (typically a system owner) creates a release issue from the template in `anno-targets` to track and document all steps of release preparation and deployment. This step is skipped for routine data-only and bug-fix releases.
Generate data sources
Follow the procedures to update the data sources:
Source code changes
- Merge all feature branches associated with the release into `ella-anno`'s and `anno-targets`'s respective `dev` branches (in this order!). Make sure that all CI tests pass before the merge. If any of the changes in `anno-targets` depend on those in `ella-anno`, transient tagging (release candidate `*-rc`) of `ella-anno`'s `dev` branch is advisable (see the sketch after this list).
- Merge `ella-anno`'s and `anno-targets`'s `dev` branches into the corresponding `master` branches (in this order!).
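A minimal sketch of the transient release-candidate tagging mentioned above, run from a local `ella-anno` clone (the version number is only an example):

```bash
# tag ella-anno's dev branch with a release candidate so dependent anno-targets changes can be tested against it
git checkout dev && git pull
git tag -a v1.2.0-rc1 -m "Release candidate for anno-targets testing"
git push origin v1.2.0-rc1
```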
Anno release artifact creation
- Tag the `master` branch in `ella-anno` and `anno-targets` (in this order!). This can be done in the Gitlab webUI or locally with `git tag -a vX.X.X && git push origin vX.X.X`. This will trigger a CI job in Gitlab that builds the release image to be uploaded to DO.
- Unless the `ella-anno` release is data-only, create a new Gitlab release using the tag just created. Include release notes from any data-only releases since the last Gitlab release.
- Download the release Singularity image from DO for source code updates via `anno-targets`'s `master` branch: `make fetch-singularity-release DB_CREDS=$HOME/.db_creds`. Alternatively, build the Docker image and then the Singularity image locally; use `make help` to get building instructions.
- Create the data archive (a combined sketch follows this list):
  - Clone the `anno-targets` repository
  - `make build-annobuilder` - this is to make sure the correct image is used
  - `make download-[anno|amg]-package PKG_NAME=<package name>` (according to the data source table: `anno` for `ella-anno` sources, `amg` for `anno-targets` sources)
    - alternatively, `make download-data` to download all data
  - `make tar-data PKG_NAMES=<package name 1>,<package name 2> TAR_OUTPUT=/anno/data/<name of tar-file>` (recommended format `source1_version-source2_version.tar`). This will generate a tar file in the data directory. If `PKG_NAMES` is not specified, this will tar all the data in the data directory.
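Putting these steps together, a sketch for a data-only ClinVar update (the package name, tar file name, and repository URL are examples; check the data source table for the actual package names):

```bash
git clone <anno-targets repository URL>
cd anno-targets
# build/pull the annobuilder image so the correct image is used
make build-annobuilder
# download a single ella-anno data package
make download-anno-package PKG_NAME=clinvar
# pack it into a tar archive inside the data directory
make tar-data PKG_NAMES=clinvar TAR_OUTPUT=/anno/data/clinvar_20240101.tar
```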
Anno deployment
The anno system is deployed in two locations and must be updated in each accordingly:
- TSD - deployed on `/ess/p22/data/durable/production/anno`
- NSC - deployed on `/boston/diag/production(staging)/sw/annotation`
An overview of the deployment procedure is shown in the figure below; for details, read on.
The anno system should be deployed in staging on TSD/NSC first. A re-analysis in ELLA staging should be run after deployment on TSD (always do this) to start an anno process and validate the new image. Once it is established that everything runs as expected, the anno system can be deployed in production, too.
Steps of anno deployment
- Upload the data `.tar` and release `.sif` files to TSD/NSC (check our Wiki for instructions on using `tacl` for data upload).
- `ssh` as the service user into `p22-hpc-03` on TSD or into `gdx-login` on NSC.
- `cd` into anno's root directory, which is specific to each of the two locations (you have to repeat the following steps for each of them):
  - `ROOT_TSD=/ess/p22/data/durable/production/anno` on TSD
  - `ROOT_NSC=/boston/diag/production(staging)/sw/annotation` on NSC
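  For example, on TSD the two previous steps amount to (the service user name is a placeholder):

  ```bash
  # log in and change into anno's root directory
  ssh <service-user>@p22-hpc-03
  ROOT_TSD=/ess/p22/data/durable/production/anno
  cd "$ROOT_TSD"
  ```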
- TSD-only: archive data. On TSD the archive and utility scripts are stored in `ROOT_TSD`. To archive:
  - move the data `.tar` to `$ROOT_TSD/archive/data` (on NSC you can place the data wherever it is accessible; remember to delete it after the update)
  - move the release `.sif` to `$ROOT_TSD/archive/releases`
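  A brief sketch of the archiving moves on TSD (file names are examples):

  ```bash
  mv clinvar_20240101.tar "$ROOT_TSD/archive/data/"
  mv anno-targets-vX.X.X.sif "$ROOT_TSD/archive/releases/"
  ```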
- TSD-only: check that there are no running jobs in the anno-service (run the script `$ROOT_TSD/ops/num_active_jobs.sh`) and that there are no samples running in the pipeline (ask the production responsible; this applies to NSC as well).
TSD-only: stop the respective anno-service (production or staging) using the webUI accessible at
p22-anno-01:9000
. You will need to authenticate. To verify that the environment processes have in fact stopped one can compare before/after the output of (afterssh
ing intop22-anno-01
):# show info for all supervisord processes ps uww $(pgrep supervisord) # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND # 1200 11084 0.0 0.0 28108 22880 ? S 10:23 0:04 /usr/bin/python3 /usr/bin/supervisord --configuration /etc/supervisord.conf # tsnowlan 11106 0.0 0.0 68244 23148 ? S 10:23 0:07 /dist/ella-python/bin/python3.7 /dist/ella-python/bin/supervisord -c /ella/ops/dev/supervisor.cfg
- Update the `anno.sif` symlink's target to the new release `.sif` file, e.g. from `$ROOT_TSD`:

  ```bash
  # to test the command
  ln -s path-to-new-image-vX.X.X.sif anno-prod/anno.sif
  # to override the current link
  ln -sf path-to-new-image-vX.X.X.sif anno-prod/anno.sif
  ```
- Check the symlink's permissions (must be at least `774`)
- Check that the symlinked file is not corrupted and that the versions are correct (see the Gitlab release issue):
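  One possible way to do this, as a sketch (the `singularity` CLI is assumed to be available on the host, and the metadata printed depends on how the image was built):

  ```bash
  # print image metadata to confirm the expected version
  singularity inspect anno-prod/anno.sif
  # quick integrity smoke test: the image should at least execute
  singularity exec anno-prod/anno.sif true
  ```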
Extract the new data: for each combination (ROOT_FS, ENVIRONMENT) in
[(tsd, staging), (tsd, prod)]
on TSD and[(boston, staging), (boston, prod)]
on NSC, run the following from anno's root directory (for NSC, add the prefixanno/
):\Warning
DO NOT UNTAR THE FILE DIRECTLY IN THE ANNO DATA DIRECTORY
# check options ops/update_data.sh -h # run the script ops/update_data.sh -t <path/to/tar-file> -e <ENVIRONMENT>
You may encounter issues with permissions during the data update. These will be typically because the script cannot move the previous version (for example
data/variantDB/clinvar
directory) to the.tmp
directory or it cannot remove the.tmp
directory itself. -
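  If that happens, a manual clean-up along these lines may help (the exact location of the `.tmp` directory under the instance's data directory is an assumption; verify before deleting anything):

  ```bash
  # identify which directory is blocking the move/removal
  ls -ld anno-staging/data/.tmp
  # open up permissions on the leftover .tmp directory, remove it, then re-run update_data.sh
  chmod -R u+rwX anno-staging/data/.tmp
  rm -rf anno-staging/data/.tmp
  ```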
- Verify that the data are untarred correctly. Check the respective instance's `data` directory (TSD: `ROOT_TSD/{anno-prod,anno-staging}/data`; NSC staging: `ROOT_NSC/data`; NSC production: `ROOT_NSC/anno/anno/data`) for:
  - Contents of `sources.json`
  - Contents of `vcfanno_config.toml`
  - Untarred data permissions inside the respective data source folders (might need a change to `774`)
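  A sketch of such a check on TSD prod (`jq` availability and the exact keys in `sources.json` are assumptions):

  ```bash
  cd "$ROOT_TSD/anno-prod/data"
  # confirm the expected source versions are listed
  jq . sources.json | less
  # confirm the annotation files referenced by vcfanno
  grep -n 'file' vcfanno_config.toml
  # list anything that does not have at least 774 permissions
  find . -maxdepth 2 ! -perm -774
  ```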
- TSD-only: start the anno-service on TSD using the webUI accessible at `p22-anno-01:9000`.
Updates to IGV tracks in ELLA
If any of the sources mentioned in the table below have been updated, the corresponding tracks in ELLA will need to be updated as well:
| Data source | Filename |
|---|---|
| ClinVar | `clinvar-current.vcf.gz` |
| HGMD | `hgmd-current.vcf.gz` |
| gnomAD exomes | `gnomad.exomes-current.vcf.gz` |
| gnomAD genomes | `gnomad.genomes-current.vcf.gz` |
To update one of these tracks, e.g. the ClinVar track, do the following:
- copy the files (`.vcf.gz` and `.tbi`) from the fresh anno update to the ELLA tracks directory (those files are identical in prod and staging, so you can use either); a sketch of these first two steps follows this list
- update the symlinks for `clinvar-current.vcf.gz` and `clinvar-current.vcf.gz.tbi` to point to the respective new files `clinvar_yyyymmdd.vcf.gz[.tbi]`
- update the version of the track in ELLA using the track config webUI:
Info
The ELLA track config manager runs on the VM `p22-app-01`. If it is not up, navigate to `/ess/p22/data/durable/production/ella/ops-compose` as the service user from `p22-app-01` and run `task track-config-manager` from within a screen. The configuration of the tool can be found at `/ess/p22/data/durable/production/ella/ops-compose/ella_trackcfg_mgr/config.txt`.
- Navigate to `p22-app-01:8585` in a browser on TSD (e.g. Chromium)
- Select an instance (staging or prod) -- this needs to be done for both
- Click the `Edit Config` button on the top right of the view and choose Code in `View`
- Find the track in question (scroll or use `CTRL+f`) and update its version in the `description` field
- Tick `Write updated config` and click the button to save the new config
- To check that the track config update worked as expected: navigate to the respective ELLA instance, open any sample, and go to the "VISUAL" tab (left panel, on top of the variants table); in TRACK SELECTION, hover over the respective data source (for example, "Clinvar" in the SNV section) - it should show the annotation and file version that you entered earlier in the track config.
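A sketch of the first two steps for the ClinVar track (the ELLA tracks directory path and the dated file name are placeholders; the anno data path follows this guide):

```bash
# copy the fresh ClinVar files from the anno data directory to the ELLA tracks directory
cp /ess/p22/data/durable/production/anno/anno-prod/data/variantDB/clinvar/clinvar_20240101.vcf.gz* <ella-tracks-directory>/
# point the -current symlinks at the new files
cd <ella-tracks-directory>
ln -sf clinvar_20240101.vcf.gz clinvar-current.vcf.gz
ln -sf clinvar_20240101.vcf.gz.tbi clinvar-current.vcf.gz.tbi
```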
Minimal test
Make sure that you have done this test: as a minimal sanity check, use ELLA staging and try creating a re-analysis.
To check that the right files are in fact used, one can grep for '^##' in the log at `/ess/p22/data/durable/production/anno/anno-staging/work/*/VCFANNO/output.vcf` and then grep for the updated database. See picture:
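For example, after a ClinVar update one might run something like the following (the "clinvar" pattern is only an example of the updated database):

```bash
# the VCF header lines should reference the new annotation source and version
grep '^##' /ess/p22/data/durable/production/anno/anno-staging/work/*/VCFANNO/output.vcf | grep -i clinvar
```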
User announcement
Send a notification to the users about the contents of the update in Teams: GDx bioinformatikk - informasjon og kontaktskjema.