HTS Bioinf - Release and deployment of the Anno system

Terms and definitions

Anno system: system built on ella-anno and anno-targets Gitlab repositories and their deployed releases.

Release: by design, a release is defined as an anno-targets tagged image, which automatically includes the latest ella-anno tagged image.

DO: Digital Ocean cloud spaces.

Deployment checklist for the reviewer

  • Check that anno.sif symlinks to the right image for both prod and staging on TSD and NSC
  • Check that anno.sif has the right permissions (should be at least 774)
  • Make sure the vcfanno_config.toml file points to the right data source (if updated, check the release notes) and has the right permissions (should be at least 774)
  • For the anno-service on TSD, check that the anno process was restarted (if not, restart it)
  • Finally, check if the minimal test (a re-analysis in ELLA staging) was run successfully
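
The symlink and permission checks in this list can be scripted. The following is a minimal sketch, not part of the anno repositories: the function names has_min_mode and check_anno_sif are hypothetical, and it assumes that readlink -f (GNU coreutils) and find are available. It reads "at least 774" as "all permission bits of 774 are set" on the file that the symlink resolves to.

```shell
# Succeeds if the path in $1 (following symlinks) has all permission bits of $2 set.
has_min_mode() {
    # A dangling symlink must not pass, so require that the target exists.
    [ -e "$1" ] || return 1
    # -L follows the symlink; -prune stops descent if the path is a directory;
    # -perm -MODE matches only when every bit of MODE is set.
    [ -n "$(find -L "$1" -prune -perm "-$2" -print 2>/dev/null)" ]
}

# Report where $1/anno.sif points and whether the linked image is at least 774.
check_anno_sif() {
    sif="$1/anno.sif"
    if [ ! -L "$sif" ]; then
        echo "not a symlink: $sif"
        return 1
    fi
    echo "anno.sif -> $(readlink -f "$sif")"
    if has_min_mode "$sif" 774; then
        echo "permissions OK"
    else
        echo "target missing or permissions below 774"
        return 1
    fi
}
```

Calling check_anno_sif with the ANNO_ROOT of each instance (prod and staging, on TSD and NSC) prints the resolved image path, which can then be compared with the image named in the release issue.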

Components of the anno system

A release includes three components for each repository (see figure below):

  1. Source code (main and helper scripts and utilities)
  2. Data sources (annotation data in datasets.json file)
  3. Docker/Singularity image (built automatically by a CI job)

figure-anno-3

Anno release steps overview

The release steps can be grouped into the following stages:

I. Prepare

  • Create release issue from template
  • Write "Endringskontroll" (major/minor code changes)
  • Verify credentials (DO, HGMD, NCBI ClinVar API keys)
  • Generate and upload data to DO (via ella-anno or anno-targets)
  • ella-anno and anno-targets dev - MRs merged and all tests PASS
  • ella-anno and anno-targets dev - merge to master

II. Fetch

  • Tag anno-targets and/or ella-anno (or re-use latest existing tag for the latter)
  • Fetch the anno-targets singularity image (.sif) built by the CI job
  • Download data from DO and prepare data .tar archive

III. Deploy

  • Upload release artifacts (data .tar, release .sif) to TSD/NSC
  • Make sure no analyses are running
  • Deploy first in staging, then in prod
  • Stop Supervisor of the respective anno-service on TSD
  • Unpack data .tar in the respective instances (ops/update_data.sh)
  • Place new .sif into the archive and symlink it to anno.sif
  • Check anno.sif and unpacked data permissions (at least 774)
  • Re-start Supervisor on TSD
  • Update tracks in ELLA if needed

IV. Test

V. Announce

Anno release preparation

Create Gitlab release issue (except for data-only release)

The release responsible (typically a system owner) creates a release issue from the template in anno-targets to track and document all steps of release preparation and deployment. We skip this step for the routine data-only and bug fix releases.

Generate annotation data

Update data sources following the respective procedures:

Source code changes

Merge all feature branches associated with the release into ella-anno and anno-targets's respective dev branches (in this order!). Make sure that all CI tests pass before the merge. If any of the changes in anno-targets depend on those in ella-anno, transient tagging (release candidate *-rc) of ella-anno's dev branch is advisable. If deemed necessary, transiently tag (release candidate *-rc) anno-targets's dev branch and deploy the release candidate artifact to staging (see below) for field-testing.

NOTE: Tags can be created in Gitlab's webUI or locally by running git tag -a <tag> && git push origin <tag>.
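
Before pushing a tag, it can help to check that it follows the M.m.p[l] scheme given in the version tag memo under "Anno release artifact creation". The sketch below is only an illustration: the function name is_release_tag is hypothetical, and both the optional leading "v" and the exact form of the release candidate suffix ("-rc" followed by optional digits) are assumptions, since this document only states the pattern as *-rc.

```shell
# Succeeds if $1 looks like M.m.p[l], e.g. 2.3.1 or 2.3.1a.
# Assumptions: an optional leading "v" and an optional "-rc<digits>" suffix are accepted.
is_release_tag() {
    printf '%s\n' "$1" | grep -Eq '^v?[0-9]+\.[0-9]+\.[0-9]+[a-z]?(-rc[0-9]*)?$'
}
```

For example, is_release_tag "$tag" && git tag -a "$tag" && git push origin "$tag" would refuse to create a tag that does not match the scheme.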

Anno release artifact creation

  1. Merge ella-anno and anno-targets's dev branches into the corresponding master branches and tag them with a new version (in this order, merge >>> tag >>> merge >>> tag). The last tag will trigger a CI job in Gitlab that will build the release image to be uploaded to DO.

    Version tag memo: M.m.p[l]
    Defined by most significant change, where
    M - major
    m - minor
    p - patch
    l - letter for data-only changes
    
  2. Unless the ella-anno release is data-only, create a new ella-anno Gitlab release, using the tag just created. Include release notes from any data-only releases since the last Gitlab release.

  3. Download the release Singularity image from DO for source code updates via anno-targets's master branch: make fetch-singularity-release DB_CREDS=$HOME/.db_creds. Alternatively, build the Docker image and then the Singularity image locally. Use make help to get build instructions.

  4. Create the data archive:

    • Clone the anno-targets repository
    • make build-annobuilder - this is to make sure the correct image is used
    • make download-[anno|amg]-package DB_CREDS=$HOME/.db_creds PKG_NAME=<package name> (according to the data source table, anno for ella-anno sources, amg for anno-targets sources)
    • alternatively, make download-data to download all data
    • make tar-data PKG_NAMES=<package name 1>,<package name 2> TAR_OUTPUT=/anno/data/<name of tar-file> (recommended format <package name 1>_<version>[-<package name 2>_<version>...].tar). This will generate a tar file in the data directory. If PKG_NAMES is not specified, this will tar all the data in the data directory.
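
The recommended archive name, <package name 1>_<version>[-<package name 2>_<version>...].tar, can be assembled mechanically. This is a small sketch with a hypothetical helper name (tar_name); the package names and versions in the usage note are placeholders, not real release values.

```shell
# Join "<package>_<version>" arguments with "-" and append ".tar".
tar_name() {
    # At least one "<package>_<version>" argument is required.
    [ "$#" -gt 0 ] || return 1
    name=""
    for item in "$@"; do
        if [ -z "$name" ]; then
            name="$item"
        else
            name="$name-$item"
        fi
    done
    printf '%s.tar\n' "$name"
}
```

With the placeholder values clinvar_20240101 and hgmd_2024.1, tar_name produces clinvar_20240101-hgmd_2024.1.tar, which could then be passed as the file name in TAR_OUTPUT.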

Anno deployment

The anno system is deployed in two locations and must be updated in each accordingly:

  • TSD - ANNO_ROOT=/ess/p22/data/durable/production/anno/anno-{prod,staging}
  • NSC - ANNO_ROOT=/boston/diag/{production,staging}/sw/annotation/anno

Below is an overview of the deployment procedure. For the details, read on.

  • Check that no analyses are running
  • Stop the anno-service from Supervisor on TSD
  • Change to service user
  • Place the data tarball in DATA_ARCHIVE and the Singularity image in SIF_ARCHIVE
  • Run ${BASE}/ops/update_data.sh -t ${DATA_ARCHIVE}/<data tar> -i ${INSTANCE} -p ${PLATFORM}
  • Link the image: ln -fs ${SIF_ARCHIVE}/<anno sif> ${ANNO_ROOT}/anno.sif
  • Verify the image: singularity exec ${ANNO_ROOT}/anno.sif cat /anno{-targets,}/version
  • Verify the data [files sources.json and vcfanno_config.toml in ANNO_DATA]
  • Once done, restart the anno-service from Supervisor on TSD

Variable        TSD                                       NSC
BASE            /ess/p22/data/durable/production/anno     /boston/diag
DATA_ARCHIVE    ${BASE}/archive/data                      N/A (archived on TSD)
SIF_ARCHIVE     ${BASE}/archive/releases                  ${BASE}/production/sw/archive/releases

PRODUCTION
ANNO_ROOT       ${BASE}/anno-prod                         ${BASE}/production/sw/annotation/anno
INSTANCE        prod                                      prod
PLATFORM        tsd                                       nsc

STAGING
ANNO_ROOT       ${BASE}/anno-staging                      ${BASE}/staging/sw/annotation/anno
INSTANCE        staging                                   staging
PLATFORM        tsd                                       nsc
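
To avoid retyping these values for each platform and instance, they can be derived from two arguments. The following is a sketch only: the function name anno_env is hypothetical, and the paths are copied from the values listed above. DATA_ARCHIVE is left empty for NSC because the data archive is kept on TSD.

```shell
# Print the deployment variables for a platform (tsd|nsc) and an instance (prod|staging).
anno_env() {
    platform="$1"
    instance="$2"
    case "$instance" in
        prod|staging) ;;
        *) echo "unknown instance: $instance" >&2; return 1 ;;
    esac
    case "$platform" in
        tsd)
            base=/ess/p22/data/durable/production/anno
            data_archive="$base/archive/data"
            sif_archive="$base/archive/releases"
            anno_root="$base/anno-$instance"
            ;;
        nsc)
            base=/boston/diag
            data_archive=""   # N/A on NSC: data is archived on TSD
            sif_archive="$base/production/sw/archive/releases"
            if [ "$instance" = prod ]; then
                anno_root="$base/production/sw/annotation/anno"
            else
                anno_root="$base/staging/sw/annotation/anno"
            fi
            ;;
        *) echo "unknown platform: $platform" >&2; return 1 ;;
    esac
    printf 'BASE=%s\nDATA_ARCHIVE=%s\nSIF_ARCHIVE=%s\nANNO_ROOT=%s\nINSTANCE=%s\nPLATFORM=%s\n' \
        "$base" "$data_archive" "$sif_archive" "$anno_root" "$instance" "$platform"
}
```

Running eval "$(anno_env tsd staging)" in a shell would set the variables used in the commands of this section; printing them first makes it easy to confirm the target instance before anything is changed.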

The anno system should be deployed in staging on TSD/NSC first. After deployment on TSD, always run a re-analysis in ELLA staging to start an anno process and validate the new image. Once everything is confirmed to run as expected, the anno system can be deployed in production as well.

Steps of Anno's deployment

  1. Upload data .tar and release .sif files to TSD/NSC (check our Wiki for instructions on using tacl for data upload).

  2. Log in via ssh as the service user on the production VM.

  3. Move the release .sif to SIF_ARCHIVE (production only).

  4. Check that there are no analyses running in the pipeline (ask the person responsible for operations).

  5. [TSD-only] Check that no anno-service analyses are running (use $BASE/ops/num_active_jobs.sh).

  6. [TSD-only] Move the data .tar to $BASE/archive/data (on NSC, just place it somewhere accessible and remember to delete it after the update).

  7. [TSD-only] Stop the respective anno-service (production or staging) using the webUI accessible at p22-anno-01:9000 (you may need to authenticate). To verify that the environment processes have in fact stopped, ssh into p22-anno-01 and compare the output of the following commands before and after:

    # check for running supervisord processes
    pgrep supervisord | wc -l
    # 5
    
    # show info for all supervisord processes
    ps uww $(pgrep supervisord)
    # USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    # 1200       11084  0.0  0.0  28108 22880 ?        S    10:23   0:04 /usr/bin/python3 /usr/bin/supervisord --configuration /etc/supervisord.conf
    # tsnowlan   11106  0.0  0.0  68244 23148 ?        S    10:23   0:07 /dist/ella-python/bin/python3.7 /dist/ella-python/bin/supervisord -c /ella/ops/dev/supervisor.cfg
    
  8. Update the anno.sif symlink's target to the new release .sif-file:

    # first attempt without -f: fails if anno.sif already exists
    ln -s path-to-new-image-vX.X.X.sif ${ANNO_ROOT}/anno.sif
    # to override the current link
    ln -sf path-to-new-image-vX.X.X.sif ${ANNO_ROOT}/anno.sif
    
    • Check the permissions of the image the symlink points to (must be at least 774)

    • Verify that the symlinked file is not corrupted and that the versions of the software are correct (see Gitlab release issue):

      singularity exec ${ANNO_ROOT}/anno.sif cat /anno/version /anno-targets/version
      
  9. Extract the new data: for each combination (PLATFORM, INSTANCE) in [(tsd, staging), (tsd, prod)] on TSD and [(nsc, staging), (nsc, prod)] on NSC, you'll have to run the following:

    # check options
    ${BASE}/ops/update_data.sh -h
    # run the script
    ${BASE}/ops/update_data.sh -t <path/to/tar-file> -i <INSTANCE> -p <PLATFORM>
    

    Warning

    DO NOT UNTAR THE FILE DIRECTLY IN THE ANNO DATA DIRECTORY

    You may encounter issues with permissions during the data update. These typically occur because the script cannot move the previous version (for example the data/variantDBs/clinvar directory) to the .tmp directory or cannot remove the .tmp directory itself.

  10. Verify that the data were untarred correctly. In the respective instance's ${ANNO_ROOT}/data directory, check:

    • Contents of sources.json
    • Contents of vcfanno_config.toml
    • Untarred data permissions inside the respective data source directories (should be at least 774)

  11. [TSD-only] Restart the respective anno-service using the webUI accessible at p22-anno-01:9000.
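
The data checks in step 10 can be partly automated. The sketch below is an illustration, not the behaviour of update_data.sh: the function name verify_anno_data is hypothetical, and it only checks that sources.json and vcfanno_config.toml exist and are non-empty and that nothing under the data directory lacks a permission bit of 774. The contents of the two files still need to be compared with the release notes by hand.

```shell
# List problems in the data directory of the instance rooted at $1;
# succeeds only if none are found.
verify_anno_data() {
    data_dir="$1/data"
    status=0
    for f in sources.json vcfanno_config.toml; do
        if [ ! -s "$data_dir/$f" ]; then
            echo "missing or empty: $data_dir/$f"
            status=1
        fi
    done
    # Files and directories that lack at least one permission bit of 774.
    below=$(find "$data_dir" ! -perm -774 -print 2>/dev/null)
    if [ -n "$below" ]; then
        echo "permissions below 774:"
        echo "$below"
        status=1
    fi
    return "$status"
}
```

Running verify_anno_data "${ANNO_ROOT}" for each instance prints nothing when both checks pass and returns a non-zero status otherwise.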

Updates to IGV tracks in ELLA

If any of the sources mentioned in the table below have been updated, the corresponding tracks in ELLA will need to be updated as well:

Data source       Filename
ClinVar           clinvar-current.vcf.gz
HGMD              hgmd-current.vcf.gz
gnomAD exomes     gnomad.exomes-current.vcf.gz
gnomAD genomes    gnomad.genomes-current.vcf.gz

To update one of these tracks, e.g. the ClinVar track, do the following:

  • copy the files (.vcf.gz and .tbi) from the fresh anno update to the ELLA tracks directory (those files are identical in prod and staging, you can use either)

    cp /ess/p22/data/durable/production/anno/anno-(prod|staging)/data/variantDBs/clinvar/clinvar_yyyymmdd.vcf.gz* /ess/p22/data/durable/production/ella/ella-(prod|staging)/data/igv-data/tracks
    
  • update symlinks for clinvar-current.vcf.gz and clinvar-current.vcf.gz.tbi to point to the respective new files clinvar_yyyymmdd.vcf.gz[.tbi]

    # run inside the ELLA tracks directory; -f replaces the existing links
    ln -sf clinvar_yyyymmdd.vcf.gz clinvar-current.vcf.gz
    ln -sf clinvar_yyyymmdd.vcf.gz.tbi clinvar-current.vcf.gz.tbi
    
  • update the version of the track in ELLA using ELLA's track config webUI: refer to HTS Bioinf - Using ELLA track config manager for the instructions on how to use ELLA track config manager.
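
Since the same two links are updated for every track, the symlink step can be wrapped in a small function. This is a sketch under assumptions: the name link_current_track and its argument order are hypothetical, and it expects the dated .vcf.gz and .tbi files to have been copied into the tracks directory already.

```shell
# Point <source>-current.vcf.gz[.tbi] at a dated file inside a tracks directory.
# $1 = tracks directory, $2 = source prefix (e.g. clinvar), $3 = dated .vcf.gz file name
link_current_track() {
    tracks_dir="$1"
    source_name="$2"
    dated="$3"
    # Refuse to relink unless both the VCF and its index are present.
    for ext in "" ".tbi"; do
        if [ ! -e "$tracks_dir/$dated$ext" ]; then
            echo "missing: $tracks_dir/$dated$ext" >&2
            return 1
        fi
    done
    for ext in "" ".tbi"; do
        # Relative link targets, created from inside the tracks directory.
        ( cd "$tracks_dir" && ln -sf "$dated$ext" "$source_name-current.vcf.gz$ext" ) || return 1
    done
}
```

For the ClinVar example, the call would take the ELLA tracks directory, the prefix clinvar, and the file name clinvar_yyyymmdd.vcf.gz with the actual date filled in. Checking for both files first avoids leaving the track pointing at a VCF without its index.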

Minimal test

Always run this test: as a minimal sanity check, manually import an analysis in ELLA staging.

To check that the right files are in fact used, grep the header lines ('^##') of /ess/p22/data/durable/production/anno/anno-staging/work/*/VCFANNO/output.vcf and then grep the result for the updated database. See picture:

figure-anno-5
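
The two grep steps can be combined as follows. This is a minimal sketch: the function name vcf_header_mentions is hypothetical, and the search string is whatever identifies the updated database (for example its file name or date). Whether that string appears in the header depends on what the annotation step writes there, so an empty result should be checked manually before concluding that the wrong file was used.

```shell
# Print the '##' header lines of the VCF in $1 that contain the fixed string $2.
# Returns non-zero if no header line matches.
vcf_header_mentions() {
    grep '^##' "$1" | grep -F -- "$2"
}
```

Applied to each output.vcf under the staging work directory, the function prints the matching header lines, which can be compared with the versions listed in the release notes.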

User announcement

Send a notification to the users about the contents of the update in Teams: GDx driftsmeldinger.