HTS Bioinf - Release and deployment of anno system

Terms and definitions

Anno system: system built on ella-anno and anno-targets Gitlab repositories and their deployed releases.

Release: by design, a release is defined as an anno-targets tagged image, which automatically includes the latest ella-anno tagged image.

DO: Digital Ocean cloud spaces.

Deployment checklist for the reviewer

  • Check that anno.sif symlinks to the right image for both prod and staging on TSD and NSC
  • Check that anno.sif has the right permissions (should be at least 774)
  • Make sure the vcfanno_config.toml file points to the right data source (if updated, check the release notes) and has the right permissions (should be at least 774)
  • For the anno-service on TSD, check that the anno process was restarted (if not, restart it)
  • Finally, check if the minimal test (a re-analysis in ELLA staging) was run successfully
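The first two checklist items can be scripted. The following is a minimal sketch, not part of the anno repositories: the helper names has_min_perms and links_to are hypothetical, and "at least 774" is interpreted here as "every permission bit of 774 is set on the file the link resolves to".

```shell
#!/bin/sh
# Hypothetical helpers for the reviewer checklist (illustrative names).

# Succeeds if $1 exists (following symlinks) and has every permission
# bit of mode 774 set: rwx for owner and group, r for others.
has_min_perms() {
    [ -e "$1" ] && [ -n "$(find -L "$1" -prune -perm -774 -print 2>/dev/null)" ]
}

# Succeeds if $1 is a symlink whose stored target is exactly $2.
links_to() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}

# Example usage (paths are placeholders):
#   links_to anno-prod/anno.sif path-to-new-image-vX.X.X.sif && echo "symlink OK"
#   has_min_perms anno-prod/anno.sif && echo "permissions OK"
```

Because has_min_perms follows the link, it reports the permissions of the image file itself rather than those of the symlink.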

Components of the anno system

A release includes three components for each repository (see figure below):

  1. Source code (main and helper scripts and utilities)
  2. Data sources (annotation data in datasets.json file)
  3. Docker/Singularity image (built automatically by a CI job)

figure-anno-3

Anno release steps overview

The release steps can be grouped into the following stages:

I. Prepare

  • Create release issue from template
  • Write "Endringskontroll" (major/minor code changes)
  • Verify credentials (DO, HGMD, NCBI ClinVar API key)
  • Generate and upload data to DO (via ella-anno or anno-targets)
  • ella-anno and anno-targets dev - MRs merged and all tests PASS
  • ella-anno and anno-targets dev - merge to master

II. Fetch

  • Tag anno-targets and/or ella-anno (or re-use latest existing tag for the latter)
  • Fetch the anno-targets singularity image (.sif) built by the CI job
  • Download data from DO and prepare data .tar archive

III. Deploy

  • Upload release artifacts (data .tar, release .sif) to TSD/NSC
  • Make sure no analyses are running
  • Deploy first in staging, then in prod
  • Stop Supervisor of the respective anno-service on TSD
  • Unpack data .tar in the respective instances (ops/update_data.sh)
  • Place new .sif into the archive and symlink it to anno.sif
  • Check anno.sif and unpacked data permissions (at least 774)
  • Re-start Supervisor on TSD
  • Update tracks in ELLA if needed

IV. Test

V. Announce

Anno release preparation

Create Gitlab release issue (except for data-only release)

The release responsible (typically a system owner) creates a release issue from the template in anno-targets to track and document all steps of release preparation and deployment. We skip this step for the routine data-only and bug fix releases.

Generate data sources

Follow procedures to update data sources:

Source code changes

  • Merge all feature branches associated with the release into ella-anno and anno-targets's respective dev branches (in this order!). Make sure that all CI tests pass before the merge. If any of the changes in anno-targets depend on those in ella-anno, transient tagging (release candidate *-rc) of ella-anno's dev branch is advisable.
  • Merge ella-anno and anno-targets's dev branches into the corresponding master branches (in this order!).

Anno release artifact creation

  1. Tag the master branch in ella-anno and anno-targets (in this order!). This can be done in Gitlab webUI or locally with git tag -a vX.X.X && git push origin vX.X.X. This will trigger a CI job in Gitlab that will build the release image to be uploaded to DO.

    Version tag memo: M.m.p[l]
    Defined by most significant change, where
    M - major
    m - minor
    p - patch
    l - letter for data-only changes
    
  2. Unless the ella-anno release is data-only, create a new Gitlab release, using the tag just created. Include release notes from any data-only releases since the last Gitlab release.

  3. Download the release Singularity image from DO for source code updates via anno-targets's master branch: make fetch-singularity-release. Alternatively, build the Docker image and then the Singularity image; use make help to look up those commands.

  4. Create the data archive:

    • Clone the anno-targets repository
    • make build-annobuilder - this is to make sure the correct image is used
    • make download-[anno|amg]-package PKG_NAME=<package name> (according to the data source table, anno for ella-anno sources, amg for anno-targets sources)
    • alternatively, make download-data to download all data
    • make tar-data PKG_NAMES=<package name 1>,<package name 2> TAR_OUTPUT=/anno/data/<name of tar-file> (recommended format source1_version-source2_version.tar). This will generate a tar file in the data directory. If PKG_NAMES is not specified, this will tar all the data in the data directory.
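Two conventions from the steps above, the version tag memo and the recommended tar file name, can be checked or generated with small shell helpers. This is a sketch under assumptions: the function names are hypothetical, the tag is assumed to carry a leading "v" (as in vX.X.X) with an optional single lowercase letter for data-only changes, and the source names and versions in the usage comments are placeholders.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ella-anno or anno-targets).

# Succeeds if $1 looks like vM.m.p or vM.m.p followed by one lowercase letter.
is_valid_tag() {
    printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+[a-z]?$'
}

# Joins one or more "source_version" arguments with "-" and appends ".tar",
# following the recommended format source1_version-source2_version.tar.
tar_name() {
    name=$1
    shift
    for part in "$@"; do
        name="$name-$part"
    done
    printf '%s.tar\n' "$name"
}

# Example usage (placeholder values):
#   is_valid_tag v1.2.3a && echo "tag OK"
#   tar_name clinvar_20240101 hgmd_2024.1
```

The name produced by tar_name could then be passed to make tar-data as the file part of TAR_OUTPUT.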

Anno deployment

The anno system is deployed in two locations and must be updated in each accordingly:

  • TSD - deployed on /ess/p22/data/durable/production/anno
  • NSC - deployed on /boston/diag/production(staging)/sw/annotation

An overview of the deployment procedure is shown in the figure below; for details, read on.

figure-anno-4

The anno system should be deployed in staging on TSD/NSC first. After deployment on TSD, always run a re-analysis in ELLA staging to start an anno process and validate the new image. Once everything is confirmed to run as expected, the anno system can be deployed in production as well.

Steps of anno deployment

  1. Upload data .tar and release .sif files to TSD/NSC (check our Wiki for instructions on using tacl for data upload).

  2. ssh as service user into p22-submit-dev on TSD or into beta on NSC.

  3. cd into anno's root directory, which will be specific to each of the two locations (you have to repeat the following steps for each of them):

    • ROOT_TSD=/ess/p22/data/durable/production/anno on TSD
    • ROOT_NSC=/boston/diag/production(staging)/sw/annotation on NSC
  4. TSD-only: archive data. On TSD the archive and utility scripts are stored in ROOT_TSD. To archive:

    • move data .tar to $ROOT_TSD/archive/data (on NSC you can place the data where it is accessible and remember to delete it after the update)
    • move release .sif to $ROOT_TSD/archive/releases
  5. TSD-only: Check that there are no running jobs in the anno-service (run script $ROOT_TSD/ops/num_active_jobs.sh), and that there are no samples running in pipeline (ask the production responsible - this applies to NSC as well).

  6. TSD-only: stop the respective anno-service (production or staging) using the webUI accessible at p22-anno-01:9000. You will need to authenticate. To verify that the environment's processes have in fact stopped, ssh into p22-anno-01 and compare the output of the following commands before and after stopping the service:

    # check for running supervisord processes
    pgrep supervisord | wc -l
    # 5
    
    # show info for all supervisord processes
    ps uww $(pgrep supervisord)
    # USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    # 1200       11084  0.0  0.0  28108 22880 ?        S    10:23   0:04 /usr/bin/python3 /usr/bin/supervisord --configuration /etc/supervisord.conf
    # tsnowlan   11106  0.0  0.0  68244 23148 ?        S    10:23   0:07 /dist/ella-python/bin/python3.7 /dist/ella-python/bin/supervisord -c /ella/ops/dev/supervisor.cfg
    
  7. Update the anno.sif symlink's target to the new release .sif-file, e.g. from $ROOT_TSD:

    # to test the command (fails without changing anything if anno.sif already exists)
    ln -s path-to-new-image-vX.X.X.sif anno-prod/anno.sif
    # to override the current link
    ln -sf path-to-new-image-vX.X.X.sif anno-prod/anno.sif
    
    • Check the symlink's permissions (must be at least 774)

    • Check that the symlinked file is not corrupted and versions are correct (see Gitlab release issue):

      singularity exec anno.sif cat /anno/version /anno-targets/version
      
  8. Extract the new data: for each combination (ROOT_FS, ENVIRONMENT) in [(tsd, staging), (tsd, prod)] on TSD and [(boston, staging), (boston, prod)] on NSC, run the following from anno's root directory (for NSC, add the prefix anno/):

    Warning

    DO NOT UNTAR THE FILE DIRECTLY IN THE ANNO DATA DIRECTORY

    # check options
    ops/update_data.sh -h
    # run the script
    ops/update_data.sh -t <path/to/tar-file> -e <ENVIRONMENT>
    

    You may encounter permission issues during the data update. These typically occur because the script cannot move the previous version (for example, the data/variantDBs/clinvar directory) to the .tmp directory, or because it cannot remove the .tmp directory itself.

  9. Verify that the data are untarred correctly. Check the respective instance's data directory (TSD: ROOT_TSD/{anno-prod,anno-staging}/data; NSC staging: ROOT_NSC/data; NSC production: ROOT_NSC/anno/anno/data) for the:

    • Contents of sources.json
    • Contents of vcfanno_config.toml
    • Untarred data permissions inside the respective data source folders (might need a change to 774)
  10. TSD-only: start anno-service on TSD using the webUI accessible at p22-anno-01:9000.
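The symlink update in step 7 and the permission check in step 9 can be sketched as two helpers. This is an illustration only: the function names are hypothetical, and "at least 774" is again read as "every permission bit of 774 is set".

```shell
#!/bin/sh
# Hypothetical helpers for the deployment steps (illustrative names).

# Lists every file or directory under $1 that lacks at least one
# permission bit of mode 774; symlinks themselves are skipped.
list_bad_perms() {
    find "$1" ! -type l ! -perm -774 -print
}

# Points the symlink $2 at the image $1, then confirms that the link
# stores that target and resolves to an existing file.
switch_image() {
    [ -f "$1" ] || { echo "no such image: $1" >&2; return 1; }
    ln -sf "$1" "$2" && [ -e "$2" ] && [ "$(readlink "$2")" = "$1" ]
}

# Example usage (paths are placeholders):
#   switch_image "$ROOT_TSD/archive/releases/path-to-new-image-vX.X.X.sif" anno-staging/anno.sif
#   list_bad_perms anno-staging/data
```

An empty listing from list_bad_perms means nothing under the data directory needs a chmod. A relative image path is resolved relative to the directory holding the link, so switch_image fails when the link would dangle.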

Updates to IGV tracks in ELLA

If any of the sources mentioned in the table below have been updated, the corresponding tracks in ELLA will need to be updated as well:

Data source     Filename
ClinVar         clinvar-current.vcf.gz
HGMD            hgmd-current.vcf.gz
gnomAD exomes   gnomad.exomes-current.vcf.gz
gnomAD genomes  gnomad.genomes-current.vcf.gz

To update one of these tracks, e.g. the ClinVar track, do the following:

  • copy the files (.vcf.gz and .tbi) from the fresh anno update to the ELLA tracks directory (those files are identical in prod and staging, you can use either)

    cp /ess/p22/data/durable/production/anno/anno-(prod|staging)/data/variantDBs/clinvar/clinvar_yyyymmdd.vcf.gz* /ess/p22/data/durable/production/ella/ella-(prod|staging)/data/igv-data/tracks
    
  • update symlinks for clinvar-current.vcf.gz and clinvar-current.vcf.gz.tbi to point to the respective new files clinvar_yyyymmdd.vcf.gz[.tbi]

    # run from the tracks directory; --force replaces an already existing link
    ln -s --force clinvar_yyyymmdd.vcf.gz clinvar-current.vcf.gz
    ln -s --force clinvar_yyyymmdd.vcf.gz.tbi clinvar-current.vcf.gz.tbi
    
  • update the version of the track in ELLA using track config webUI:

Info

The ELLA track config manager runs on the VM p22-app-01. See /ess/p22/data/durable/services/tuls-screen-commands.txt for how to start and stop it; remember to do so as the service user from p22-app-01. The configuration of the tool can be found at /ess/p22/data/durable/services/ella_trackcfg_mgr/config.txt.

  • Navigate to p22-app-01:9274 in a browser on TSD (e.g. Chromium)
  • Select an instance (staging or prod) -- need to do this for both
  • Click the Edit Config button on the top right of the view and choose Code in View
  • Find the track in question (scroll or use CTRL+f) and update its version in the description field
  • Tick Write updated config and click the button to save the new config
  • How to check that the Track config update worked as expected: navigate to respective ELLA instance, any sample, tab "VISUAL" (left panel, on top of variants table), in TRACK SELECTION hover over respective data source (for example, "Clinvar" in SNV section) - it should show the annotation and file version that you entered earlier in Track config.
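The file copy and symlink update for a track can be combined into one helper. The sketch below is an assumption-laden illustration: update_track is a hypothetical name, and it expects the .tbi index to sit next to the dated .vcf.gz file. The version shown in the track config still has to be updated in the webUI as described above.

```shell
#!/bin/sh
# Hypothetical helper (illustrative name, not an existing script).
# $1: dated track file, e.g. .../clinvar_yyyymmdd.vcf.gz (with .tbi next to it)
# $2: ELLA tracks directory
# $3: name of the "current" link, e.g. clinvar-current.vcf.gz
update_track() {
    base=$(basename "$1")
    # Copy the track and its index into the tracks directory.
    cp "$1" "$1.tbi" "$2"/ || return 1
    # Create relative links inside the tracks directory; -f replaces old links.
    ( cd "$2" && ln -sf "$base" "$3" && ln -sf "$base.tbi" "$3.tbi" )
}

# Example usage (placeholder date, staging instance):
#   update_track \
#     /ess/p22/data/durable/production/anno/anno-staging/data/variantDBs/clinvar/clinvar_yyyymmdd.vcf.gz \
#     /ess/p22/data/durable/production/ella/ella-staging/data/igv-data/tracks \
#     clinvar-current.vcf.gz
```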

Minimal test

Always run this test: as a minimal sanity check, create a re-analysis in ELLA staging and confirm that it completes.

To check that the right files are in fact used, grep for the header lines ('^##') in /ess/p22/data/durable/production/anno/anno-staging/work/*/VCFANNO/output.vcf and then grep those lines for the updated database. See picture:

figure-anno-5
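The two-stage grep described above can be wrapped in a small function. This is a minimal sketch: header_lines_for is a hypothetical name, and the search term in the usage example ("clinvar") is only an illustration of an updated database.

```shell
#!/bin/sh
# Hypothetical helper (illustrative name).
# Prints the header lines ("##...") of the VCF file $1 that mention
# the term $2, matched case-insensitively.
header_lines_for() {
    grep '^##' "$1" | grep -i -- "$2"
}

# Example usage: loop over the work directories of the staging instance.
#   for vcf in /ess/p22/data/durable/production/anno/anno-staging/work/*/VCFANNO/output.vcf; do
#       echo "$vcf"
#       header_lines_for "$vcf" clinvar
#   done
```

The function returns a non-zero status when no header line matches, which makes a missing source easy to spot in a script.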

User announcement

Send a notification to the users about the contents of the update in Teams: GDx bioinformatikk - informasjon og kontaktskjema.