HTS Bioinf - Training for running pipeline

Scope

This procedure lists what a bioinformatician must do before being allowed to run and monitor pipeline analyses on patient samples.


  1. Read the GATK Best Practices for germline short variant discovery (https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels) as background material for an overall understanding of the different pipeline steps. Use the latest version that matches the pipeline.

  2. If you are unfamiliar with TSD, read the TSD help pages to learn how to log in, where to find which files, etc.

  3. Read through the pipeline scripts and the automation system code. Ask a trained bioinformatician who knows the code well to walk you through it.

  4. Read the pipeline specification for each pipeline that will be run in the analyses. Look inside the scripts it mentions to understand how they work, study the output files, and make a mental note of what they should look like.

  5. Get a user account on TSD, NSC and Clarity (see procedure [HTS Bioinf - Infrastructure]) if you haven't got one already.

  6. Ask the TSD system administrators to add you to the following user groups if you aren't a member of them already: p22-diag-ous-bioinf-group, p22-diag-ous-lab-group, p22-export-group, p22-import-group.

  7. Set up prerequisites for using the lims-exporter-api.

    Each new user of lims-exporter-api needs to do a one-time setup as follows:

    1. Ask Clarity LIMS administrator to add you as Clarity API user.
    2. On gdx-login, copy /boston/diag/transfer/sw/.genosqlrc.yaml and /boston/diag/transfer/sw/tsd-import/src/lims_exporter_api/.genologicsrc to your $HOME directory.
    3. Replace USERNAME and PASSWORD with your own Clarity username and password in the copy of .genologicsrc in your $HOME directory.
    4. On sleipnir, copy /boston/diag/transfer/sw/.s3cfg to your $HOME.
  8. Study the following procedures and confirm that you have read them ("lesekvittere"):

    1. Clarity LIMS
    2. HTS - Mismatch between TaqMan SNP-ID and sequencing data
    3. HTS - Samples that fail QC in bioinformatic pipeline
    4. HTS - Use of reference materials for internal quality control
    5. HTS Bioinf - Demultiplexing and quality control of raw sequencing data
    6. HTS Bioinf - Deployment of vcpipe for production
    7. HTS Bioinf - Release and deployment of tsd-import
    8. HTS Bioinf - Execution and monitoring of pipeline
    9. HTS Bioinf - Storage and security of sensitive data
    10. HTS Bioinf - Basepipe pipeline
    11. HTS Bioinf - Trio pipeline
    12. HTS Bioinf - ELLA daily operations
    13. HTS Bioinf - anno with anno-targets
  9. Run two analyses together with a bioinformatician who has already passed training.

  10. Sign the "Opplæringssjekkliste" (training checklist) together with the bioinformatic coordinator or the production coordinator (see procedure [HTS Bioinf Group roles]).
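
Steps 6 and 7 above can be self-checked before the supervised runs in step 9. The following is a minimal Python sketch, not part of the official procedure. It assumes a Unix host with Python 3, and that the placeholders in .genologicsrc appear as the literal uppercase tokens USERNAME and PASSWORD, as step 7 suggests; the actual file format is not described here, so treat a placeholder hit as a hint to inspect the file manually. The function names are hypothetical helpers introduced for illustration.

```python
import grp
import os
from pathlib import Path

# Group names from step 6 and file names from step 7 of this procedure.
REQUIRED_GROUPS = [
    "p22-diag-ous-bioinf-group",
    "p22-diag-ous-lab-group",
    "p22-export-group",
    "p22-import-group",
]
# .genosqlrc.yaml and .genologicsrc are copied on gdx-login, .s3cfg on sleipnir,
# so not every file is expected on every host.
CONFIG_FILES = [".genosqlrc.yaml", ".genologicsrc", ".s3cfg"]
# Assumption: the placeholders are literal uppercase tokens in the file.
PLACEHOLDERS = ["USERNAME", "PASSWORD"]


def missing_groups(required, current):
    """Return the required group names that are absent from `current`."""
    have = set(current)
    return [name for name in required if name not in have]


def current_group_names():
    """Return the names of the groups the running process belongs to."""
    names = []
    for gid in os.getgroups():
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The gid has no entry in the group database; skip it.
            continue
    return names


def missing_files(home, filenames):
    """Return the file names that do not exist as regular files under `home`."""
    return [name for name in filenames if not (Path(home) / name).is_file()]


def leftover_placeholders(path, placeholders):
    """Return the placeholder tokens that still occur in the file at `path`."""
    text = Path(path).read_text(errors="replace")
    return [token for token in placeholders if token in text]


if __name__ == "__main__":
    home = Path.home()
    print("Missing groups:", missing_groups(REQUIRED_GROUPS, current_group_names()))
    print("Missing files:", missing_files(home, CONFIG_FILES))
    rc_file = home / ".genologicsrc"
    if rc_file.is_file():
        print("Unreplaced placeholders:", leftover_placeholders(rc_file, PLACEHOLDERS))
```

Group membership is read from the current login session, so a group that was added recently may only show up after logging out and in again.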

Background

Apart from a few manual steps, the bioinformatics pipeline is fully automated. Currently, the manual steps are quality control of the raw sequencing data (FastQC), transfer of data from NSC to TSD, and monitoring that the pipeline executes and finishes successfully on TSD.

References