HTS Bioinf - Trio pipeline

Summary

The trio pipeline is implemented in Nextflow and processes trio data. It jointly analyses sequencing data from a proband (child) and the proband's mother and father to detect de novo variants and proband-specific recessive variants. This document describes the data processing in the pipeline environment, assuming that the basepipe pipeline has run successfully for all three members of the trio.

Responsibility

The production pipeline is run by a bioinformatician from "Enhet for Genomdiagnostikk" (GDx). The bioinformatician must have been trained for production according to procedure HTS Bioinf - Training for running pipeline.


Trio pipeline implementation

Input

The main inputs are the .g.vcf.gz files generated by the basepipe pipeline for all three members of the trio.

Output

The main output files of the pipeline are:

  • The joint variant calling VCF file after quality annotation
  • Gender and pedigree check results

Usage and requirements

The pipeline is implemented in the vcpipe repository. It is primarily meant to be run as part of the automation system included in that repository; see the scripts in the vcpipe repository for more information. Its dependencies include vcpipe-essentials, vcpipe-refdata and sensitive-db.

Pipeline stages

Pre-processing

In this stage, the utility code verifies that:

  • the .analysis and .sample configuration files are sane;
  • the pedigree described in those configuration files is consistent;
  • gene panel, capture kit, and bundle files are present;
  • basepipe results are available.

A pedigree .ped file is created at this stage.
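
As an illustration of what such a file contains, the following minimal Python sketch writes the three rows of a trio in the standard six-column PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype). It is not the pipeline's own code: the function name, the sample IDs, and the choice to mark the proband as affected and both parents as unaffected are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Member:
    sample_id: str
    sex: int  # PED sex code: 1 = male, 2 = female, 0 = unknown


def trio_ped_lines(family_id, proband, father, mother):
    """Return tab-separated PED rows for a trio (hypothetical helper)."""
    rows = [
        # Assumption: proband is affected (2), parents are unaffected (1).
        (family_id, proband.sample_id, father.sample_id, mother.sample_id, proband.sex, 2),
        # Founders have no parents in the file, encoded as "0".
        (family_id, father.sample_id, "0", "0", father.sex, 1),
        (family_id, mother.sample_id, "0", "0", mother.sex, 1),
    ]
    return ["\t".join(str(field) for field in row) for row in rows]


ped_lines = trio_ped_lines(
    "FAM1",
    Member("proband01", 2),
    Member("father01", 1),
    Member("mother01", 2),
)
print("\n".join(ped_lines))
```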

Joint variant calling stage

For samples processed with the GATK pipeline, the .g.vcf.gz files generated by the basepipe pipeline for proband, mother and father are merged in the GenotypeGVCFs step, following the GATK best practice workflow for variant discovery.

After that, variant filtering is performed in the same way as for single samples. See procedure HTS Bioinf – Basepipe pipeline for details on tools and specifications.

For samples processed with the Dragen pipeline, the .g.vcf.gz files generated by the basepipe pipeline for proband, mother and father are merged by the Dragen software according to the DRAGEN recommendations (default settings are used unless a setting is explicitly specified).

Input: .g.vcf.gz files generated for all three members of the trio by the basepipe pipeline

Output: the joint variant calling VCF file after quality annotation.

Processes included in the Nextflow script: triopipe_variantcalling_join and dragen_trio

Merge mitochondrial variants stage

The mitochondrial SNP and small indel VCF files generated by the basepipe pipeline for proband and mother are merged using BCFtools.

Input: mitochondrial variants in .vcf.gz files generated by the basepipe pipeline for proband and mother.

Output: the joint variant calling VCF file after quality annotation

Processes included in the Nextflow script: triopipe_mt_variantcalling_join
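
A merge of this kind could be expressed with the `bcftools merge` subcommand. The sketch below only builds and prints the command line; the file names and the helper function are hypothetical, and the options actually used by triopipe_mt_variantcalling_join are not stated in this document. Note that `bcftools merge` expects bgzip-compressed, indexed input files.

```python
import shlex


def bcftools_merge_command(proband_vcf, mother_vcf, output_vcf):
    """Build a bcftools merge command for two single-sample VCFs (illustrative only)."""
    # -O z writes bgzip-compressed VCF output; -o names the output file.
    return ["bcftools", "merge", "-O", "z", "-o", output_vcf, proband_vcf, mother_vcf]


# Hypothetical file names, chosen for the example.
cmd = bcftools_merge_command("proband.mt.vcf.gz", "mother.mt.vcf.gz", "trio.mt.vcf.gz")
print(shlex.join(cmd))
```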

Gender and pedigree test stage

The open source software VCFped [1] detects trios and close pairwise relationships, and estimates sex, in a multi-sample VCF file. A .ped format pedigree file can be provided to VCFped for comparison. The test fails if its results do not match the provided pedigree. Sex is estimated by calculating the heterozygosity rate on chromosome X. When the rate is lower than 10%, the estimate is male. When the rate exceeds 25%, the estimate is female.

Input: the joint variant calling VCF file after quality annotation and a .ped format pedigree file

Output: outputs from VCFped

Processes included in the Nextflow script: pedigree_gender_check
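
The sex thresholds described above can be sketched as a small Python function. This is an illustration of the decision rule only, not VCFped's implementation: the rate is assumed here to be the number of heterozygous genotype calls divided by the total number of genotype calls on chromosome X, and the "undetermined" label for rates between 10% and 25% is an assumption, since this document does not state how that range is reported.

```python
def estimate_sex(het_calls, total_calls):
    """Classify sex from chromosome X heterozygosity (illustrative rule only)."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    rate = het_calls / total_calls
    if rate < 0.10:   # lower than 10%: male
        return "male"
    if rate > 0.25:   # exceeds 25%: female
        return "female"
    # Assumption: rates from 10% to 25% are treated as inconclusive.
    return "undetermined"


print(estimate_sex(5, 100))
print(estimate_sex(40, 100))
print(estimate_sex(15, 100))
```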

Integrated quality control report stage

The sex and pedigree test results are incorporated into the quality control reports, together with detailed quality control metrics for all three members of the trio.

Input: results from VCFped and quality control results for all members of the trio

Output: .qc_result.json, .qc_report.md and .qc_warnings.md

hap.py stage

In the hap.py stage, the SNP and small indel calls from the joint variant calling are compared with the high confidence SNP and small indel calls provided by GIAB. This stage is only applied to control samples (HG002, HG003 and HG004). The results are used for the samples that need trend analysis (see procedure: HTS - Use of NA samples for quality control).

Input: Quality annotated VCF files for the whole capture kit region

Output: Sensitivity and PPV in the defined region
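
For reference, the two output metrics follow the usual definitions: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP), where TP, FN and FP are the true positive, false negative and false positive call counts against the truth set. The Python sketch below computes them from such counts; the function name and the example counts are made up for illustration and are not results from the pipeline.

```python
def sensitivity_and_ppv(tp, fn, fp):
    """Return (sensitivity, PPV) from call counts; NaN when a denominator is zero."""
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Made-up counts for illustration.
sens, ppv = sensitivity_and_ppv(tp=90, fn=10, fp=30)
print(f"sensitivity={sens:.3f} ppv={ppv:.3f}")
```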

POST command stage

In this stage, the quality control parameters are checked and the files listed in the configuration files are copied to the durable/production/data/analyses-results/trios directory.

Input: No input (this stage will start only when all other processes are finished)

Output: No output

Processes included in the Nextflow script: postcmd_triopipe

References

  1. VCFped: https://github.com/magnusdv/VCFped