HTS Bioinf - Copy Number Variation in exome pipelines
Summary
CNV calling is done as part of the exome and target pipelines, as described in their respective specs (ref). In a nutshell, exCopyDepth calls any exon whose median coverage differs significantly (i.e. by 1.5 standard deviations) from the median coverage across a set of reference samples (typically the several batches preceding the current sample). cnvScan then annotates each call using various databases and counts its occurrences in an in-house database. The resulting variant list is filtered down to the genes in the gene panel.
CNV worksheet in Excel report
The columns are described below:
Column name | Description |
---|---|
chr | Chromosome |
start | Start genomic location |
end | End genomic location |
cnv_state | Deletion (1) / Duplication (3) |
score | Default CNV prediction score |
len | Length of the CNV (in bp) |
inDB_count | In-house database CNV count |
inDB_MinMaxMedian | Minimum, Maximum and Median database score (CNVQ) |
gene_name | List of genes overlapping the CNV. Genes completely internal to the CNV are marked with :F; genes only partially covered are marked with :P |
gene_type | Gene type in GENCODE |
gene_id | Ensembl GeneID |
exon_count | Number of exons of genes partially covered by CNV |
UTR | UTRs of genes partially covered by CNV |
transcript | Transcripts of genes partially covered by CNV |
phastConElement_count | PhastCon element count |
phastConElement_minMax | Minimum and Maximum PhastCon element scores |
haplo_insufIdx_count | Haploinsufficiency count |
haplo_insufIdx_score | Haploinsufficiency score |
Gene_intolarance_score | Gene intolerance score |
sanger_cnv | Sanger CNV count |
dgv_cnv | DGV CNV count |
dgv_varType | DGV Type |
dgv_varSubType | DGV SubType |
dgv_pubmedId | DGV PubMedID |
DGV_Stringency2_count | Inclusive map CNV count |
DGV_Stringency2_PopFreq | Inclusive map CNV population frequency |
DGV_Stringency12_count | Stringency map CNV count |
DGV_Stringency12_popFreq | Stringency map CNV population frequency |
1000g_del | 1000 Genomes deletion count |
1000g_ins | 1000 Genomes duplication count |
omim_morbidMap | OMIM gene |
ddd_mutConsequence | DECIPHER development disorder consequence |
ddd_diseaseName | DECIPHER development disorder |
ddd_pubmedId | DECIPHER development disorder PubMedID |
clinVar_disease | ClinVar CNV count |
hgvs_varName | HGVS name of the CNV if reported in ClinVar |
Data sources
Source | Information |
---|---|
Functionally significant information | |
GENCODE | HGNC gene name; gene type; GeneID (Ensembl); TranscriptID (Ensembl); exon count internal to CNV; UTR internal to CNV |
PhastCon | PhastCon element count; PhastCon element scores (minimum and maximum) |
Haploinsufficiency index | Haploinsufficiency score |
Gene intolerance | Gene intolerance score |
Known CNVs | |
Sanger high resolution CNVs | Sanger CNV count |
DGV: Database of Genomic Variants | DGV CNV count; variant type; variant subtype; PubMed ID |
Curated high quality DGV | CNVs from 2 stringency levels; CNV population frequencies |
1000 Genomes CNVs | Deletions & Insertions |
Clinically relevant information | |
OMIM morbid map | OMIM disease; PubMed ID |
DECIPHER | DECIPHER development disorder genes |
ClinVar | ClinVar disease; HGVS name of the variant |
Filtering
Original study
cnvScan recommends additional filtering, where a call is kept only if all of the following conditions are satisfied:
rule | explanation |
---|---|
score >= 10 | Only higher scores are kept, as a higher score means a more confident call. |
inDBScore_MinMaxMedian[2] >= 10 (if available) | Similarly, the median score of all calls in the in-house database for this exon should be at least 10. |
DGV_Stringency2_count == NA or DGV_Stringency12_count == NA (if available) | Only calls not appearing in the Inclusive map or the Stringent map are kept. See Zarrei [3]. |
1000g_del == NA or 1000g_ins == NA (if available) | Only calls not appearing in the 1000 Genomes Project data are kept. |
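As an illustration only, the four rules above could be expressed roughly as follows. This is a minimal sketch and not cnvScan's own code: it assumes each call is a dict keyed by the worksheet column names, that NA is represented as `None` or the string `"NA"`, and that the in-house score triple has already been parsed into a (min, max, median) sequence. The helper names `is_na` and `passes_recommended_filter` are hypothetical. The `or` in the DGV and 1000 Genomes rules is taken literally from the table; a stricter reading would use `and`.

```python
def is_na(value):
    """Assumed NA convention: a missing value is None or the string 'NA'."""
    return value is None or value == "NA"


def passes_recommended_filter(call):
    """Return True if a call satisfies all four recommended rules.

    `call` is a dict keyed by worksheet column name (an assumption of this
    sketch). Rules marked "if available" are skipped when the value is NA.
    """
    # Rule 1: keep only confident calls.
    if float(call["score"]) < 10:
        return False

    # Rule 2: median in-house score (third element) must be at least 10.
    in_db = call.get("inDB_MinMaxMedian")
    if not is_na(in_db) and float(in_db[2]) < 10:
        return False

    # Rule 3: absent from the Inclusive map or the Stringent map
    # (literal reading of the "or" in the table).
    if not (is_na(call.get("DGV_Stringency2_count"))
            or is_na(call.get("DGV_Stringency12_count"))):
        return False

    # Rule 4: absent from the 1000 Genomes deletions or duplications.
    if not (is_na(call.get("1000g_del")) or is_na(call.get("1000g_ins"))):
        return False

    return True
```

A list of calls can then be reduced with `[c for c in calls if passes_recommended_filter(c)]`.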
Actual implementation
In our pipeline these rules were NOT kept. Instead the CNV calls across the entire exome are filtered down to the gene panel specified in a given analysis.
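The gene panel restriction can be pictured with the sketch below. It is not the pipeline's actual code: it assumes the `gene_name` cell holds a comma-separated list of `GENE:F` / `GENE:P` entries (the separator is a guess, as only the `:F` and `:P` suffixes are documented here), and the function names `genes_in_call` and `filter_to_panel` are hypothetical.

```python
def genes_in_call(gene_name_field, separator=","):
    """Split a gene_name cell into (gene, flag) pairs.

    The flag is 'F' (gene fully inside the CNV) or 'P' (partially covered).
    The separator between entries is an assumption of this sketch.
    """
    pairs = []
    for item in gene_name_field.split(separator):
        item = item.strip()
        if not item:
            continue
        gene, sep, flag = item.rpartition(":")
        if not sep:  # entry carries no :F/:P suffix
            gene, flag = item, ""
        pairs.append((gene, flag))
    return pairs


def filter_to_panel(calls, panel_genes):
    """Keep calls that overlap at least one gene of the panel."""
    panel = set(panel_genes)
    return [
        call for call in calls
        if any(gene in panel for gene, _ in genes_in_call(call["gene_name"]))
    ]
```

Here a call is kept whether the panel gene is fully or partially covered; restricting to `:F` entries would only need a check on the flag.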
Software
exCopyDepth
exCopyDepth is a piece of software created by Pubudu S. Samarakoon. It is published in Samarakoon [1] and the source code is described in the paper. It can call CNVs in a batch of samples.
In brief, the algorithm compares the median read depth per exon and reports the exons that deviate significantly from the normal coverage distribution. Those with lower coverage (with respect to the others in the batch) are marked as deletions, while those with higher coverage are marked as duplications.
Since waiting for 30+ samples is not suitable in a diagnostic context, we adapted the tool to run on one sample at a time. For this we collect coverage statistics on a reference dataset (i.e. the previous few batches) and use them as the background against which each individual sample is compared.
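The single-sample comparison can be sketched as below. This is a simplified illustration rather than exCopyDepth's implementation: it assumes the per-exon median depths are already normalised so that samples are comparable, uses the 1.5 standard deviation threshold from the summary, and treats a deviation exactly at the threshold as a call (an arbitrary choice). The function name `call_exon` is hypothetical; the return codes follow the `cnv_state` column (1 = deletion, 3 = duplication).

```python
from statistics import median, stdev


def call_exon(sample_depth, reference_depths, threshold=1.5):
    """Classify one exon against the reference samples.

    sample_depth: median read depth of the exon in the sample under test.
    reference_depths: median read depths of the same exon in the reference
        samples (at least two values are needed for a standard deviation).
    Returns 1 (deletion), 3 (duplication) or None (no call).
    """
    center = median(reference_depths)
    spread = stdev(reference_depths)
    if spread == 0:
        return None  # no variation in the reference, deviation is undefined

    # Distance from the reference median, in standard deviations.
    deviation = (sample_depth - center) / spread
    if deviation <= -threshold:
        return 1  # lower coverage than the reference: deletion
    if deviation >= threshold:
        return 3  # higher coverage than the reference: duplication
    return None
```

Running this over every exon of a sample yields the raw calls that are then passed on for annotation.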
cnvScan
cnvScan is a piece of software created by Pubudu S. Samarakoon. It is published in Samarakoon [2] and the source code is available on GitHub. The tool first annotates the CNV calls using various databases, as well as with the number of times each CNV has been seen previously (i.e. the count of occurrences in the in-house database). As a second step, cnvScan applies ad hoc rules to filter the CNVs.
We did not implement the filtering step. Instead, we slice the calls to report only those in a given gene panel.
References
- P. S. Samarakoon, H. S. Sorte, B. E. Kristiansen, T. Skodje, Y. Sheng, G. E. Tjønnfjord, B. Stadheim, A. Stray-Pedersen, O. K. Rødningen, and R. Lyle (2014). Identification of copy number variants from exome sequence data. BMC Genomics, vol. 15, no. 1, p. 661. https://doi.org/10.1186/1471-2164-15-661
- P. S. Samarakoon, H. S. Sorte, A. Stray-Pedersen, O. K. Rødningen, T. Rognes, and R. Lyle (2016). cnvScan: a CNV screening and annotation tool to improve the clinical utility of computational CNV prediction from exome sequencing data. BMC Genomics, vol. 17, no. 1, p. 51. https://doi.org/10.1186/s12864-016-2374-2
- Zarrei, M., MacDonald, J. R., Merico, D., & Scherer, S. W. (2015). A copy number variation map of the human genome. Nature Reviews Genetics, 16(3), 172–183. https://doi.org/10.1038/nrg3871
- Huang, N., Lee, I., Marcotte, E. M., Hurles, M. E., & Nielsen, H. (2010). Characterising and Predicting Haploinsufficiency in the Human Genome. PLoS Genetics, 6(10), e1001154. https://doi.org/10.1371/journal.pgen.1001154