
HTS Bioinf - Copy Number Variation in exome pipelines


CNV calling is done as part of the exome and target pipelines, as described in their respective specs (ref). In a nutshell, exCopyDepth calls any exon whose median coverage differs significantly (i.e. by 1.5 standard deviations) from the median coverage across a set of reference samples (typically the several batches preceding the current sample). cnvScan then annotates each call using various databases and counts its occurrences in an in-house database. The resulting variant list is restricted to the genes in the gene panel.

CNV worksheet in Excel report

The columns are described below:

| Column name | Description |
|---|---|
| chr | Chromosome |
| start | Start genomic location |
| end | End genomic location |
| cnv_state | Deletion (1) / Duplication (3) |
| score | Default CNV prediction score |
| len | Length of the CNV (in bp) |
| inDB_count | In-house database CNV count |
| inDB_MinMaxMedian | Minimum, maximum and median database score (CNVQ) |
| gene_name | List of genes internal to the CNV. Genes completely internal to the CNV are indicated with :F and genes partially covered are indicated with :P |
| gene_type | Gene type in GENCODE |
| gene_id | Ensembl GeneID |
| exon_count | Number of exons of genes partially covered by CNV |
| UTR | UTRs of genes partially covered by CNV |
| transcript | Transcripts of genes partially covered by CNV |
| phastConElement_count | PhastCon element count |
| phastConElement_minMax | Minimum and maximum PhastCon element scores |
| haplo_insufIdx_count | Haploinsufficiency count |
| haplo_insufIdx_score | Haploinsufficiency score |
| Gene_intolarance_score | Gene intolerance score |
| sanger_cnv | Sanger CNV count |
| dgv_cnv | DGV CNV count |
| dgv_varType | DGV type |
| dgv_varSubType | DGV subtype |
| dgv_pubmedId | DGV PubMed ID |
| DGV_Stringency2_count | Inclusive map CNV count |
| DGV_Stringency12_count | Stringency map CNV count |
| 1000g_del | 1000 Genomes deletion count |
| 1000g_ins | 1000 Genomes duplication count |
| omim_morbidMap | OMIM gene |
| ddd_mutConsequence | DECIPHER development disorder consequence |
| ddd_diseaseName | DECIPHER development disorder |
| ddd_pubmedId | DECIPHER development disorder PubMed ID |
| clinVar_disease | ClinVar CNV count |
| hgvs_varName | HGVS name of the CNV if reported in ClinVar |

Data sources

| Category | Source | Information |
|---|---|---|
| Functionally significant information | | Gene type; GeneID (Ensembl); TranscriptID (Ensembl); Exon count internal to CNV; UTR internal to CNV |
| | PhastCon | PhastCon element count; PhastCon element scores (minimum and maximum) |
| | Haploinsufficiency index | Haploinsufficiency score |
| | Gene intolerance | Gene intolerance score |
| Known CNVs | Sanger high resolution CNVs | Sanger CNV count |
| | DGV: Database of Genomic Variants | DGV CNV count; Variant type; Variant subtype; PubMed ID; Curated high quality DGV CNVs from 2 stringency levels |
| CNV population frequencies | 1000 Genomes CNVs | Deletions & Insertions |
| Clinically relevant information | OMIM morbid map | OMIM disease; PubMed ID |
| | DECIPHER | DECIPHER development disorder genes |
| | ClinVar | ClinVar disease; HGVS name of the variant |


Original study

cnvScan recommends additional filtering, where a call is kept only if all of the following conditions are satisfied:

| Rule | Explanation |
|---|---|
| score >= 10 | Only higher scores are kept, as a higher score means a more confident call. |
| inDBScore_MinMaxMedian[2] >= 10 (if available) | Similarly, the median score of all the calls made for this exon in the in-house database should be at least 10. |
| DGV_Stringency2_count == NA or DGV_Stringency12_count == NA (if available) | Only calls not appearing in the Inclusive map or the Stringent map are kept. See Zarrei [3]. |
| 1000g_del == NA or 1000g_ins == NA (if available) | Only calls not appearing in the 1000 Genomes Project data are kept. |
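
As an illustration only, the four rules above could be written as a single predicate. The sketch assumes that each call is a dict keyed by the worksheet column names, that unavailable annotations are stored as `None` (standing in for NA), and that `inDB_MinMaxMedian` is a (min, max, median) tuple. The function name `passes_recommended_filter` is hypothetical, and this is not the cnvScan code.

```python
def passes_recommended_filter(call):
    """Return True if a call satisfies all four recommended rules."""
    # Rule 1: the prediction score must be at least 10.
    if call["score"] < 10:
        return False

    # Rule 2: if in-house scores exist, their median (index 2) must be at least 10.
    in_db = call.get("inDB_MinMaxMedian")
    if in_db is not None and in_db[2] < 10:
        return False

    # Rule 3: as written in the rule, at least one DGV stringency count must be NA.
    if (call.get("DGV_Stringency2_count") is not None
            and call.get("DGV_Stringency12_count") is not None):
        return False

    # Rule 4: as written in the rule, at least one 1000 Genomes count must be NA.
    if call.get("1000g_del") is not None and call.get("1000g_ins") is not None:
        return False

    return True


# Hypothetical calls, for illustration only.
novel_call = {"score": 25, "inDB_MinMaxMedian": (12, 40, 18)}
common_call = {"score": 25, "DGV_Stringency2_count": 4, "DGV_Stringency12_count": 2}
print(passes_recommended_filter(novel_call))   # True
print(passes_recommended_filter(common_call))  # False
```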

Actual implementation

In our pipeline these rules were NOT kept. Instead the CNV calls across the entire exome are filtered down to the gene panel specified in a given analysis.



exCopyDepth

exCopyDepth is a piece of software created by Pubudu S. Samarakoon. It is published in Samarakoon [1] and the source code is described in the paper. It can call CNVs in a batch of samples.

In brief, the algorithm compares the median read depth per exon and reports the exons that deviate significantly from the normal coverage distribution. Exons with lower coverage (relative to the other samples in the batch) are marked as deletions, while those with higher coverage are marked as duplications.

Since waiting for 30+ samples is not suitable in a diagnostic context, we adapted the tool to run on one sample at a time. To do this, we collect coverage statistics from a reference dataset (i.e. the previous few batches) and use them as the background against which each individual sample is compared.
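
The single-sample comparison described above can be sketched as follows. This is a minimal illustration, not the exCopyDepth source: the function name `call_exon`, the use of the reference median and sample standard deviation as the background distribution, and the assumption that depths are already normalised between samples are ours. The 1.5 standard deviation threshold and the state codes (1 = deletion, 3 = duplication) are taken from this document.

```python
from statistics import median, stdev

DELETION, DUPLICATION = 1, 3  # cnv_state codes used in the CNV worksheet


def call_exon(sample_cov, reference_covs, n_sd=1.5):
    """Return a cnv_state for one exon, or None if coverage looks normal.

    sample_cov     -- median read depth of the exon in the current sample
    reference_covs -- median read depth of the same exon in each reference sample
    n_sd           -- number of standard deviations treated as significant
    """
    ref_median = median(reference_covs)
    ref_sd = stdev(reference_covs)  # needs at least two reference samples
    if ref_sd == 0:
        return None  # no spread in the background, so no deviation can be scored
    deviation = (sample_cov - ref_median) / ref_sd
    if deviation <= -n_sd:
        return DELETION
    if deviation >= n_sd:
        return DUPLICATION
    return None


reference = [98, 102, 100, 101, 99, 100]  # hypothetical background depths
print(call_exon(52, reference))    # far below the background: 1
print(call_exon(100, reference))   # within the background: None
print(call_exon(150, reference))   # far above the background: 3
```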


cnvScan

cnvScan is a piece of software created by Pubudu S. Samarakoon. It is published in Samarakoon [2] and the source code is available on GitHub. The tool first annotates the CNV calls using various databases and adds the number of times each CNV has been seen previously (i.e. the count of occurrences in the in-house database). As a second step, cnvScan applies ad hoc rules to filter the CNVs.
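
To make the in-house annotation concrete, the sketch below derives `inDB_count` and `inDB_MinMaxMedian` for one call. It is a simplified illustration, not the cnvScan implementation: it assumes the in-house database is a list of (chr, start, end, cnv_state, score) tuples, and that a stored CNV counts as the same event when it has the same state and overlaps the call by at least one base. The real matching criterion may differ.

```python
from statistics import median


def annotate_in_house(call, in_house_db):
    """Return (inDB_count, inDB_MinMaxMedian) for one CNV call.

    call        -- (chr, start, end, cnv_state)
    in_house_db -- iterable of (chr, start, end, cnv_state, score)
    """
    chrom, start, end, state = call
    scores = [
        db_score
        for db_chrom, db_start, db_end, db_state, db_score in in_house_db
        # Assumed match: same chromosome and state, intervals share at least one base.
        if db_chrom == chrom and db_state == state
        and db_start <= end and start <= db_end
    ]
    if not scores:
        return 0, None  # never seen before, so there is no score summary
    return len(scores), (min(scores), max(scores), median(scores))


in_house_db = [  # hypothetical previous calls
    ("1", 1000, 2000, 1, 12),
    ("1", 1500, 2500, 1, 30),
    ("1", 1500, 2500, 3, 50),  # duplication, so not counted for a deletion call
    ("2", 1000, 2000, 1, 40),  # different chromosome
]
print(annotate_in_house(("1", 1800, 1900, 1), in_house_db))
```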

We did not implement the filtering step. Instead, we restrict the calls to those in a given gene panel.
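
A minimal sketch of the gene panel restriction follows. It assumes that `gene_name` holds comma-separated entries such as `BRCA1:F,NBR2:P` (the separator is an assumption; only the `:F` and `:P` suffixes are documented in the worksheet description) and that a call is reported when at least one of its genes, fully or partially covered, is in the panel. The function names and example values are illustrative.

```python
def genes_in_call(gene_name):
    """Parse a gene_name cell into {gene symbol: 'F', 'P' or None}."""
    genes = {}
    for entry in gene_name.split(","):
        entry = entry.strip()
        if not entry:
            continue
        symbol, _, coverage = entry.rpartition(":")
        if symbol:   # entry carries a ':F' or ':P' suffix
            genes[symbol] = coverage
        else:        # no suffix found: keep the symbol, coverage unknown
            genes[coverage] = None
    return genes


def filter_to_panel(calls, panel):
    """Keep the calls that touch at least one gene of the panel."""
    panel = set(panel)
    return [
        call for call in calls
        if panel & set(genes_in_call(call["gene_name"]))
    ]


calls = [  # hypothetical annotated calls
    {"chr": "17", "cnv_state": 1, "gene_name": "BRCA1:F,NBR2:P"},
    {"chr": "2", "cnv_state": 3, "gene_name": "TTN:P"},
]
kept = filter_to_panel(calls, panel=["BRCA1", "BRCA2"])
print([call["gene_name"] for call in kept])  # ['BRCA1:F,NBR2:P']
```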


References

  1. Samarakoon, P. S., Sorte, H. S., Kristiansen, B. E., Skodje, T., Sheng, Y., Tjønnfjord, G. E., Stadheim, B., Stray-Pedersen, A., Rødningen, O. K., & Lyle, R. (2014). Identification of copy number variants from exome sequence data. BMC Genomics, 15(1), 661.

  2. Samarakoon, P. S., Sorte, H. S., Stray-Pedersen, A., Rødningen, O. K., Rognes, T., & Lyle, R. (2016). cnvScan: a CNV screening and annotation tool to improve the clinical utility of computational CNV prediction from exome sequencing data. BMC Genomics, 17(1), 51.

  3. Zarrei, M., MacDonald, J. R., Merico, D., & Scherer, S. W. (2015). A copy number variation map of the human genome. Nature Reviews Genetics, 16(3), 172–183.

  4. Huang, N., Lee, I., Marcotte, E. M., Hurles, M. E., & Nielsen, H. (2010). Characterising and Predicting Haploinsufficiency in the Human Genome. PLoS Genetics, 6(10), e1001154.

Other documents

HTS Bioinf - Update sensitive databases