HTS Bioinf - Create region files for the new capture kits
Scope
This procedure explains how to generate region files for a new capture kit. The region files are used in the variant calling pipeline.
Responsibility
A bioinformatician is responsible for running the commands that generate the region files from the input data. The input files can be downloaded from the capture kit provider.
Procedure
Tools used: Linux commands, bedtools and in-house scripts.
- Obtain the probe/bait regions from the capture kit provider (e.g. for the Agilent SureSelect Human All Exon capture kit, the probe/bait regions are in *_Covered.bed).
- Create the noslop_bed file in the vcpipe-bundle. This produces a region file in BED format, sorted by chromosome and position, with no overlapping intervals, e.g. for the Agilent SureSelect Human All Exon capture kit:
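- Create the noslop_list file in the vcpipe-bundle. This produces a region file in LIST format, sorted by chromosome and position, with no overlapping intervals, e.g. for the Agilent SureSelect Human All Exon capture kit:
    # Header: the @SQ lines of the reference sequence dictionary
    grep "^@SQ" /bundle/genomic/gatkBundle_2.5/human_g1k_v37_decoy.dict > agilent_cre_v02.baits.list
    # Body: keep chromosome records, sort, strip the "chr" prefix (b37 naming),
    # merge overlapping (and book-ended) intervals, one interval per line.
    # BED starts are 0-based while interval list starts are 1-based, hence $2 + 1.
    grep "^chr" S30409818_Covered.bed |\
        sort -k1,1V -k2,2n -k3,3n |\
        sed 's/^chr//' |\
        bedtools merge -c 4 -o distinct -i - |\
        awk 'BEGIN {FS = OFS = "\t"} {print $1, $2 + 1, $3, "+", $4}' >> agilent_cre_v02.baits.list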
- Create the slop50_list file in the vcpipe-bundle. This pads each interval by 50 bp on both sides and merges the result (bedtools must be in the PATH; this step applies only to the Agilent SureSelect Human All Exon capture kit):
    # Genome file (chromosome name and length) built from the sequence dictionary
    grep "^@SQ" /bundle/genomic/gatkBundle_2.5/human_g1k_v37_decoy.dict |\
        cut -f2,3 |\
        awk -F"[:\t]" 'BEGIN {OFS = "\t"} {print $2, $4}' > hg19.genome
    # Pad each interval by 50 bp, clipped to the chromosome ends
    bedtools slop -b 50 -i agilent_cre_v02.baits.bed -g hg19.genome > agilent_cre_v02.baits.slop50.bed
    # Header, then merge padded intervals that now overlap (interval list starts are 1-based)
    grep "^@SQ" /bundle/genomic/gatkBundle_2.5/human_g1k_v37_decoy.dict > agilent_cre_v02.baits.slop50.merged.list
    bedtools merge -c 4 -o distinct -i agilent_cre_v02.baits.slop50.bed |\
        awk 'BEGIN {FS = OFS = "\t"} {print $1, $2 + 1, $3, "+", $4}' >> agilent_cre_v02.baits.slop50.merged.list
- Create a directory for the capture kit in the captureKit directory of the vcpipe-bundle repository, and store all the newly generated files there.
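The noslop_bed step does not include its command. A minimal sketch, assuming the BED file is built with the same filtering, sorting and merging as the noslop_list pipeline but without the sequence-dictionary header and the extra list columns. The function name make_noslop_bed is hypothetical; the output name agilent_cre_v02.baits.bed is taken from the slop50 step, which reads it as input.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (assumed to mirror the noslop_list pipeline):
# build a sorted, non-overlapping BED file from a provider *_Covered.bed file.
make_noslop_bed() {
    local covered_bed=$1   # e.g. S30409818_Covered.bed
    local out_bed=$2       # e.g. agilent_cre_v02.baits.bed
    # Keep chromosome records only (drops header lines such as "track ..."),
    # sort by chromosome (version order) and position, strip the "chr"
    # prefix to match b37 naming, then merge overlapping intervals while
    # keeping the distinct 4th-column names of the merged records.
    grep "^chr" "$covered_bed" |
        sort -k1,1V -k2,2n -k3,3n |
        sed 's/^chr//' |
        bedtools merge -c 4 -o distinct -i - > "$out_bed"
}

# Usage (requires bedtools in the PATH):
#   make_noslop_bed S30409818_Covered.bed agilent_cre_v02.baits.bed
```

The output stays in BED coordinates (0-based starts), which is what bedtools slop expects in the slop50 step.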
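The noslop steps are meant to yield files sorted by chromosome and position with no overlapping intervals. A small awk-based check of that property on a BED file, offered as a sketch: check_noslop_bed is a hypothetical name, and it only verifies that each chromosome forms one contiguous block and that intervals inside a block do not overlap, not the chromosome order itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "OK" and return 0 if a BED file is position-sorted
# and non-overlapping; otherwise print the offending lines and return 1.
# Book-ended intervals (start == previous end) count as non-overlapping.
check_noslop_bed() {
    awk 'BEGIN { FS = "\t"; bad = 0 }
        {
            if ($1 != chrom) {
                # A chromosome seen earlier must not reappear later.
                if ($1 in seen) { print "chromosome block split at line " NR ": " $1; bad = 1 }
                seen[$1] = 1; chrom = $1; prev_end = -1
            } else if ($2 < prev_end) {
                print "overlap or unsorted start at line " NR ": " $0; bad = 1
            }
            prev_end = $3
        }
        END { if (!bad) print "OK"; exit bad }' "$1"
}
```

For example, check_noslop_bed agilent_cre_v02.baits.bed should print OK for a correctly built noslop_bed file.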
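For the noslop_list step, a quick consistency check can compare the LIST file with its inputs. A minimal sketch, assuming the header is exactly the @SQ lines of the sequence dictionary (as the grep in that step produces) and that the noslop_bed file was built from the same merged intervals, so both files should hold the same number of intervals. check_list is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: cross-check a LIST file against its sequence
# dictionary and the BED file assumed to contain the same intervals.
check_list() {
    local dict=$1 list=$2 bed=$3
    local n_list n_bed
    # The header must be the dictionary's @SQ lines, unchanged.
    if [ "$(grep '^@SQ' "$dict")" != "$(grep '^@' "$list")" ]; then
        echo "header of $list differs from the @SQ lines of $dict"
        return 1
    fi
    # The body must hold one interval per BED record
    # ("|| true" keeps a zero count from aborting under set -e).
    n_list=$(grep -vc '^@' "$list" || true)
    n_bed=$(grep -c . "$bed" || true)
    if [ "$n_list" -ne "$n_bed" ]; then
        echo "interval count differs: list=$n_list bed=$n_bed"
        return 1
    fi
    echo "OK"
}

# Usage:
#   check_list /bundle/genomic/gatkBundle_2.5/human_g1k_v37_decoy.dict \
#       agilent_cre_v02.baits.list agilent_cre_v02.baits.bed
```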
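To make the slop50 step concrete, the function below mimics what bedtools slop -b 50 -g does to each BED record: the start moves 50 bp left (never below 0) and the end 50 bp right (never past the chromosome length in the genome file). It is an illustration only, assuming a tab-separated BED input and a two-column genome file like hg19.genome; pad_bed is a hypothetical name, and the procedure itself should keep using bedtools slop.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustration only (not a replacement for bedtools slop): pad every BED
# interval by $pad bp on both sides, clamping the start at 0 and the end at
# the chromosome length read from a genome file (name<TAB>length).
pad_bed() {
    local pad=$1 genome=$2 bed=$3
    awk -v pad="$pad" 'BEGIN { FS = OFS = "\t" }
        NR == FNR { len[$1] = $2; next }   # first file: chromosome lengths
        {
            s = $2 - pad; if (s < 0) s = 0
            e = $3 + pad; if (($1 in len) && e > len[$1]) e = len[$1]
            $2 = s; $3 = e
            print
        }' "$genome" "$bed"
}
```

For example, with the genome file line "1<TAB>1000", pad_bed 50 turns the record "1 20 100 a" into "1 0 150 a" and "1 500 980 b" into "1 450 1000 b" (tab-separated). Padding can make neighbouring intervals overlap, which is why the slop50 step merges afterwards.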
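The final step can be scripted as a small helper. A minimal sketch, assuming captureKit sits at the top level of a local vcpipe-bundle checkout; the function name store_kit_files, the bundle path and the kit directory name in the usage line are hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create <bundle>/captureKit/<kit_name> and copy the
# generated region files into it. Stops at the first file that already
# exists in the destination instead of overwriting it.
store_kit_files() {
    local bundle=$1 kit_name=$2
    shift 2
    local dest="$bundle/captureKit/$kit_name"
    local f
    mkdir -p "$dest"
    for f in "$@"; do
        if [ -e "$dest/$(basename "$f")" ]; then
            echo "refusing to overwrite $dest/$(basename "$f")" >&2
            return 1
        fi
        cp "$f" "$dest/"
    done
}

# Usage (paths and kit name are placeholders):
#   store_kit_files /path/to/vcpipe-bundle <kit_name> \
#       agilent_cre_v02.baits.bed agilent_cre_v02.baits.list \
#       agilent_cre_v02.baits.slop50.merged.list
```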