HTS Bioinf - Create region files for new capture kits
Scope
This procedure describes how to generate region files for a new capture kit for use in the variant calling pipeline.
Responsibility
A bioinformatician is responsible for running the commands that generate the region files for the new capture kit from the input data. The input files can be downloaded from the capture kit provider.
Procedure
Tools used: Linux commands, bedtools and in-house scripts.
- Obtain the probe/bait regions from the capture kit provider (e.g. the probe/bait regions for the Agilent SureSelect Human All Exon capture kit are in *_Covered.bed).
- Create the noslop_bed file in the vcpipe-bundle. This is a BED-format region file that is sorted by chromosome and position and has no overlapping regions. The examples in this procedure use the Agilent SureSelect Human All Exon capture kit.
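A minimal sketch of the noslop_bed step, assuming it uses the same sort and merge pipeline as the noslop_list example but writes plain BED without the @SQ header and strand column. The function name make_noslop_bed is hypothetical, and it is an assumption that the result is the agilent_cre_v02.baits.bed file read by the slop50_list step.

```shell
# Sketch only: make_noslop_bed is a hypothetical helper, and the output name
# agilent_cre_v02.baits.bed is assumed from the slop50_list example.
make_noslop_bed() {
    # $1: BED file from the provider, $2: output BED file
    # Keep chr* lines, sort by chromosome and position, strip the chr
    # prefix and merge overlapping regions, collecting the names in column 4.
    grep "^chr" "$1" \
        | sort -k1,1V -k2,2n -k3,3n \
        | sed 's/^chr//' \
        | bedtools merge -c 4 -o distinct -i - \
        > "$2"
}

# Usage (bedtools must be in PATH):
#   make_noslop_bed S30409818_Covered.bed agilent_cre_v02.baits.bed
```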
- Create the noslop_list file in the vcpipe-bundle. This is a LIST-format region file that is sorted by chromosome and position and has no overlapping regions, e.g., for the Agilent SureSelect Human All Exon capture kit:
    # Start the list with the @SQ header lines of the reference dictionary
    grep "^@SQ" /bundle/genomic/gatkBundle_2.5/human_g1k_v37_decoy.dict > agilent_cre_v02.baits.list
    # Keep chr* regions, sort them, strip the chr prefix, merge overlaps and
    # append them as chromosome, start, end, strand, name
    grep "^chr" S30409818_Covered.bed \
      | sort -k1,1V -k2,2n -k3,3n \
      | sed 's/^chr//' \
      | bedtools merge -c 4 -o distinct -i - \
      | awk 'BEGIN {FS = OFS = "\t"} {print $1, $2, $3, "+", $4}' \
      >> agilent_cre_v02.baits.list
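The two properties the file should have (sorted by position, no overlaps) can be checked once it is written. The following is a sketch of such a check, not part of the documented procedure; check_regions is a hypothetical helper name, and it only compares each region with the one directly before it on the same chromosome.

```shell
# Hypothetical helper: exits non-zero if a region starts before the end of
# the previous region on the same chromosome (overlap or wrong order).
# Header lines starting with "@" are skipped, so it accepts BED and LIST files.
check_regions() {
    awk 'BEGIN {FS = "\t"}
         /^@/ {next}
         $1 == chrom && $2 + 0 < prev_end + 0 {
             printf "line %d: %s:%s-%s overlaps or precedes previous region\n", NR, $1, $2, $3
             bad = 1
         }
         {chrom = $1; prev_end = $3}
         END {exit bad + 0}' "$1"
}

# Usage:
#   check_regions agilent_cre_v02.baits.list && echo "regions look fine"
```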
- Create the slop50_list file in the vcpipe-bundle (the directory holding the bedtools executable should be in PATH), e.g., for the Agilent SureSelect Human All Exon capture kit:
    # Chromosome names and lengths from the reference dictionary
    grep "^@SQ" /bundle/genomic/gatkBundle_2.5/human_g1k_v37_decoy.dict \
      | cut -f2,3 \
      | awk -F"[:\t]" -v OFS="\t" '{print $2, $4}' \
      > hg19.genome
    # Extend every region by 50 bp on both sides
    bedtools slop -b 50 -i agilent_cre_v02.baits.bed -g hg19.genome \
      > agilent_cre_v02.baits.slop50.bed
    # Write the @SQ header, then append the merged, extended regions
    grep "^@SQ" /bundle/genomic/gatkBundle_2.5/human_g1k_v37_decoy.dict \
      > agilent_cre_v02.baits.slop50.merged.list
    bedtools merge -c 4 -o distinct -i agilent_cre_v02.baits.slop50.bed \
      | awk 'BEGIN {FS = OFS = "\t"} {print $1, $2, $3, "+", $4}' \
      >> agilent_cre_v02.baits.slop50.merged.list
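bedtools slop takes the chromosome lengths from the genome file, so every chromosome name in the BED file needs an entry there; this is why the chr prefix is stripped to match the names in the dictionary. A small sketch for spotting mismatches before running slop, where missing_chroms is a hypothetical helper name:

```shell
# Hypothetical helper: print each chromosome name used in a BED file ($2)
# that has no entry in a bedtools genome file ($1). Prints nothing if all match.
missing_chroms() {
    awk 'BEGIN {FS = "\t"}
         NR == FNR {known[$1] = 1; next}              # first file: name, length
         !($1 in known) && !seen[$1]++ {print $1}' "$1" "$2"
}

# Usage:
#   missing_chroms hg19.genome agilent_cre_v02.baits.bed
```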
- Create a directory for the new capture kit in the captureKit directory of the vcpipe-bundle repository and store all the newly generated files there.
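The last step can be sketched as follows. The function name store_kit_files, the bundle path and the kit directory name are placeholders, and it is an assumption that captureKit sits at the top level of the repository.

```shell
# Sketch only: store_kit_files is a hypothetical helper; adjust the paths
# to the actual layout of the vcpipe-bundle repository.
store_kit_files() {
    # $1: path to the vcpipe-bundle checkout, $2: kit directory name,
    # remaining arguments: the generated region files
    bundle=$1
    kit=$2
    shift 2
    mkdir -p "$bundle/captureKit/$kit"
    cp -- "$@" "$bundle/captureKit/$kit/"
}

# Usage:
#   store_kit_files /path/to/vcpipe-bundle agilent_cre_v02 \
#       agilent_cre_v02.baits.bed agilent_cre_v02.baits.list \
#       agilent_cre_v02.baits.slop50.merged.list
```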