Tools for postprocessing of structural variants
Standardizing a VCF file from our CNV-callers
The runnable script sv_standardizer will print a standardized VCF to stdout when executed as
where
VCF_FILENAME is the VCF from one of the callers ["manta", "delly", "tiddit", "cnvnator", "svaba", "canvas", "cnv_sv"] or one of the end states ["merged", "header", "ella"]
- If no filename is given, the script will take input from stdin
CALLER_NAME explicitly sets the source of the VCF_FILENAME. CALLER_NAME can be omitted if the VCF_FILENAME starts with {CALLER_NAME}_
SAMPLE_NAMES is a comma-separated list of SAMPLE column names to ensure equal naming for all callers
- It is expected that the order of SAMPLE columns is the same for all callers
Note: A hack is used for the standardization of a file in the "merged" state for filenames that contain Diag- in order to remove unwanted INFO fields from v2.8.2 of SVDB merge
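The rule for omitting CALLER_NAME can be illustrated with a small Python sketch. This is not the code of sv_standardizer; the function name resolve_caller_name and the decision to accept the end states as source names are assumptions made for illustration only.

```python
from pathlib import Path
from typing import Optional

# Source names listed in this README. Accepting the end states as valid
# CALLER_NAME values is an assumption of this sketch.
KNOWN_SOURCES = (
    "manta", "delly", "tiddit", "cnvnator", "svaba", "canvas", "cnv_sv",
    "merged", "header", "ella",
)


def resolve_caller_name(vcf_filename: str, caller_name: Optional[str] = None) -> str:
    """Return the explicit CALLER_NAME, or infer it from a '{CALLER_NAME}_' prefix."""
    if caller_name is not None:
        # An explicit CALLER_NAME always wins over the filename.
        if caller_name not in KNOWN_SOURCES:
            raise ValueError(f"Unknown caller name: {caller_name}")
        return caller_name
    # Only the basename is inspected, so directories do not affect the result.
    basename = Path(vcf_filename).name
    for source in KNOWN_SOURCES:
        if basename.startswith(f"{source}_"):
            return source
    raise ValueError(f"Cannot infer caller name from filename: {basename}")


print(resolve_caller_name("/data/manta_sample1.vcf"))
print(resolve_caller_name("sample1.vcf", caller_name="delly"))
```

A filename without a recognized prefix and without an explicit CALLER_NAME raises an error in this sketch; how the real script reacts in that case is not stated here.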
Postprocessing of the FILTER column
The FILTER column can be updated when a variant record fails a quality test.
Postprocessing is needed for callers such as CNVnator, Delly, Manta and the Dragen SV and CNV callers.
The runnable script sv_postprocessing will print a VCF with updated FILTER column to stdout when executed as
sv_postprocessing STANDARDIZED_MERGED_VCF --set-filters WGS_SVPARAMS_FILTER --caller-priority CALLERS --set-filter-descriptions SET_FILTER_DESCRIPTIONS
where
STANDARDIZED_MERGED_VCF has been standardized by sv_standardizer and merged by SVDB
- If no filename is given, the script will take input from stdin
WGS_SVPARAMS_FILTER is a JSON-formatted string defined in analysistypeconfig.schema.json under WGS.svparams.filter. There is no additional schema in sv_postprocessing to check the filter definitions.
CALLERS is a prioritized list of callers. Only the filter of the most prioritized caller will be used.
- Optional: SET_FILTER_DESCRIPTIONS is a dictionary {FILTER_NAME: "Description of filter"} for the VCF header. The FILTER_NAMEs are the keys of WGS_SVPARAMS_FILTER. If not set, the description will be the generic "Custom filter FILTER_NAME".
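The caller priority and the default filter descriptions can be sketched in Python as follows. This is a minimal illustration and not the implementation of sv_postprocessing: the function names, the filter names LowDepth and LowQual, and the description text are hypothetical. Only the generic "Custom filter FILTER_NAME" fallback and the standard VCF ##FILTER header syntax are taken as given.

```python
from typing import Dict, Iterable, List, Optional


def pick_prioritized_caller(record_callers: Iterable[str],
                            caller_priority: List[str]) -> Optional[str]:
    """Return the first caller in priority order that supports the record."""
    supporting = set(record_callers)
    for caller in caller_priority:
        if caller in supporting:
            return caller
    return None  # none of the prioritized callers supports this record


def filter_header_lines(filter_names: Iterable[str],
                        descriptions: Optional[Dict[str, str]] = None) -> List[str]:
    """Build one ##FILTER header line per filter name."""
    descriptions = descriptions or {}
    lines = []
    for name in filter_names:
        # Fall back to the generic description when none was supplied.
        description = descriptions.get(name, f"Custom filter {name}")
        lines.append(f'##FILTER=<ID={name},Description="{description}">')
    return lines


# Hypothetical filter names and description, for illustration only.
for line in filter_header_lines(["LowDepth", "LowQual"],
                                {"LowQual": "Quality below threshold"}):
    print(line)
print(pick_prioritized_caller(["delly", "manta"], ["manta", "delly", "tiddit"]))
```

In this sketch a record supported by both Delly and Manta is evaluated with the Manta filter only, because Manta comes first in the priority list.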
Filtering based on frequency or selecting an interpretation group
Filtering means removing records from the VCF based on certain conditions:
- remove variants with high frequencies (set frequency filters). Exceptions are allowed (set rescue filters)
- return a subset of variants (set interpretation group)
- in debug mode, high frequency variants are not removed, but frequency tags are added to the FILTER column based on frequencies (set frequency filters)
sv_wgs_filtering STANDARDIZED_MERGED_VCF \
--output-format OUTPUT_FORMAT \
--set-interpretation-group SET_INTERPRETATION_GROUP \
--set-frequency-filters SET_FREQUENCY_FILTERS \
--set-rescue-filters SET_RESCUE_FILTERS \
--caller-priority CALLER_PRIORITY
where
STANDARDIZED_MERGED_VCF has been standardized by sv_standardizer
- If no filename is given, the script will take input from stdin
- Optional: STANDARDIZED_MERGED_VCF may also have been postprocessed by sv_postprocessing
OUTPUT_FORMAT is one of vcf, tsv or bed
SET_INTERPRETATION_GROUP is a JSON-formatted string defined in analysistypeconfig.schema.json under svparams.interpretation_groups
- The interpretation group is applied prior to frequency filtering
SET_FREQUENCY_FILTERS is a JSON-formatted string defined in analysistypeconfig.schema.json under svdb.criteria. There is no additional schema in sv_wgs_filtering to check the filter definitions.
- Note that frequency filtering by the gnomAD database is applied to variants from all callers, whereas filtering by the INDB and SweGen databases is caller-specific
SET_RESCUE_FILTERS is a JSON-formatted string defined in analysistypeconfig.schema.json under svdb.exceptions
CALLER_PRIORITY is a list of callers in order of priority. The frequency database for the caller of highest priority will be used for filtering when multiple callers have been merged
- Optional: --debug will show debug information and cause SET_FREQUENCY_FILTERS to be applied as annotations instead of removing records
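The order of operations described above (interpretation group first, then frequency filtering with rescue exceptions, and tagging instead of removal in debug mode) can be sketched in Python. This is a simplified illustration, not the code of sv_wgs_filtering: records are plain dictionaries, the three predicates stand in for the JSON-defined filters, and the tag name HighFreq, the AF and GENE keys and the thresholds are all hypothetical. Leaving rescued records untagged in debug mode is also an assumption.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


def add_filter_tag(filter_value: str, tag: str) -> str:
    """Add a tag to a VCF FILTER value, replacing PASS or the missing value '.'."""
    if filter_value in ("PASS", ".", ""):
        return tag
    return f"{filter_value};{tag}"


def filter_records(records: Iterable[Record], in_group: Predicate,
                   is_high_frequency: Predicate, is_rescued: Predicate,
                   debug: bool = False, tag: str = "HighFreq") -> Iterator[Record]:
    """Apply the interpretation group, then frequency filtering with rescue."""
    for record in records:
        if not in_group(record):
            continue  # the interpretation group is applied first
        if not is_high_frequency(record) or is_rescued(record):
            yield record  # rare or rescued variants are kept unchanged
        elif debug:
            # Debug mode keeps the record but annotates the FILTER column.
            yield {**record, "FILTER": add_filter_tag(str(record.get("FILTER", "PASS")), tag)}
        # Otherwise the high-frequency record is dropped.


# Hypothetical records and predicates, for illustration only.
records = [
    {"ID": "a", "AF": 0.001, "FILTER": "PASS", "GENE": "G1"},
    {"ID": "b", "AF": 0.5, "FILTER": "PASS", "GENE": "G1"},
    {"ID": "c", "AF": 0.5, "FILTER": "LowQual", "GENE": "G2"},
    {"ID": "d", "AF": 0.001, "FILTER": "PASS", "GENE": "G3"},
]
in_group = lambda r: r["GENE"] in {"G1", "G2"}
is_high_frequency = lambda r: r["AF"] > 0.05
is_rescued = lambda r: r["GENE"] == "G2"

kept = list(filter_records(records, in_group, is_high_frequency, is_rescued))
tagged = list(filter_records(records, in_group, is_high_frequency, is_rescued, debug=True))
print([r["ID"] for r in kept])
print([(r["ID"], r["FILTER"]) for r in tagged])
```

With these inputs the normal run keeps records a and c, while the debug run additionally keeps record b with the frequency tag in its FILTER column.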