Tools for postprocessing of structural variants

Standardizing a VCF file from our CNV-callers

The runnable script sv_standardizer will print a standardized VCF to stdout when executed as

sv_standardizer VCF_FILENAME --caller CALLER_NAME --sample SAMPLE_NAMES

where

VCF_FILENAME is the VCF from one of the callers ["manta", "delly", "tiddit", "cnvnator", "svaba", "canvas", "cnv_sv"] or one of the end states ["merged", "header", "ella"]
If no filename is given, the function will take input from stdin
CALLER_NAME explicitly sets the source of the VCF_FILENAME
CALLER_NAME can be omitted if the VCF_FILENAME starts with {CALLER_NAME}_
SAMPLE_NAMES is a comma-separated list of SAMPLE column names to ensure equal naming for all callers
It is expected that the order of SAMPLE columns is the same for all callers
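
The way CALLER_NAME is resolved can be sketched in Python as follows. This is a minimal illustration and not the actual sv_standardizer code: infer_caller is a hypothetical helper, and requiring --caller when reading from stdin is an assumption.

```python
from pathlib import Path
from typing import Optional

# Caller names and end states listed in this README.
KNOWN_SOURCES = (
    "manta", "delly", "tiddit", "cnvnator", "svaba", "canvas", "cnv_sv",
    "merged", "header", "ella",
)


def infer_caller(vcf_filename: Optional[str], caller: Optional[str] = None) -> str:
    """Hypothetical helper: resolve the source of a VCF file."""
    # An explicit --caller value always wins.
    if caller is not None:
        if caller not in KNOWN_SOURCES:
            raise ValueError(f"Unknown caller: {caller}")
        return caller
    # Assumption: without a filename (stdin) there is no prefix to inspect.
    if vcf_filename is None:
        raise ValueError("--caller is required when reading from stdin")
    # Only the basename is inspected, so directories do not matter.
    basename = Path(vcf_filename).name
    for source in KNOWN_SOURCES:
        if basename.startswith(f"{source}_"):
            return source
    raise ValueError(f"Cannot infer caller from filename: {basename}")
```

With this sketch, a file named manta_sample.vcf resolves to "manta" without --caller, while a file named sample.vcf needs the caller to be given explicitly.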

Note: A hack is used for the standardization of a file in the "merged" state for filenames that contain Diag- in order to remove unwanted INFO fields from v2.8.2 of SVDB merge

Postprocessing of the FILTER column

The FILTER column can be updated when a variant record fails a quality test. Postprocessing is needed for callers such as CNVnator, Delly, Manta and the Dragen SV and CNV callers.

The runnable script sv_postprocessing will print a VCF with updated FILTER column to stdout when executed as

sv_postprocessing STANDARDIZED_MERGED_VCF --set-filters WGS_SVPARAMS_FILTER --caller-priority CALLERS --set-filter-descriptions SET_FILTER_DESCRIPTIONS

where

STANDARDIZED_MERGED_VCF has been standardized by sv_standardizer and merged by SVDB
If no filename is given, the function will take input from stdin
WGS_SVPARAMS_FILTER is a JSON-formatted string defined in analysistypeconfig.schema.json under WGS.svparams.filter. There is no additional schema in sv_postprocessing to check the filter definitions.
CALLERS is a prioritized list of callers. Only the filter of the most prioritized caller will be used.
Optional: SET_FILTER_DESCRIPTIONS is a dictionary {FILTER_NAME: "Description of filter"} for the VCF header. FILTER_NAMEs are the keys of WGS_SVPARAMS_FILTER. If not set, the description will be the generic: "Custom filter FILTER_NAME".
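
The fallback for missing filter descriptions can be illustrated with a short Python sketch. It is not taken from sv_postprocessing: filter_header_lines is a hypothetical function, the filter names LowQual and ShortCNV are made up, and the values behind each filter key are left empty because their structure is not described here. The header line layout follows the VCF specification.

```python
import json
from typing import Dict, List, Optional


def filter_header_lines(
    set_filters_json: str,
    descriptions: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Hypothetical helper: build ##FILTER header lines for each filter name."""
    # The top-level keys of the JSON string are the filter names.
    filters = json.loads(set_filters_json)
    descriptions = descriptions or {}
    lines = []
    for name in filters:
        # Fall back to the generic description when none is supplied.
        description = descriptions.get(name, f"Custom filter {name}")
        lines.append(f'##FILTER=<ID={name},Description="{description}">')
    return lines


# Made-up filter names; only LowQual gets an explicit description.
header = filter_header_lines(
    '{"LowQual": {}, "ShortCNV": {}}',
    {"LowQual": "Quality below threshold"},
)
```

Here ShortCNV has no entry in the description dictionary, so its header line carries the generic text "Custom filter ShortCNV".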

Filtering based on frequency or selecting an interpretation group

Filtering removes records from the VCF based on certain conditions:

remove variants with high frequencies (set frequency filters). Exceptions are allowed (set rescue filters)
return a subset of variants (set interpretation group)
in debug mode, high frequency variants are not removed, but frequency tags are added to the FILTER column based on frequencies (set frequency filters)

sv_wgs_filtering STANDARDIZED_MERGED_VCF \
  --output-format OUTPUT_FORMAT \
  --set-interpretation-group SET_INTERPRETATION_GROUP \
  --set-frequency-filters SET_FREQUENCY_FILTERS \
  --set-rescue-filters SET_RESCUE_FILTERS \
  --caller-priority CALLER_PRIORITY

where

STANDARDIZED_MERGED_VCF has been standardized by sv_standardizer
If no filename is given, the function will take input from stdin
Optional: STANDARDIZED_MERGED_VCF may also have been postprocessed by sv_postprocessing
OUTPUT_FORMAT is one of vcf, tsv or bed
SET_INTERPRETATION_GROUP is a JSON-formatted string defined in analysistypeconfig.schema.json under svparams.interpretation_groups
The interpretation group is applied prior to frequency filtering
SET_FREQUENCY_FILTERS is a JSON-formatted string defined in analysistypeconfig.schema.json under svdb.criteria. There is no additional schema in sv_wgs_filtering to check the filter definitions.
Note that frequency filtering by the gnomAD database is applied to variants from all callers, whereas filtering by the INDB and SweGen databases is caller-specific
SET_RESCUE_FILTERS is a JSON-formatted string defined in analysistypeconfig.schema.json under svdb.exceptions
CALLER_PRIORITY is a list of callers in order of priority. The frequency database for the caller of highest priority will be used for filtering when multiple callers have been merged
Optional: --debug will show debug information and cause SET_FREQUENCY_FILTERS to be applied as annotations in the FILTER column instead of removing records
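
The interplay between caller priority and debug mode can be sketched in Python. This is a simplified illustration under stated assumptions, not the sv_wgs_filtering implementation: records are plain dictionaries, the per-caller frequency lookup, the single max_frequency threshold and the HighFreq tag are all hypothetical, and rescue filters are left out.

```python
from typing import Dict, List, Optional, Sequence


def pick_priority_caller(
    record_callers: Sequence[str], caller_priority: Sequence[str]
) -> Optional[str]:
    """Return the highest-priority caller that reported the record."""
    for caller in caller_priority:
        if caller in record_callers:
            return caller
    return None


def filter_by_frequency(
    records: List[Dict],
    caller_priority: Sequence[str],
    max_frequency: float,
    tag: str = "HighFreq",  # hypothetical tag name
    debug: bool = False,
) -> List[Dict]:
    """Drop high-frequency records, or tag them instead when debug is set."""
    result = []
    for record in records:
        # Use the frequency belonging to the highest-priority caller.
        caller = pick_priority_caller(list(record["frequencies"]), caller_priority)
        frequency = record["frequencies"].get(caller, 0.0)
        if frequency <= max_frequency:
            result.append(record)
        elif debug:
            # Debug mode keeps the record and annotates FILTER instead.
            filters = [f for f in record["filter"] if f != "PASS"] + [tag]
            result.append({**record, "filter": filters})
    return result


# Made-up records: sv1 was merged from two callers, sv2 from one.
records = [
    {"id": "sv1", "filter": ["PASS"], "frequencies": {"manta": 0.30, "delly": 0.01}},
    {"id": "sv2", "filter": ["PASS"], "frequencies": {"delly": 0.02}},
]
kept = filter_by_frequency(records, ["manta", "delly"], max_frequency=0.05)
tagged = filter_by_frequency(records, ["manta", "delly"], max_frequency=0.05, debug=True)
```

In this example sv1 is judged by its manta frequency because manta has the highest priority, so it is removed in normal mode and kept with a HighFreq tag in debug mode. The input records are not modified.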