Tools for postprocessing of structural variants
Standardizing a VCF file from our CNV-callers
The runnable script sv_standardizer will print a standardized VCF to stdout when executed as
where
- VCF_FILENAME is the VCF from one of the callers ["manta", "delly", "tiddit", "cnvnator", "svaba", "canvas", "cnv_sv"] or one of the end states ["merged", "header", "ella"]
  - If no filename is given, the function will take input from stdin
- CALLER_NAME explicitly sets the source of the VCF_FILENAME
  - CALLER_NAME can be omitted if the VCF_FILENAME starts with {CALLER_NAME}_
- SAMPLE_NAMES is a comma-separated list of SAMPLE column names to ensure equal naming for all callers
  - It is expected that the order of SAMPLE columns is the same for all callers
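The filename rule for CALLER_NAME can be sketched as follows. This is a minimal illustration, not the code of sv_standardizer: the function name infer_caller is hypothetical, and only the caller names (not the end states) are considered, which is an assumption.

```python
from pathlib import Path
from typing import Optional

# Caller names as listed in this README.
CALLERS = ["manta", "delly", "tiddit", "cnvnator", "svaba", "canvas", "cnv_sv"]


def infer_caller(vcf_filename: str, caller_name: Optional[str] = None) -> str:
    """Return the caller for a VCF, preferring an explicit CALLER_NAME.

    Hypothetical helper: falls back to the {CALLER_NAME}_ filename prefix.
    """
    if caller_name is not None:
        return caller_name
    basename = Path(vcf_filename).name
    # Try longer names first so a longer name wins if names share a prefix.
    for caller in sorted(CALLERS, key=len, reverse=True):
        if basename.startswith(f"{caller}_"):
            return caller
    raise ValueError(f"cannot infer caller from {basename!r}; pass CALLER_NAME")
```

With this sketch, a file named cnv_sv_sample1.vcf resolves to cnv_sv without an explicit CALLER_NAME, while a file without a recognized prefix requires one.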
Note: A hack is used for the standardization of a file in the "merged" state for filenames that contain Diag- in order to remove unwanted INFO fields from v2.8.2 of SVDB merge
Postprocessing of the FILTER column
The FILTER column can be updated when a variant record fails a quality test.
Postprocessing is needed for callers such as CNVnator, Delly, Manta and the Dragen SV and CNV callers.
The runnable script sv_postprocessing will print a VCF with updated FILTER column to stdout when executed as
sv_postprocessing STANDARDIZED_MERGED_VCF --set-filters WGS_SVPARAMS_FILTER --caller-priority CALLERS --set-filter-descriptions SET_FILTER_DESCRIPTIONS
where
- STANDARDIZED_MERGED_VCF has been standardized by sv_standardizer and merged by SVDB
  - If no filename is given, the function will take input from stdin
- WGS_SVPARAMS_FILTER is a JSON-formatted string defined in analysistypeconfig.schema.json under WGS.svparams.filter. There is no additional schema in sv_postprocessing to check the filter definitions.
- CALLERS is a prioritized list of callers. Only the filter of the most prioritized caller will be used.
- Optional: SET_FILTER_DESCRIPTIONS is a dictionary {FILTER_NAME: "Description of filter"} for the VCF header. FILTER_NAMEs are the keys of WGS_SVPARAMS_FILTER. If not set, the description will be the generic: "Custom filter FILTER_NAME".
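To illustrate how the filter descriptions end up in the VCF header, here is a minimal sketch. The function name filter_header_lines and the filter names in the usage note are hypothetical; the only things taken from the text above are the generic fallback description and the fact that filter names are the keys of WGS_SVPARAMS_FILTER. The header line layout follows the VCF specification for FILTER entries.

```python
from typing import Dict, List, Optional


def filter_header_lines(
    filter_names: List[str],
    descriptions: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build ##FILTER header lines, using a generic text when none is given."""
    descriptions = descriptions or {}
    lines = []
    for name in filter_names:
        # Fallback mirrors the generic description mentioned in this README.
        text = descriptions.get(name, f"Custom filter {name}")
        lines.append(f'##FILTER=<ID={name},Description="{text}">')
    return lines
```

For example, calling filter_header_lines(["low_support", "short_cnv"], {"low_support": "Too few supporting reads"}) gives a specific description for low_support and the generic one for short_cnv.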
Filtering and selecting an interpretation group
Filtering means removing records from the VCF based on certain conditions:
- remove variants with high frequencies or other properties (set filters). Exceptions are allowed (set rescue filters)
- return a subset of variants (set interpretation group)
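The interplay of the three settings can be sketched in a few lines. This is an illustration under assumptions, not the implementation of sv_wgs_filtering: records are plain dictionaries, and the interpretation group, filters and rescue filters are hypothetical predicate functions rather than the JSON definitions the tool actually takes. The interpretation group is applied first, as the option notes for sv_wgs_filtering state.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


def select_records(
    records: Iterable[Record],
    in_group: Predicate,
    filters: List[Predicate],
    rescue_filters: List[Predicate],
) -> List[Record]:
    """Keep records in the group that pass all filters or are rescued."""
    kept = []
    for record in records:
        if not in_group(record):
            continue  # outside the interpretation group
        fails = any(check(record) for check in filters)
        rescued = any(check(record) for check in rescue_filters)
        if fails and not rescued:
            continue  # removed by a filter and not covered by an exception
        kept.append(record)
    return kept
```

A filter here returns True when a record should be removed (for example, a frequency above a threshold), and a rescue filter returns True when the record should be kept regardless.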
sv_wgs_filtering STANDARDIZED_MERGED_VCF
--output-format OUTPUT_FORMAT
--set-interpretation-group SET_INTERPRETATION_GROUP
--set-filters SET_FILTERS
--set-rescue-filters SET_RESCUE_FILTERS
--caller-priority CALLER_PRIORITY
where
- STANDARDIZED_MERGED_VCF has been standardized by sv_standardizer
  - If no filename is given, the function will take input from stdin
  - Optional: STANDARDIZED_MERGED_VCF may also have been postprocessed by sv_postprocessing
- OUTPUT_FORMAT can take values such as vcf, tsv, bed or md
- SET_INTERPRETATION_GROUP is a JSON-formatted string defined in analysistypeconfig.schema.json under svparams.interpretation_groups
  - The interpretation group is applied prior to filtering
- SET_FILTERS is a JSON-formatted string defined in analysistypeconfig.schema.json under svdb.criteria. There is no additional schema in sv_wgs_filtering to check the filter definitions.
  - Note that frequency filtering by the gnomAD database is applied to variants from all callers, whereas filtering by INDB and SweGen databases is caller specific
- SET_RESCUE_FILTERS is a JSON-formatted string defined in analysistypeconfig.schema.json under svdb.exceptions
- CALLER_PRIORITY is a list of callers in order of priority. The frequency database for the caller of highest priority will be used for filtering when multiple callers have been merged
- Optional: --debug will show debug information
Note: There are additional options for this function. For the latest options, check sv_wgs_filtering --help
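The caller priority rule can be read as "walk the priority list and take the first caller that contributed to the merged record". The sketch below shows that reading; the function name highest_priority_caller is hypothetical, and how the tool actually determines which callers contributed to a record is not described here.

```python
from typing import Optional, Sequence


def highest_priority_caller(
    record_callers: Sequence[str],
    caller_priority: Sequence[str],
) -> Optional[str]:
    """Return the first caller in priority order that is present in the record."""
    for caller in caller_priority:
        if caller in record_callers:
            return caller
    return None  # none of the prioritized callers contributed to this record
```

For a record merged from delly and manta with the priority manta, delly, cnvnator, this picks manta, so under this reading the frequency database for manta would be the one used.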
Validating a VCF file
A VCF file can be validated by the script sv_validator which will print an error message if the format of the VCF does not comply with a pydantic model for the VCF records
where
- VCF_FILENAME is the VCF file to be validated. If no filename is given, the function will take input from stdin.
- DATAMODEL is the pydantic model to be used for validation. For datamodel options and the default model, check sv_validator --help
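As a rough picture of what record-level validation involves, the sketch below checks the eight fixed VCF columns of each data line. It deliberately uses a plain dataclass instead of pydantic so that it runs with the standard library alone; the actual models used by sv_validator and their fields are not shown in this README, so VcfRecord and validate are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VcfRecord:
    """The eight fixed columns of a VCF data line."""

    chrom: str
    pos: int
    id: str
    ref: str
    alt: str
    qual: str
    filter: str
    info: str

    @classmethod
    def from_line(cls, line: str) -> "VcfRecord":
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(fields)}")
        chrom, pos, *rest = fields[:8]
        if not (pos.isascii() and pos.isdigit()):
            raise ValueError(f"POS must be a non-negative integer, got {pos!r}")
        return cls(chrom, int(pos), *rest)


def validate(lines: Iterable[str]) -> List[str]:
    """Return one error message per data line that fails the checks."""
    errors = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        try:
            VcfRecord.from_line(line)
        except ValueError as error:
            errors.append(f"line {number}: {error}")
    return errors
```

Because validate accepts any iterable of lines, it works equally on an open file or on sys.stdin, which mirrors the stdin fallback described above.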