HTS Bioinf - CoNVaDING background data
When to create a new control set
The quality of CoNVaDING CNV calling is highly dependent on the background control data. The background control data set needs to be updated regularly.
The quality of CoNVaDING CNV calling is measured by the parameter SAMPLE_CV given by CoNVaDING. When mean(SAMPLE_CV) > 0.04 for an EKG project, the number of false positive results increases.
EKG monitors this value. The background data is expected to be updated every 3 months.
- EKG will send an email to diagnostic bioinformatics when it is time to update the CoNVaDING background data.
- EKG will determine which samples should be part of the new background data and which samples should be used for validating the new background data. This information will typically be provided in the form of an Excel spreadsheet located in /ess/p22/data/durable/production/interpretations/communication/EKG communication/Oppdatering av bakgrunnskontroller.
Note: Keep open communication with EKG to determine the reasoning behind adding projects.
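As a rough illustration of the threshold described above, the sketch below computes the mean SAMPLE_CV per project and flags projects above 0.04. How EKG actually collects the SAMPLE_CV values is not described here, so the function name `exceeds_cv_threshold`, the project names and the numbers are all hypothetical.

```python
from statistics import mean

SAMPLE_CV_THRESHOLD = 0.04  # threshold quoted in the text above


def exceeds_cv_threshold(sample_cvs, threshold=SAMPLE_CV_THRESHOLD):
    """Return True when the mean SAMPLE_CV of one project exceeds the threshold."""
    if not sample_cvs:
        raise ValueError("no SAMPLE_CV values given")
    return mean(sample_cvs) > threshold


# Made-up per-sample SAMPLE_CV values for two hypothetical projects.
projects = {
    "project_A": [0.031, 0.035, 0.038],
    "project_B": [0.039, 0.047, 0.052],
}
for name, values in projects.items():
    status = "above threshold" if exceeds_cv_threshold(values) else "ok"
    print(f"{name}: mean(SAMPLE_CV) = {mean(values):.3f} ({status})")
```

A project above the threshold is only a signal to talk to EKG; the decision to update the background data remains theirs.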
Location of current background control sets
Previous background control sets are stored in the sensitive-db on TSD. Its location at the time of writing is /ess/p22/data/durable/production/reference/sensitive/sensitive-db. Look in sensitive-db.json for the entries related to "convading".
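The layout of sensitive-db.json is not described here, so the following is only a generic sketch: it walks any JSON structure and reports every key or string value that contains "convading". The helper name `find_entries` and the example data are hypothetical and do not reflect the real file.

```python
import json


def find_entries(node, needle, path=()):
    """Yield (path, value) for each key or string value containing `needle`."""
    if isinstance(node, dict):
        for key, value in node.items():
            if needle in str(key).lower():
                yield path + (key,), value
            else:
                yield from find_entries(value, needle, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_entries(item, needle, path + (index,))
    elif isinstance(node, str) and needle in node.lower():
        yield path, node


# Made-up example data; for the real file one would instead do:
#   with open("sensitive-db.json") as fh:
#       db = json.load(fh)
db = json.loads(
    '{"datasets": [{"name": "convading-background", "version": "x"},'
    ' {"name": "other"}]}'
)
for path, value in find_entries(db, "convading"):
    print(path, value)
```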
Creating a new data set
The source code for generating a new background data set is located at /ess/p22/data/durable/development/convading-background-data. We'll assume this as the current working directory in the following. The version controlled source code is located in the subdirectory convading-background-data. See the README.md file there for more detailed information about how the code functions.
1. Update the config.json in this directory to contain exactly the same projects as mentioned in the Excel sheet.
   - Add all relevant projects to config.json["background-projects"]. The relevant projects are usually projects with poor CoNVaDING results. Discuss with EKG.
   - Set config.json["test"]["projects"]["ignore-projects-before"] to one year prior to the current date (format YYMMDD). The projects that are not ignored are used to check the new background data for good SAMPLE_CV and number of CNVs per project.
   - Ask EKG for a list of samples with confirmed findings (not control samples) and add these to config.json["test"]["specific-samples"].
2. Commit the new config.json by running:
   git add config.json && git commit -m "Update background data YYMMDD"
3. Generate the new background data. This takes from a few hours to a day. Note that the default work and output directories are relative to the current directory.
   ./sw/bin/generate-background-data >convading-background-data_YYMMDD.log
4. Write an Excel document that contains the results for the "Positive kontroller" (positive controls) samples. The results are stored in the working directory that was used while creating the background controls. By default this is workdir-convading-background-YYMMDD-HHMM (where YYMMDD-HHMM is the time stamp for the creation of the directory).
5. Move the Excel file workdir-convading-background-YYMMDD-HHMM/CoNVaDING-result-results-specific-samples.xlsx and the log file convading-background-data_YYMMDD.log to /ess/p22/data/durable/production/interpretations/communication/EKG communication/Oppdatering av bakgrunnskontroller/.
6. Notify EKG about the completed run and ask them to inspect the Excel file and the log file. This is perhaps easiest with a short meeting to go through the logs together.
7. If EKG approves the background data, release the data and notify EKG about the release as explained in HTS Bioinf - Update of in-house annotation data. If EKG does not approve the background data, alter the projects used as background (for example by adding more projects) and try again from step 2. The utility scripts under convading-background-data/bin can also be used to identify potential problems.
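The config.json update in the first step could be scripted roughly as follows. This is a minimal sketch, not part of the repository: the three keys are the ones named above, but the value types (lists of names, the date as a YYMMDD string), the helper names `one_year_before` and `update_config`, and the project and sample names are assumptions. Check the README.md and the existing config.json for the actual format.

```python
import json
from datetime import date


def one_year_before(today):
    """Return the date one year before `today` as a YYMMDD string."""
    try:
        earlier = today.replace(year=today.year - 1)
    except ValueError:  # 29 February has no counterpart in a non-leap year
        earlier = today.replace(year=today.year - 1, day=28)
    return earlier.strftime("%y%m%d")


def update_config(config, background_projects, specific_samples, today):
    """Set the three config entries named in the first step; return the config."""
    config["background-projects"] = sorted(background_projects)
    test = config.setdefault("test", {})
    test.setdefault("projects", {})["ignore-projects-before"] = one_year_before(today)
    test["specific-samples"] = sorted(specific_samples)
    return config


# Made-up project and sample names, for illustration only.
config = update_config(
    {},
    background_projects=["EKG_project_2", "EKG_project_1"],
    specific_samples=["sample_B", "sample_A"],
    today=date(2024, 3, 15),
)
print(json.dumps(config, indent=2))
```

In practice one would load the existing config.json with `json.load`, pass it in instead of the empty dict, and write the result back before committing.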
References
- CoNVaDING article: https://pubmed.ncbi.nlm.nih.gov/26864275
- CoNVaDING source code: https://github.com/molgenis/CoNVaDING