HTS Bioinf - Convading background data
When to create a new control set
The quality of CoNVaDING CNV calling is highly dependent on the background control data. The background control data set needs to be updated regularly.
The quality of CoNVaDING CNV calling is measured by the parameter SAMPLE_CV that is given by CoNVaDING. When mean(SAMPLE_CV) > 0.04 for an EKG project, the number of false positive results increases.
The value is monitored by EKG. It is expected that the background data has to be updated every 3 months.
- EKG will send an email to diagnostic bioinformatics when it is time to update the CoNVaDING background data.
- EKG will determine which samples should be part of the new background data and which samples should be used for validating the new background data. This is summarized in the file Data til oppdatering av bakgrunnskontroller.xlsx, which is located in /tsd/p22/data/durable/production/interpretations/communication/EKG communication/Oppdatering av bakgrunnskontroller/
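The SAMPLE_CV criterion above can be sketched in a few lines of Python. This is only an illustration: the function name needs_new_background is hypothetical, and the SAMPLE_CV values are assumed to have already been collected from the CoNVaDING output for one EKG project (how they are extracted is not covered here).

```python
from statistics import mean

# Threshold stated in this procedure: mean(SAMPLE_CV) > 0.04 signals degraded calling.
SAMPLE_CV_THRESHOLD = 0.04


def needs_new_background(sample_cvs, threshold=SAMPLE_CV_THRESHOLD):
    """Return True when the mean SAMPLE_CV of one project exceeds the threshold.

    sample_cvs: SAMPLE_CV values (floats), one per sample in the project.
    """
    if not sample_cvs:
        raise ValueError("no SAMPLE_CV values given")
    return mean(sample_cvs) > threshold
```

For example, needs_new_background([0.03, 0.05, 0.06]) is True because the mean is about 0.047, while needs_new_background([0.02, 0.03, 0.04]) is False.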
Location of current background control sets
The previous background control sets are stored in the sensitive-db on TSD. The current location is /cluster/projects/p22/production/sw/vcpipe/sensitive-db/. Look in bundle.json for the entries related to "convading".
Creating a new data set
- Log into the TSD VM p22-ella-fo-01
- Enter the directory containing the repo for creating new background data, convading-background-data. The current location is /tsd/p22/data/durable2/investigations/220225-convading-background-data/convading-background-data
- Update the config.json in this directory to contain exactly the same projects as mentioned in the Excel sheet Data til oppdatering av bakgrunnskontroller.xlsx:
    - Add all Project IDs of the rows marked with Bakgrunnskontroller in the Prosjekter sheet to the list config.json["background-projects"].
    - Add all Project IDs of the rows marked with Sample CV control in the Prosjekter sheet to the list config.json["test"]["projects"]["projects"].
    - Positive controls are found in the Positive kontroller sheet. Find the oldest project that has Prøvenr CNVdel or CNVdup and add the previous date to config.json["test"]["positive-controls"]["ignore-projects-before"]. For example, if the oldest project is EKG220103, then the previous date is 220102. Note that projects that are older than 220103 lack important data and cannot be used directly.
    - Controls for manual inspection are also found in the Positive kontroller sheet. These samples have a numerical Prøvenummer (i.e. not CNVdel or CNVdup). They are added individually as a list to config.json["test"]["specific-samples"].
- Commit the new config.json by running:
    git add config.json && git commit -m "Update background data YYMMDD"
- Make the new background data. This will take a week or longer. Consider using a screen session if screen is available. Note that the default work and output directories are relative to the current directory:
    cd /tsd/p22/data/durable2/investigations/220225-convading-background-data
    ./convading-background-data/bin/generate-background-data >convading-background-date_YYMMDD.log
- Make an Excel document that contains the results for the Positive kontroller samples. The results are stored in the working directory that was used while creating the background controls. By default this is workdir-convading-background-YYMMDD-HHMM (where YYMMDD-HHMM is the time stamp for the creation of the directory).
- Move the Excel file workdir-convading-background-YYMMDD-HHMM/CoNVaDING-result-results-specific-samples.xlsx and the log file convading-background-date_YYMMDD.log to /tsd/p22/data/durable/production/interpretations/communication/EKG communication/Oppdatering av bakgrunnskontroller/
- Send an email to the EKG mailing list and ask them to review the Excel file and the log file.
- If EKG approves the background data, release the data and notify EKG about the release as explained in HTS Bioinf - Update sensitive database
- Blacklist EKG projects from entering ELLA for the first five runs after the new background has been released by adding the regular expression (?i)Diag-EKGYY.* (where YY is the last two digits of the current year) to the file /tsd/p22/data/durable/production/ella/ops/prod-watcher-blacklist.txt.
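Two values in the procedure above are derived rather than copied: the date for "ignore-projects-before" and the blacklist regular expression. The following is a minimal Python sketch of both, not part of the convading-background-data repo. The function names are hypothetical. It assumes that a project ID is EKG followed by a YYMMDD date (as in the EKG220103 example) and that the blacklist is matched from the start of the project name with Python-style regular expressions, which is not confirmed here. The project name Diag-EKG240115 used below is invented for illustration.

```python
import re
from datetime import datetime, timedelta


def ignore_projects_before(oldest_project):
    """Return the day before an EKG project ID's date, formatted as YYMMDD."""
    # Assumption: project IDs look like 'EKG' + YYMMDD, e.g. 'EKG220103'.
    if not oldest_project.startswith("EKG"):
        raise ValueError("unexpected project ID: " + oldest_project)
    project_date = datetime.strptime(oldest_project[3:], "%y%m%d")
    return (project_date - timedelta(days=1)).strftime("%y%m%d")


def blacklist_pattern(year):
    """Build the blacklist expression for a four-digit year (YY = last two digits)."""
    return "(?i)Diag-EKG{:02d}.*".format(year % 100)


def is_blacklisted(project_name, pattern):
    """Check a name against the pattern, anchored at the start of the name."""
    # Assumption: the watcher's matching behaves like Python's re.match.
    return re.match(pattern, project_name) is not None
```

With these helpers, ignore_projects_before("EKG220103") gives "220102", matching the example in the config step, and blacklist_pattern(2024) gives (?i)Diag-EKG24.*, which matches Diag-EKG240115 regardless of letter case but not a project from another year.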
References
CoNVaDING paper: https://www.ncbi.nlm.nih.gov/pubmed/