HTS Bioinf - CoNVaDING background data

When to create a new control set

The quality of CoNVaDING CNV calling is highly dependent on the background control data. The background control data set needs to be updated regularly.

Calling quality is measured by the SAMPLE_CV parameter reported by CoNVaDING. When mean(SAMPLE_CV) > 0.04 for an EKG project, the number of false positive results increases.

EKG monitors the SAMPLE_CV value. The background data is expected to be updated every 3 months.
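As a rough illustration of the criterion above, the following Python sketch averages SAMPLE_CV over the samples of one project and compares the mean with the 0.04 threshold. The function name and the example values are hypothetical, and how the SAMPLE_CV values are extracted from the CoNVaDING output is not covered here.

```python
from statistics import mean

# Threshold from the text above: a project mean above this value is
# associated with an increased number of false positives.
SAMPLE_CV_THRESHOLD = 0.04


def needs_new_background(sample_cvs, threshold=SAMPLE_CV_THRESHOLD):
    """Return True if the mean SAMPLE_CV of a project exceeds the threshold."""
    if not sample_cvs:
        raise ValueError("no SAMPLE_CV values given")
    return mean(sample_cvs) > threshold


# Hypothetical SAMPLE_CV values for the samples of one project.
example_project = [0.035, 0.042, 0.051, 0.047]
print(needs_new_background(example_project))
```

The decision to update the background data remains with EKG; a check like this only mirrors the stated threshold.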

  1. EKG will send an email to diagnostic bioinformatics when it is time to update the CoNVaDING background data.
  2. EKG will determine which samples should be part of the new background data and which samples should be used to validate it. This information is typically provided as an Excel spreadsheet in /ess/p22/data/durable/production/interpretations/communication/EKG communication/Oppdatering av bakgrunnskontroller.

Note: Stay in contact with EKG to understand the reasoning behind adding projects.

Location of current background control sets

Previous background control sets are stored in the sensitive-db on TSD. Its location at the time of writing is /ess/p22/data/durable/production/reference/sensitive/sensitive-db. Look in sensitive-db.json for the entries related to "convading".
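The lookup in sensitive-db.json can be done by hand, but a small script may help if the file is large. The sketch below walks any JSON structure and reports every key or string value that contains "convading". The layout of sensitive-db.json is not described here, so the example data is a hypothetical stand-in and the real file may look quite different.

```python
import json


def find_entries(node, needle="convading", path=()):
    """Yield (path, value) for every key or string value containing needle."""
    needle = needle.lower()
    if isinstance(node, dict):
        for key, value in node.items():
            if needle in str(key).lower():
                # The key itself matches: report the whole value under it.
                yield path + (key,), value
            else:
                yield from find_entries(value, needle, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_entries(value, needle, path + (index,))
    elif isinstance(node, str) and needle in node.lower():
        yield path, node


# Hypothetical miniature stand-in for sensitive-db.json; the real layout may differ.
example_db = json.loads("""
{
  "datasets": [
    {"name": "convading-background", "version": "example-version"},
    {"name": "other-data", "version": "example-version"}
  ]
}
""")

for entry_path, value in find_entries(example_db):
    print(entry_path, value)
```

To use it on the real file, load sensitive-db.json with json.load and pass the result to find_entries.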

Creating a new data set

The source code for generating a new background data set is located at /ess/p22/data/durable/development/convading-background-data. The steps below assume that this is the current working directory. The version-controlled source code is located in the subdirectory convading-background-data. See the README.md file there for more detailed information about how the code works.

  1. Update config.json in the working directory so that it contains exactly the projects listed in the Excel sheet.

    • Add all relevant projects to config.json["background-projects"]. The relevant projects are usually projects with poor CoNVaDING results. Discuss with EKG.
    • Set config.json["test"]["projects"]["ignore-projects-before"] to one year prior to the current date (format YYMMDD). The remaining, more recent projects are used to check that the new background data gives good SAMPLE_CV values and an acceptable number of CNVs per project.
    • Ask EKG for a list of samples with confirmed findings (not control samples) and add these to config.json["test"]["specific-samples"].
  2. Commit the new config.json by running: git add config.json && git commit -m "Update background data YYMMDD".

  3. Generate the new background data. This takes from a few hours to a day. Note that the default work and output directories are relative to the current directory.

    • ./sw/bin/generate-background-data >convading-background-data_YYMMDD.log
  4. Create an Excel document that contains the results for the "Positive kontroller" (positive control) samples. The results are stored in the working directory that was used when creating the background controls. By default this is workdir-convading-background-YYMMDD-HHMM (where YYMMDD-HHMM is the time stamp for the creation of the directory):

    ./sw/bin/make-excel-report \
        --path workdir-convading-background-YYMMDD-HHMM \
        --regions regions/KREFT_v04-KREFT38_v01-extended-regions-split-exons.csv \
        --output workdir-convading-background-YYMMDD-HHMM
    
  5. Move the Excel file workdir-convading-background-YYMMDD-HHMM/CoNVaDING-result-results-specific-samples.xlsx and the log file convading-background-data_YYMMDD.log to /ess/p22/data/durable/production/interpretations/communication/EKG communication/Oppdatering av bakgrunnskontroller/.

  6. Notify EKG about the completed run and ask them to inspect the Excel file and the log file. This is perhaps easiest with a short meeting to go through the logs together.

  7. If EKG approves the background data, release the data and notify EKG about the release as explained in HTS Bioinf - Update of in-house annotation data. If EKG does not approve the background data, change the projects used as background (for example by adding more projects) and try again from step 1. The utility scripts under convading-background-data/bin can also be used to identify potential problems.
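The config.json changes in step 1 can be sketched in Python as follows. Only the three keys named in step 1 come from this procedure. The value types (lists of strings, a YYMMDD string), the helper names, and the project and sample names are assumptions for illustration, so check the README.md in convading-background-data for the authoritative format.

```python
import json
from datetime import date


def one_year_before(today):
    """Return the date one year before today, formatted as YYMMDD."""
    try:
        earlier = today.replace(year=today.year - 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        earlier = today.replace(year=today.year - 1, day=28)
    return earlier.strftime("%y%m%d")


def update_config(config, background_projects, specific_samples, today):
    """Return a copy of config with the three settings from step 1 filled in."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    updated["background-projects"] = list(background_projects)
    test = updated.setdefault("test", {})
    test.setdefault("projects", {})["ignore-projects-before"] = one_year_before(today)
    test["specific-samples"] = list(specific_samples)
    return updated


# Hypothetical project and sample names, for illustration only.
new_config = update_config(
    config={},
    background_projects=["example-project-1", "example-project-2"],
    specific_samples=["example-sample-1"],
    today=date(2024, 3, 15),
)
print(json.dumps(new_config, indent=2))
```

In practice the existing config.json would be loaded with json.load, passed through update_config with date.today(), and written back with json.dump before committing in step 2.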

References

  • CoNVaDING article: https://pubmed.ncbi.nlm.nih.gov/26864275
  • CoNVaDING source code: https://github.com/molgenis/CoNVaDING