HTS Bioinf - Convading background data

When to create a new control set

The quality of CoNVaDING CNV calling is highly dependent on the background control data. The background control data set needs to be updated regularly.

Quality of CoNVaDING CNV calling is measured by the parameter SAMPLE_CV that is given by CoNVaDING. When mean(SAMPLE_CV) > 0.04 for an EKG-project, the number of false positive results increases.
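
The threshold check described above can be sketched as follows. This is a minimal illustration, not the way EKG monitors the value: it assumes the SAMPLE_CV values of one project have already been extracted to one number per line (how to extract them from the CoNVaDING output is not covered here), and the helper names `mean_sample_cv` and `exceeds_threshold` as well as the example values are made up.

```shell
# Hypothetical helper: print the mean of the numbers given one per line on stdin.
mean_sample_cv() {
    awk 'NF { sum += $1; n++ } END { if (n > 0) printf "%.4f\n", sum / n }'
}

# Hypothetical helper: exit status 0 when the given mean is above the 0.04 threshold.
exceeds_threshold() {
    awk -v m="$1" 'BEGIN { exit !(m > 0.04) }'
}

# Made-up example values; replace with the SAMPLE_CV values of one EKG project.
mean=$(printf '0.031\n0.045\n0.052\n' | mean_sample_cv)
if exceeds_threshold "$mean"; then
    echo "mean(SAMPLE_CV) = $mean: above 0.04, consider updating the background data"
else
    echo "mean(SAMPLE_CV) = $mean: within the threshold"
fi
```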

The value is monitored by EKG. The background data is expected to need updating about every 3 months.

  1. EKG will send an email to diagnostic bioinformatics when it is time to update the CoNVaDING background data.
  2. EKG will determine which samples should be part of the new background data and which samples should be used for validating it. This is summarized in the file Data til oppdatering av bakgrunnskontroller.xlsx, located in /tsd/p22/data/durable/production/interpretations/communication/EKG communication/Oppdatering av bakgrunnskontroller/

Location of current background control sets

The previous background control sets are stored in the sensitive-db on TSD. Current location is in /cluster/projects/p22/production/sw/vcpipe/sensitive-db/. Look in bundle.json for the entries related to "convading".
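
One way to locate the relevant entries is a case-insensitive search, sketched below. The internal structure of bundle.json is not described here, so the sketch only lists matching lines with their line numbers; the helper name `find_convading_entries` is made up for illustration.

```shell
# Path as given in the text above; it may have changed since.
SENSITIVE_DB=/cluster/projects/p22/production/sw/vcpipe/sensitive-db

# Hypothetical helper: list lines mentioning "convading" (any case), with line numbers.
find_convading_entries() {
    grep -n -i 'convading' "$1"
}

if [ -r "$SENSITIVE_DB/bundle.json" ]; then
    find_convading_entries "$SENSITIVE_DB/bundle.json"
else
    echo "bundle.json is not readable at $SENSITIVE_DB" >&2
fi
```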

Creating a new data set

  1. Log into the TSD VM p22-ella-fo-01
  2. Enter the directory containing the repo for creating new background data, convading-background-data. Current location is /tsd/p22/data/durable2/investigations/220225-convading-background-data/convading-background-data
  3. Update the config.json in this directory to contain exactly the same projects as mentioned in the Excel sheet Data til oppdatering av bakgrunnskontroller.xlsx
    • Add the Project IDs of all rows marked with Bakgrunnskontroller in the Prosjekter sheet to the list config.json["background-projects"]
    • Add the Project IDs of all rows marked with Sample CV control in the Prosjekter sheet to the list config.json["test"]["projects"]["projects"]
    • Positive controls are found in the Positive kontroller sheet. Find the oldest project that has Prøvenr CNVdel or CNVdup and add the day before its date to config.json["test"]["positive-controls"]["ignore-projects-before"]. For example, if the oldest project is EKG220103, then the previous date is 220102. Note that projects older than 220103 lack important data and cannot be used directly.
    • Controls for manual inspection are also found in the Positive kontroller sheet. These samples have a numerical Prøvenummer (i.e. not CNVdel or CNVdup). They are added individually as a list to config.json["test"]["specific-samples"].
  4. Commit the new config.json by running: git add config.json && git commit -m "Update background data YYMMDD"
  5. Make the new background data. This will take a week or longer. Consider using a screen session if screen is available. Note that the default work and output directories are relative to the current directory.

    • cd /tsd/p22/data/durable2/investigations/220225-convading-background-data
    • ./convading-background-data/bin/generate-background-data >convading-background-date_YYMMDD.log
  6. Make an Excel document that contains the results for the Positive kontroller samples. The results are stored in the working directory that was used while creating the background controls. By default this is workdir-convading-background-YYMMDD-HHMM (where YYMMDD-HHMM is the time stamp for the creation of the directory):

    ./convading-background-data/bin/make-excel-report \
        --path workdir-convading-background-YYMMDD-HHMM \
        --regions KREFT_v04-KREFT38_v01-extended-regions-split-exons.csv \
        --output workdir-convading-background-YYMMDD-HHMM
    
  7. Move the Excel file workdir-convading-background-YYMMDD-HHMM/CoNVaDING-result-results-specific-samples.xlsx and the log file convading-background-date_YYMMDD.log to /tsd/p22/data/durable/production/interpretations/communication/EKG communication/Oppdatering av bakgrunnskontroller/

  8. Send an email to the EKG mailing list and ask them to review the Excel file and the log file.
  9. If EKG approves the background data, release the data and notify EKG about the release as explained in HTS Bioinf - Update sensitive database
  10. Blacklist EKG projects from entering ELLA for the first five runs after the new background has been released by adding the regular expression (?i)Diag-EKGYY.* (where YY is the last two digits of the current year) to the file /tsd/p22/data/durable/production/ella/ops/prod-watcher-blacklist.txt.
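
Two values in the steps above are derived from dates: the "previous date" for ignore-projects-before (step 3) and the YY part of the blacklist expression (step 10). The sketch below shows one way to compute them. It assumes GNU date is available and that all project dates fall in the years 20YY; the helper names `previous_date` and `blacklist_pattern` and the project name used in the match check are made up for illustration.

```shell
# Hypothetical helper: given a project date as YYMMDD, print the day before as YYMMDD.
# Uses GNU date (-d); TZ=UTC avoids daylight-saving edge cases at midnight.
previous_date() {
    TZ=UTC date -d "20$1 1 day ago" +%y%m%d
}

# Hypothetical helper: print the blacklist expression for a two-digit year.
blacklist_pattern() {
    printf '(?i)Diag-EKG%s.*\n' "$1"
}

# Oldest usable project EKG220103 gives 220102, as in the example in step 3.
previous_date 220103

# Expression for the current year.
pattern=$(blacklist_pattern "$(date +%y)")
echo "$pattern"

# Optional check against a made-up project name; needs a grep built with -P support.
if printf 'diag-ekg%s0115\n' "$(date +%y)" | grep -qP "$pattern" 2>/dev/null; then
    echo "pattern matches case-insensitively"
fi

# To add the expression to the blacklist (review before running):
# echo "$pattern" >> /tsd/p22/data/durable/production/ella/ops/prod-watcher-blacklist.txt
```

The append command is left commented out on purpose, so that the expression can be inspected before the production blacklist file is changed.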

References

CoNVaDING paper: https://www.ncbi.nlm.nih.gov/pubmed/