HTS Bioinf - Demultiplexing and quality control of raw sequencing data

Scope

This specification states how the Norwegian Sequencing Center (NSC) should demultiplex and quality-control raw sequencing data, and deliver the results to Enhet for HTS-diagnostikk.

NSC is responsible for delivering sequencing data from genome, exome and target capture.

Responsibility

NSC is responsible for demultiplexing the data, running basic quality controls on it, and delivering it. Before being transferred, data is stored on NSC-administered servers (see HTS Bioinf - Storage and security of sensitive data).

The bioinformatician in charge of production is responsible for checking the consistency of the delivered data and for contacting NSC if there are any problems.
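One way to check the consistency of a delivery is to recompute the MD5 checksums and compare them with the md5sum.txt file that is part of the delivery (see Specification). The following is a minimal sketch, not the procedure actually used in production. It assumes that md5sum.txt uses the standard `md5sum` output format and that the file names in it are relative to the directory containing md5sum.txt. The function names `md5_of_file` and `verify_md5sums` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(checksum_file):
    """Compare each entry of an md5sum-style file with the file on disk.

    Returns a list of (file name, problem) tuples; an empty list means
    every listed file exists and matches its checksum.
    Assumption: file names are relative to the checksum file's directory.
    """
    checksum_file = Path(checksum_file)
    base_dir = checksum_file.parent
    problems = []
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, _, name = line.partition(" ")
        # md5sum writes "<hash>  <name>" (text mode) or "<hash> *<name>" (binary mode)
        name = name.lstrip(" *")
        target = base_dir / name
        if not target.is_file():
            problems.append((name, "missing"))
        elif md5_of_file(target) != expected.lower():
            problems.append((name, "checksum mismatch"))
    return problems
```

A non-empty return value, for example from `verify_md5sums("md5sum.txt")` run in the delivery directory, would be a reason to contact NSC.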


Specification

  1. Demultiplex diagnostic samples by running bcl2fastq2 software with default settings.
  2. Run FASTQC using default settings on each sample's FASTQ files.
  3. Copy the files listed below to the beta server at 192.168.1.41:/boston/diag/nscDelivery:

    • FASTQ files for all samples
    • FASTQC raw results for all samples
    • PDF quality reports for all samples
    • MD5 checksums for the FASTQ and PDF files (md5sum.txt)
    • sequencing statistics for all samples (Demultiplex_Stats.htm)
  4. Verify that permissions and user group settings respect the following:

    The user group of the files should be: ous-diag

    The permission code of the files should be: 770

  5. Send a standard NSC delivery email to the email lists diag-lab and diag-bioinf at UiO.

    When EKG or EHG has set up a run, also send the email to EKG or ehg-hts at OUS, respectively.
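The group and permission requirements in step 4 can be checked programmatically. The sketch below is an illustration under stated assumptions, not an NSC script: it assumes a Unix system, checks regular files only (the specification speaks of files), and uses the hypothetical helper names `group_name`, `check_entry` and `check_delivery`.

```python
import grp
import os
import stat

EXPECTED_GROUP = "ous-diag"  # user group required by the specification
EXPECTED_MODE = 0o770        # permission code required by the specification


def group_name(gid):
    """Return the group name for a numeric gid, or the gid as text if unnamed."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def check_entry(path, expected_group=EXPECTED_GROUP, expected_mode=EXPECTED_MODE):
    """Return a list of human-readable problems for one file (empty if fine)."""
    info = os.stat(path)
    problems = []
    mode = stat.S_IMODE(info.st_mode)  # permission bits only, without the file type
    if mode != expected_mode:
        problems.append(f"mode is {mode:o}, expected {expected_mode:o}")
    group = group_name(info.st_gid)
    if group != expected_group:
        problems.append(f"group is {group}, expected {expected_group}")
    return problems


def check_delivery(root, expected_group=EXPECTED_GROUP, expected_mode=EXPECTED_MODE):
    """Walk a delivery directory and map each offending file to its problems."""
    report = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            problems = check_entry(path, expected_group, expected_mode)
            if problems:
                report[path] = problems
    return report
```

An empty dictionary from `check_delivery` on the delivery directory means every file has the expected group and permission code.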

All in-house scripts concerning demultiplexing must be saved on the LIMS production server and version-controlled.

The repositories are also synced to GitHub: https://github.com/nsc-norway/lims and https://github.com/nsc-norway/pipeline

Background

Several samples are normally pooled together in the same flowcell lane during sequencing, which results in reads from different samples being mixed together in the initial output files from sequencing.

To separate different samples into separate files, demultiplexing software must be run on the initial output files. After demultiplexing, quality control on reads (FASTQC) can be performed separately for each sample.

NSC is responsible for demultiplexing.

Demultiplexing applies a few thresholds:

  • Only data passing the purity filter (PF) are written to the FASTQ files. The PF is internal to the sequencing machine and not configurable.
  • We allow at most one mismatch in the index reads: if an index read differs from the specified sample index by one base, it is still assigned to that sample.

After demultiplexing, samples can be delivered. The lims-exporter-api and the pipeline then check the QC automatically.

First, lims-exporter-api checks the Q30 of each sample. If the sample's Q30 is below the Illumina threshold criteria given below and the priority of the sample is 1, the sample is sent to "Manager Review" in Clarity. Otherwise, the sample is exported and run through the pipeline, which performs its own QC checks as before.

Illumina's threshold criteria for good sequencing quality (obtained from http://www.illumina.com):

  • HiSeq2500: ≥80% of bases higher than Q30 (2x125bp)
  • HiSeq3000/4000 and HiSeq X: ≥75% of bases higher than Q30 (2x150bp)
  • MiSeq: ≥80% of bases higher than Q30 (2x150bp)
  • NextSeq 500: ≥75% of bases higher than Q30 (2x150bp) for both High Output and Mid Output
  • NovaSeq 6000: ≥85% of bases higher than Q30 (2x150bp) for all flowcells
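The routing rule described above, combined with these thresholds, can be sketched as follows. This is not the actual lims-exporter-api code: the dictionary keys, the function name `route_sample` and the return values are hypothetical labels chosen for illustration, and the percentages are copied from the list above.

```python
# Minimum percentage of bases above Q30, per instrument (from the list above).
Q30_THRESHOLDS = {
    "HiSeq2500": 80.0,
    "HiSeq3000/4000": 75.0,
    "HiSeq X": 75.0,
    "MiSeq": 80.0,
    "NextSeq 500": 75.0,
    "NovaSeq 6000": 85.0,
}


def route_sample(instrument, pct_q30, priority):
    """Decide where a sample goes after demultiplexing.

    A sample is held for "Manager Review" only if its Q30 percentage is
    below the instrument threshold AND its priority is 1; every other
    sample is exported to the pipeline.
    """
    threshold = Q30_THRESHOLDS[instrument]  # raises KeyError for unknown instruments
    if pct_q30 < threshold and priority == 1:
        return "Manager Review"
    return "export"


print(route_sample("NovaSeq 6000", 83.2, priority=1))  # below 85 with priority 1
print(route_sample("NovaSeq 6000", 83.2, priority=2))  # below 85 but priority 2
print(route_sample("MiSeq", 91.0, priority=1))         # above 80
```

Only the first call returns "Manager Review"; the other two samples are exported. A sample exactly at the threshold is exported, because the criteria are stated as "≥".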

The Clarity LIMS system monitors the sequencer as it is sequencing. When a sequencing run is finished, the LIMS system will automatically trigger the following steps:

  • Copy the run directory structure and essential files for demultiplexing to the NSC server
  • Demultiplexing
  • Quality controls on the sequencing data
  • Prepare for delivery

Log files, which contain the commands used for demultiplexing and the software version numbers for these steps, are saved in the DemultiplexLogs directory under the run directory on the NSC server.

Other relevant documents

  • HTS - Kvalitetskontroll
  • MiSeq
  • NextSeq 500