
HTS Bioinf - Demultiplexing and quality control of raw sequencing data

Scope

This document describes how the Norwegian Sequencing Center (NSC) should demultiplex and quality-control raw sequencing runs and deliver the resulting data to the "Enhet for genomdiagnostikk" (Unit for Genome Diagnostics).

NSC is responsible for delivering genome, exome and target capture sequencing data.

Responsibility

NSC is responsible for demultiplexing and delivering sequencing data and for running basic quality controls on the delivered data. Until they are transferred, the data are stored on NSC-administered servers (see HTS Bioinf - Storage and security of sensitive data).

The bioinformatician in charge of production is responsible for checking the consistency of the data and for contacting NSC if any problems are found.

Specification

  1. Demultiplex diagnostic samples by running the bcl2fastq2 software with default settings. For NovaSeq X, demultiplexing is done directly on the NovaSeq X instrument using BCL Convert.

  2. Run FastQC on each sample's FASTQ files using default settings. For NovaSeq X, FastQC is not run.

  3. Copy the following files to the gdx-login server at <IP-address>:/boston/diag/nscDelivery:

    • FASTQ files for all samples
    • FastQC raw results for all samples (if FastQC is run)
    • PDF quality reports for all samples (if FastQC is run)
    • MD5 checksums for the FASTQ and PDF files (md5sum.txt), except for NovaSeq X runs
    • Sequencing statistics for all samples (Demultiplex_Stats.htm)
  4. Verify that file ownership and permissions are set as follows:

    • The group owner of the files should be ous-diag
    • The permission mode of the files should be 770
  5. Send a standard NSC delivery email to the diag-lab and diag-bioinf mailing lists at UiO.

    If EKG or EHG set up the run, also send the email to ekg or ehg-hts at OUS, respectively.

All in-house scripts concerning demultiplexing must be saved on the LIMS production server and version-controlled.

The repositories are also synced to GitHub at https://github.com/nsc-norway/lims and https://github.com/nsc-norway/pipeline.

Background

Several samples are normally pooled together in the same flowcell lane during sequencing, which results in reads from different samples being mixed together in the initial output files from sequencing.

To split the reads into one set of files per sample, demultiplexing software must be run on the initial output files.

After demultiplexing, quality control on reads (FastQC) can be performed separately for each sample.

NSC is responsible for demultiplexing.

Demultiplexing applies only a few thresholds:

  • Only reads passing the purity filter (PF) are written to the FASTQ files. The filter is applied internally by the sequencing instrument and is not configurable.
  • We allow at most one mismatch per index read, so an index read that differs from the specified sample index by one base is still assigned to that sample.
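The one-mismatch rule can be illustrated with a Hamming-distance check on each index read. This is a minimal sketch of the idea, not bcl2fastq's implementation; the function names, and the choice to return None (undetermined) when no sample or more than one sample matches, are assumptions of the sketch. bcl2fastq itself refuses to run when the allowed mismatches would make two sample indexes collide.

```python
from typing import Dict, Optional, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(
    index_reads: Sequence[str],
    sample_indexes: Dict[str, Tuple[str, ...]],
    max_mismatches: int = 1,
) -> Optional[str]:
    """Return the sample whose indexes are each within max_mismatches of the
    corresponding index read, or None (undetermined) if zero or several match."""
    hits = [
        name
        for name, indexes in sample_indexes.items()
        if len(indexes) == len(index_reads)
        and all(hamming(r, i) <= max_mismatches for r, i in zip(index_reads, indexes))
    ]
    return hits[0] if len(hits) == 1 else None
```

With dual indexes, the limit applies to each index read separately: one mismatch in index 1 and one in index 2 still assigns the read to the sample.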

After demultiplexing and quality control (FastQC), the samples can be delivered. The downstream pipeline then performs additional automated QC.

First, lims-exporter-api checks the Q30 value of each sample (the percentage of bases with a quality score of at least 30). If a sample's Q30 is below Illumina's threshold (given below) and the sample has priority 1, the sample is sent to "Manager Review" in Clarity. Otherwise, the sample is exported and processed by the pipeline, which runs its usual QC checks.
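To make the Q30 definition concrete, the sketch below computes it from a FASTQ file. It illustrates the metric only, not how lims-exporter-api obtains the value; it assumes standard four-line FASTQ records and Phred+33 quality encoding (the encoding used by current Illumina software), and the function name is hypothetical.

```python
import gzip


def q30_percent(fastq_path, phred_offset=33):
    """Return the percentage of base calls with quality >= Q30 in a FASTQ file.

    Assumes four-line records (header, sequence, '+', quality) and
    Phred+33 encoding; files ending in .gz are read with gzip.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    total = at_least_q30 = 0
    with opener(fastq_path, "rt") as fh:
        for line_no, line in enumerate(fh):
            if line_no % 4 != 3:  # only the 4th line of each record holds qualities
                continue
            qualities = line.rstrip("\r\n")
            total += len(qualities)
            at_least_q30 += sum(ord(c) - phred_offset >= 30 for c in qualities)
    return 100.0 * at_least_q30 / total if total else 0.0
```

For instance, the quality string `II#?` contains three calls at Q30 or above (`I` is Q40, `?` is Q30, `#` is Q2).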

Illumina's threshold criteria for good sequencing quality (obtained from http://www.illumina.com):

  • HiSeq 2500: ≥80% of bases above Q30 (2x125bp)
  • HiSeq 3000/4000 and HiSeq X: ≥75% of bases above Q30 (2x150bp)
  • MiSeq: ≥80% of bases above Q30 (2x150bp)
  • NextSeq 500: ≥75% of bases above Q30 (2x150bp) for both High Output and Mid Output
  • NovaSeq 6000: ≥85% of bases above Q30 (2x150bp) for all flowcells
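The routing rule described above can be sketched as a small function. The dictionary restates the thresholds listed above; the function name is hypothetical, the read lengths the thresholds refer to are ignored, and exporting samples from an instrument without a listed threshold (such as NovaSeq X) is an assumption of this sketch, not documented lims-exporter-api behaviour.

```python
# Thresholds restated from the list above (% of bases >= Q30).
# The read lengths they were specified for are ignored in this sketch.
Q30_THRESHOLDS = {
    "HiSeq 2500": 80.0,
    "HiSeq 3000": 75.0,
    "HiSeq 4000": 75.0,
    "HiSeq X": 75.0,
    "MiSeq": 80.0,
    "NextSeq 500": 75.0,
    "NovaSeq 6000": 85.0,
}


def route_sample(instrument: str, q30_percent: float, priority: int) -> str:
    """Return 'Manager Review' or 'export' following the rule described above."""
    threshold = Q30_THRESHOLDS.get(instrument)
    if threshold is None:
        # Assumption: instruments without a listed threshold are exported.
        return "export"
    if q30_percent < threshold and priority == 1:
        return "Manager Review"
    return "export"
```

Note that a sample exactly at the threshold passes, because the criteria are stated as "≥".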

Clarity LIMS monitors the sequencer while a run is in progress. When a sequencing run finishes, LIMS automatically triggers the following steps:

  • Copying the run directory structure and the files needed for demultiplexing to the NSC server
  • Demultiplexing
  • Quality control of the sequencing data
  • Preparation for delivery

Log files (containing the commands used for demultiplexing and the software versions for these steps) are saved in the DemultiplexLogs directory under the run directory on the NSC server.
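A minimal sketch of how an automated step could record its command line and software version under DemultiplexLogs. The function name, the `<step>.log` file naming and the log layout are hypothetical and are not the format written by the NSC pipeline. The version command's stdout and stderr are both captured, since some tools print their version to stderr.

```python
import shlex
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_logged(step_name, cmd, run_dir, version_cmd=None):
    """Run cmd, writing its command line, tool version and output to
    <run_dir>/DemultiplexLogs/<step_name>.log; raise if the command fails."""
    log_dir = Path(run_dir) / "DemultiplexLogs"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{step_name}.log"
    with open(log_path, "w") as log:
        log.write(f"# started: {datetime.now(timezone.utc).isoformat()}\n")
        log.write(f"# command: {shlex.join(cmd)}\n")
        if version_cmd:
            v = subprocess.run(version_cmd, capture_output=True, text=True)
            log.write(f"# version: {(v.stdout + v.stderr).strip()}\n")
        log.flush()  # write the header before the child process appends output
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    if result.returncode != 0:
        raise RuntimeError(f"{step_name} failed with exit code {result.returncode}; see {log_path}")
    return log_path
```

A demultiplexing step could then be called as `run_logged("bcl2fastq", ["bcl2fastq", "--runfolder-dir", run_dir], run_dir, version_cmd=["bcl2fastq", "--version"])`, leaving the command and version next to the run it produced.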

Other relevant documents

  • HTS - Kvalitetskontroll
  • MiSeq
  • NextSeq 500