HTS Bioinf - Demultiplexing and quality control of raw sequencing data
This specification states how the Norwegian Sequencing Center (NSC) should demultiplex and quality-control raw sequencing data, and deliver it to the Enhet for HTS-diagnostikk.
NSC is responsible for delivering sequencing data from genome, exome and target capture.
NSC is responsible for demultiplexing the data, running basic quality controls on it, and delivering it. Before being transferred, data is stored on the NSC-administered servers (see HTS Bioinf - Storage and security of sensitive data).
The bioinformatician in charge of production is responsible for checking the consistency of the data and contacting NSC in case of any problems.
- Demultiplex diagnostic samples by running bcl2fastq2 software with default settings.
- Run FASTQC using default settings on each sample's
- Copy files in the list below to the beta server at
  - FASTQ files for all samples
  - FASTQC raw results for all samples
  - md5sum for fastq and PDF files (
  - sequencing statistics for all samples (
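The checksum item in the list above can be illustrated with a minimal Python sketch. It is not the NSC delivery script: the function names and the file suffixes (`.fastq.gz`, `.pdf`) are assumptions made for illustration, and the output only mimics the usual `md5sum` line layout.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_manifest(directory, suffixes=(".fastq.gz", ".pdf")):
    """Build md5sum-style lines ("<digest>  <relative path>") for matching files.

    The suffixes are hypothetical defaults; adjust them to the real delivery layout.
    """
    root = Path(directory)
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name.endswith(suffixes):
            relative = path.relative_to(root).as_posix()
            lines.append(f"{md5_of_file(path)}  {relative}")
    return lines
```

A receiver could recompute the manifest after the transfer and compare it line by line with the delivered one to detect corrupted or missing files.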
Verify that permissions and user group settings respect the following:
The user group of the files should be:
The permission code of the files should be:
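As a rough sketch of how the group and permission verification could be automated on a Unix server, the following Python function reports deviations for one file. The expected group and permission code are passed in as parameters because this specification's values are not reproduced here; `check_delivery_file` and `group_name` are hypothetical helper names.

```python
import grp
import os
import stat


def group_name(path):
    """Return the group name owning the file, or the numeric gid if it has no name."""
    gid = os.stat(path).st_gid
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def check_delivery_file(path, expected_group, expected_mode):
    """Return a list of problems found for one file; an empty list means it conforms.

    expected_group and expected_mode (e.g. an octal literal) are supplied by the
    caller; this sketch does not assume any particular value.
    """
    problems = []
    actual_group = group_name(path)
    actual_mode = stat.S_IMODE(os.stat(path).st_mode)
    if actual_group != expected_group:
        problems.append(f"{path}: group is {actual_group}, expected {expected_group}")
    if actual_mode != expected_mode:
        problems.append(f"{path}: mode is {actual_mode:o}, expected {expected_mode:o}")
    return problems
```

Running the check over every delivered file and collecting the returned messages gives a simple report that can be reviewed before the delivery email is sent.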
Send a standard NSC delivery email to email lists
When EKG or EHG has set up a run, also send the email to
ehg-hts at OUS, respectively.
All in-house scripts concerning demultiplexing must be saved on the LIMS production server and version-controlled.
Several samples are normally pooled together in the same flowcell lane during sequencing, which results in reads from different samples being mixed together in the initial output files from sequencing.
To separate different samples into separate files, demultiplexing software must be run on the initial output files.
After demultiplexing, quality control on reads (FASTQC) can be performed separately for each sample.
NSC is responsible for demultiplexing.
In demultiplexing, there are a few thresholds:
- Only data passing the purity filter (PF) are written to the
  PF is internal to the sequencing machine and not configurable.
- We allow at most one mismatch in the index reads. So if an index read differs from the specified sample index by 1 base, it is still assigned to the sample.
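The one-mismatch rule can be sketched in Python as a Hamming-distance comparison between the index read and each sample index. This illustrates the rule only and is not how bcl2fastq2 is implemented; `assign_sample` and the example indexes are hypothetical, and returning `None` for ambiguous reads is a choice made for this sketch.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_indexes, max_mismatches=1):
    """Return the sample whose index is within max_mismatches of index_read.

    sample_indexes maps sample name to index sequence. Returns None when no
    sample matches, or when more than one matches (ambiguous read).
    """
    matches = [
        name
        for name, index in sample_indexes.items()
        if len(index) == len(index_read)
        and hamming(index_read, index) <= max_mismatches
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical sample sheet with two 6-base indexes.
indexes = {"S1": "ACGTAC", "S2": "TTGCAA"}
print(assign_sample("ACGTAA", indexes))  # one mismatch against S1
print(assign_sample("AAGTAA", indexes))  # two mismatches against S1, no match
```

Tolerating one mismatch is only unambiguous when every pair of sample indexes in a lane differs in at least three positions, which is why the sketch refuses to assign a read that is within one mismatch of two indexes.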
After demultiplexing, samples can be delivered. The lims-exporter-api and the pipeline are responsible for checking the QC automatically.
The Q30 of each sample is checked by lims-exporter-api. If the sample's Q30 is below the Illumina threshold criteria given below and the priority of the sample is 1, the sample is sent to "Manager Review" in Clarity. Otherwise, the sample is exported and run through the pipeline. The pipeline will check those QC as it has been
Illumina's threshold criteria for good sequencing quality (obtained from http://www.illumina.com):
- HiSeq2500: ≥80% of bases higher than Q30 (2x125bp)
- HiSeq3000/4000 and HiSeq X: ≥75% of bases higher than Q30 (2x150bp)
- MiSeq: ≥80% of bases higher than Q30 (2x150bp)
- NextSeq 500: ≥75% of bases higher than Q30 (2x150bp) for both High Output and Mid Output
- NovaSeq 6000: ≥85% of bases higher than Q30 (2x150bp) for all flowcells
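The routing rule described for lims-exporter-api can be sketched as follows, using the thresholds listed above. This is not the actual lims-exporter-api code: the dictionary keys, the function names and the return strings are assumptions chosen for illustration.

```python
# Minimum percentage of bases above Q30 per instrument, taken from the list above.
# The key spellings are hypothetical; real instrument names in the LIMS may differ.
Q30_THRESHOLDS = {
    "HiSeq2500": 80.0,
    "HiSeq3000/4000": 75.0,
    "HiSeq X": 75.0,
    "MiSeq": 80.0,
    "NextSeq 500": 75.0,
    "NovaSeq 6000": 85.0,
}


def passes_q30(instrument, pct_q30):
    """Return True if the sample's Q30 percentage meets the instrument threshold."""
    return pct_q30 >= Q30_THRESHOLDS[instrument]


def route_sample(instrument, pct_q30, priority):
    """Apply the rule in the text: below threshold AND priority 1 goes to review."""
    if not passes_q30(instrument, pct_q30) and priority == 1:
        return "Manager Review"
    return "export"


print(route_sample("NovaSeq 6000", 84.9, 1))  # below 85 with priority 1
print(route_sample("NovaSeq 6000", 84.9, 2))  # below 85 but priority is not 1
```

Keeping the thresholds in a single table makes it straightforward to add a new sequencer or update a value when Illumina revises its specifications.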
The Clarity LIMS system monitors the sequencer as it is sequencing. When a sequencing run is finished, the LIMS system will automatically trigger the following steps:
- Copy the run directory structure and essential files for demultiplexing to the NSC server
- Quality controls on the sequencing data
- Prepare for delivery
Log files (containing the commands used for demultiplexing and the software version numbers for these steps) are saved in the DemultiplexLogs directory under the run directory on the NSC server.
Other relevant documents
- HTS - Kvalitetskontroll
- NextSeq 500