HTS Bioinf - Storage and security of sensitive data

This document describes security policies for sensitive patient data from the HTS sequencers and the bioinformatic pipeline. It also describes the location of the data and how long they are stored.

Sensitive data

The sensitive data relevant for this procedure are:

  • the raw patient DNA sequencing data (FASTQ files)
  • the processed sequencing data (BAM files)
  • the variants that the pipelines identify (VCF files)
  • NIPT raw sequencing data, analysis results and relevant lab data
  • the sample identifier, even in cases when no other information is connected to it

Data storage

Sensitive data are stored at the following locations:

  • NSC infrastructure (diagnostic area)
  • TSD infrastructure at UiO, project p22
  • OUS network drive connected to hospital PCs under K:\Systemdata\MedGen\LAB\HTS\Tolkning HTS-data

Data security

In general, sensitive data must not be brought to a computer or hard drive outside the department. Storage and handling of sensitive data are restricted to NSC infrastructure, TSD infrastructure, and servers controlled by Sykehuspartner and connected to hospital PCs. Dedicated portable hard drives that store data in encrypted and password-protected form may be used to transfer sensitive data between the mentioned servers and a local computer. The local computer must be offline (no active WiFi, Ethernet, or Bluetooth) for as long as the hard drive is connected.

Data permissions

Permissions in NSC infrastructure

Permissions in NSC depend on the file area. They are set partly by access control lists on Boston, which are maintained by NSC sysadmins, and partly by Unix file permissions set from VMs. Refer to the NSC documentation for Boston for complete and detailed permission settings. In summary:

For runScratch (location of run data that have not yet been demultiplexed), all users in NSC have at least read access. In particular:

  • members of nsc-seqgroup (all NSC lab and bioinf users) have read, write and execute (rwx) permissions.
  • ous-diag-lab (diagnostic lab user group) has read permission.
  • ous-diag-bioinf (diagnostic bioinf user group) has rwx permissions.

For data/diag (location of diagnostic data after demultiplexing), NSC sysadmins and diagnostic bioinformaticians in group ous-diag-bioinf generally have rwx permissions, and others have read permission. In particular:

  • for data/diag/nscDelivery, the group ous-diag-delv (delivery group, including ous-diag-* and nsc-seq) has rwx permissions.
  • for data/diag/diagInternal, the group ous-diag has rwx permissions.
  • for data/diag/transfer, mounted on Sleipnir, ous-diag-bioinf has rwx permissions.

Permissions in TSD infrastructure

In TSD, only p22 members can access data. Files and directories are given the following permissions:

  • Owners of files and directories have read, write and execute (rwx) permissions. The p22-serviceuser owns most files and directories.
  • Bioinformaticians in group p22-diag-ous-bioinfo-group have rwx permissions.
  • Others have only read and execute (rx).
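
As an illustration, the permission scheme above can be checked from a script. The following is a minimal sketch, assuming that "owner rwx, group rwx, others rx" corresponds to the octal mode 775 and that special bits (setuid, setgid, sticky) can be ignored. The function names are hypothetical and not part of any p22 tooling.

```python
import os
import stat

# Assumption: owner rwx + group rwx + others r-x maps to octal 775.
EXPECTED_MODE = 0o775


def has_expected_mode(path: str, expected: int = EXPECTED_MODE) -> bool:
    """Return True if the rwx bits of `path` equal `expected`.

    Only the nine owner/group/other bits are compared; setuid, setgid
    and sticky bits are masked out.
    """
    return (os.stat(path).st_mode & 0o777) == expected


def describe_mode(path: str) -> str:
    """Return the permissions in `ls -l` style, e.g. 'drwxrwxr-x'."""
    return stat.filemode(os.stat(path).st_mode)
```

Such a check only inspects the mode bits. It does not verify ownership by p22-serviceuser or membership of p22-diag-ous-bioinfo-group.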

Data locations

Sequencing data in the form of FASTQ files are automatically delivered to a specific delivery directory in the NSC infrastructure. Diagnostic bioinformaticians transfer the files to project p22 in TSD according to the procedure "HTS Bioinf - Execution and monitoring of pipeline".

Paths on TSD

DURABLE=/ess/p22/data/durable

Short-name | Path | Description

deploys
vcpipe | $DURABLE/production/sw/variantcalling/vcpipe | Deploy vcpipe code
vcpipe public reference data | $DURABLE/production/reference/public | Deploy vcpipe public reference data
vcpipe sensitive reference data | $DURABLE/production/reference/sensitive | Deploy vcpipe sensitive reference data
tsd-import | $DURABLE/production/sw/automation/tsd-import | Deploy tsd-import code
releases | $DURABLE/production/sw/archive | Releases are copied and deployed from here
sensitive database code | $DURABLE/development/sw/sensitive-db-factory | Version controlled (git) source data

production
analyses-work | $DURABLE/production/data/analyses-work | Production analyses (Nextflow work directory)
analyses-results | $DURABLE/production/data/analyses-results | Production analyses results after completed work
samples | $DURABLE/production/data/samples | Production storage of sequence data and metadata
interpretations | $DURABLE/production/interpretations | Production interpretations (output)
EllA | $DURABLE/production/ella/ella-prod/data/analyses | Production EllA (output)

archives
analyses archive | $DURABLE/production/archive/analyses | Analyses archives
software archive | $DURABLE/production/sw/archive | Software releases archives
NIPT archive | $DURABLE/production/archive/nipt | NIPT lab, sequencing and output files backup
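
For scripts that need these locations, the short-names can be resolved against DURABLE in one place instead of being hard-coded repeatedly. The sketch below covers only a subset of the paths listed above. The dictionary and the function name tsd_path are illustrative assumptions, not existing p22 code.

```python
from pathlib import PurePosixPath

DURABLE = PurePosixPath("/ess/p22/data/durable")

# Relative paths copied from the listing above (subset, for illustration).
TSD_PATHS = {
    "vcpipe": "production/sw/variantcalling/vcpipe",
    "tsd-import": "production/sw/automation/tsd-import",
    "analyses-work": "production/data/analyses-work",
    "analyses-results": "production/data/analyses-results",
    "samples": "production/data/samples",
    "interpretations": "production/interpretations",
    "analyses archive": "production/archive/analyses",
    "NIPT archive": "production/archive/nipt",
}


def tsd_path(short_name: str) -> PurePosixPath:
    """Resolve a short-name to its absolute path under DURABLE.

    Raises KeyError for short-names that are not in TSD_PATHS.
    """
    return DURABLE / TSD_PATHS[short_name]
```

For example, tsd_path("samples") resolves to /ess/p22/data/durable/production/data/samples.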

Duration of file storage for production data

Files on NSC are deleted when production duty changes (usually every 14 days), and more often if needed. This includes BCL files for runs that are exclusive to diagnostics, which are stored in /boston/diag/runs. BCL files for runs that are shared between diagnostics and NSC are kept on the NSC infrastructure longer, in accordance with NSC routines.
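
Before a clean-up it can help to list which entries have passed the usual 14-day window. The following is a minimal sketch, assuming that the modification time is an acceptable proxy for age. The function name is hypothetical, and the sketch only lists candidates; it deletes nothing and does not distinguish diagnostics-only runs from shared runs.

```python
import os
import time


def entries_older_than(directory, days=14, now=None):
    """Return sorted names of entries in `directory` older than `days`.

    Age is judged by modification time (an assumption of this sketch).
    `now` is a Unix timestamp and defaults to the current time.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    old = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.stat().st_mtime < cutoff:
                old.append(entry.name)
    return sorted(old)
```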

The original FASTQ files, the final BAM files, the variant files, the log files, and the NIPT sequencing and analysis data are kept indefinitely on TSD.

Duration of file storage for test data

Copies or links to sensitive data used for testing or method development should normally be deleted after the testing has been finished. If there is a need to keep the data, the data should be moved to a subdirectory under the investigation directory on TSD: /ess/p22/data/durable/production/investigations/

Backup policy

The network drive at OUS is backed up nightly by Sykehuspartner. NSC is responsible for internal backups of data at NSC. TSD takes tape backups every night and, in addition, keeps regular snapshots for the last 3 days. Note that only the following directories are backed up at TSD:

  • /ess/p22/data/durable
  • /ess/p22/home
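
Whether a given file is covered by the TSD backup can be checked by testing if it lies under one of the two directories above. This is a minimal sketch with a hypothetical function name. It compares paths purely as text, so it assumes absolute paths without ".." components and does not follow symbolic links.

```python
from pathlib import PurePosixPath

# The two backed-up directories named in this section.
BACKED_UP_ROOTS = (
    PurePosixPath("/ess/p22/data/durable"),
    PurePosixPath("/ess/p22/home"),
)


def is_backed_up(path: str) -> bool:
    """Return True if `path` is one of the backed-up roots or lies below one.

    Comparison is by path components, so '/ess/p22/data/durable2' does
    not count as being under '/ess/p22/data/durable'.
    """
    p = PurePosixPath(path)
    return any(p == root or root in p.parents for root in BACKED_UP_ROOTS)
```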

Patient requests for data export

If patients request access to their data, we will export the data in encrypted form and deliver them on a USB thumb drive, as described in the procedure "Utlevering av medisinsk genetiske data til pasienter, forskningsdeltakere og pårørende" (eHB, dokID 128206). Such requests can only come from OUS managers.

Patient request for blocking data access to staff members (sperring av tilgang)

If patients request that their data be made unavailable to specific members of OUS staff, a staff member not affected by the block should delete any data in the NSC infrastructure and move production data in TSD to /ess/p22/data/durable/production/staff_restricted_access/$SUBDIR/. $SUBDIR is created on a case-by-case basis, and its subdirectories are accessible only to the bioinformatic coordinator role or, alternatively, to one other member of staff not affected by the block.
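
The move into the restricted area can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and restricting $SUBDIR to its owner with mode 700 is an assumed mechanism, since this procedure does not specify how access is limited. The actual restriction may rely on groups or access control lists instead.

```python
import os
import shutil

RESTRICTED_ROOT = "/ess/p22/data/durable/production/staff_restricted_access"


def move_to_restricted(src, subdir, root=RESTRICTED_ROOT):
    """Move `src` into `root`/`subdir` and return the new location.

    `subdir` must be a single directory name chosen for the case.
    Assumption: owner-only access (mode 700) on `subdir` stands in for
    the access restriction described in the text.
    """
    if not subdir or os.sep in subdir or subdir in (".", ".."):
        raise ValueError(f"invalid subdirectory name: {subdir!r}")
    target_dir = os.path.join(root, subdir)
    os.makedirs(target_dir, exist_ok=True)
    os.chmod(target_dir, 0o700)  # assumed mechanism, see lead-in
    # target_dir exists, so shutil.move places src inside it.
    return shutil.move(src, target_dir)
```

The root parameter exists so that the function can be tried against a scratch directory before it is used on production data.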


A risk analysis for TSD is available at K:\Felles\KDI\AMG\Risikostyring\Risikovurdering\Vedlegg_ROS_TSD_230915.docx