HTS Bioinf - Infrastructure

This document describes the IT infrastructures used by bioinformaticians: who administers them, how they are managed, and briefly what each one is used for.

The work of diagnostic bioinformaticians (bioinf) is conducted on three different infrastructures, accessed from personal computers:

  • Norwegian Sequencing Center (NSC/NorSeq) - the joint infrastructure for research and diagnostics in building 25
  • Tjenester for Sensitive Data (TSD) - the compute cluster for sensitive data at the University of Oslo (UiO)
  • Development servers - both physical and cloud-based servers

NSC machines and network

NSC manages the server machines where sequencing data are initially stored and pre-processed after sequencing.

The servers used by diagnostic bioinformaticians are gdx-login, sleipnir, gdx-executor, gdx-webui and gdx-db. NSC also has other servers for pre-processing of sequencing data (demultiplexing and quality control with FastQC), but diagnostic bioinformaticians do not use these directly.

  • gdx-login is used for exporting data from the LIMS (Clarity) and for the lab to get access to sleipnir.
  • sleipnir is used to transfer data (sequencing and other) from the NSC network to TSD and from TSD to NSC.
  • The VMs gdx-executor, gdx-webui and gdx-db are used for running pipelines on NSC.

The system administrator of all NSC machines is Pål Marius Bjørnstad, with Arvind Sundaram as backup administrator. The NSC system administrators are responsible for keeping the NSC machines running and up to date, for data backup, user management, etc. There is a simple data processor agreement between NSC and HTS diagnostics.

The NSC infrastructure also includes the servers hosting the Clarity LIMS. Diagnostic bioinformaticians do not work on these machines directly but communicate with them from gdx-login via APIs.

Diagnostic sequencing data are pre-processed and quality assured in automatic pipelines before being delivered to a restricted file area accessible only to diagnostic bioinformaticians (and system administrators). When connecting to the NSC network from a personal computer by cable, all other network connections (Wi-Fi and any additional physical network ports) must be turned off.

TSD

Tjenester for Sensitive Data (TSD), a secure HPC environment for sensitive data at UiO, is the main IT infrastructure used for storage, processing, and analysis of sequencing data. Diagnostic data are processed exclusively in project p22. The following virtual machines (VMs) and servers in project p22 at TSD are used by AMG diagnostic personnel:

Virtual machines

  • p22-rhel8-01-pool: Standard Linux login server. Not to be used for expensive computations.
  • p22-hpc-01 (with p22-hpc-02 as backup): Runs the pipeline executor, which in turn starts the Nextflow processes responsible for submitting jobs to the computing cluster.
  • p22-hpc-02: Runs filelock-exporter, which transfers sequencing data from the file lock, and the pipeline monitor webUI.
  • p22-hpc-03: Used for testing and development. Also backup for p22-hpc-01 and p22-hpc-02.
  • p22-anno-01: Runs annotation services
  • p22-app-01: General purpose VM for running services with low CPU/RAM requirements, and services that are not deemed essential for production. Can also be used for running one-off tasks/chores using Singularity.
  • p22-podman-01: General purpose VM for running anything Podman related. Useful as a testing ground for Podman work, and for running one-off tasks/chores using Podman.
  • p22-podman-02: Reserved for running ELLA production using Podman/Docker Compose
  • p22-podman-03: Reserved for running ELLA staging using Podman/Docker Compose
  • p22-ella-01: Runs ELLA production
  • p22-ella-fo-01: Failover VM for ELLA production
  • p22-ella-dev: Runs ELLA development/validation versions
  • p22-ella-stage: Runs ELLA staging (release testing)
  • p22-dbpg-prod01 (p22-dbpg02.tsd.usit.no): Runs database server for ELLA
  • p22-dbpg-prod02 (p22-dbpg03.tsd.usit.no): Runs database server for the pipeline executor and others
  • p22-win01: Windows login, failover
  • p22-win02: Windows login used by lab engineers from OUS, mostly EKG
  • p22-win03: Windows login used by lab engineers from OUS, mostly EGG, ELL

The Postgres VMs are managed as a service by USIT and only USIT can access them. http://p22-app-01:8080 has an overview of all webservices.
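The backup relationships among the p22-hpc VMs in the list above can be written down as an ordered preference list per service. The following is only a minimal sketch of that idea: the service labels and the `pick_host` helper are hypothetical and introduced for illustration, and this document does not state whether switching to a backup VM is done manually or automatically.

```python
# Ordered host preferences derived from the VM list above.
# The service labels (dictionary keys) are hypothetical names for illustration.
HOST_PREFERENCE = {
    "pipeline-executor": ["p22-hpc-01", "p22-hpc-02", "p22-hpc-03"],
    "filelock-exporter": ["p22-hpc-02", "p22-hpc-03"],
    "pipeline-monitor-webui": ["p22-hpc-02", "p22-hpc-03"],
}


def pick_host(service, unavailable=frozenset()):
    """Return the first preferred host for `service` that is not unavailable.

    Returns None when every listed host is unavailable.
    Raises KeyError for a service that is not in HOST_PREFERENCE.
    """
    for host in HOST_PREFERENCE[service]:
        if host not in unavailable:
            return host
    return None
```

For example, `pick_host("pipeline-executor", {"p22-hpc-01"})` falls through to the first backup in the list.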

If you are using the VMware Horizon client and get a blank/white screen after opening a session, you may need to disable VMware Blast. In the main window, go to File -> Configure VMware Blast. Uncheck anything that is checked and try reconnecting.

Database naming

The databases in TSD run on the database VMs listed above. Note that their names can be misleading. There can be several databases per VM.

  • DBs for ELLA: vardb{, _staging, _test, _validation} run on p22-dbpg-prod01 VM.
  • DBs for MegaQC: p22_megaqc{, _validation} run on p22-dbpg-prod02 VM.
  • DBs for vcpipe: vcpipe{, _staging, _test} run on p22-dbpg-prod02 VM.
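The brace notation above is shell-style shorthand: each prefix is combined with each listed suffix, and the empty suffix gives the bare name. As a sketch, the full database names and the VM each one runs on can be spelled out as follows. The database and VM names are taken from the list above; the `expand` and `vm_for_database` helpers are hypothetical and exist only for this illustration.

```python
# Prefix and suffix lists copied from the brace patterns in the list above.
DATABASES_BY_VM = {
    "p22-dbpg-prod01": {
        "vardb": ["", "_staging", "_test", "_validation"],
    },
    "p22-dbpg-prod02": {
        "p22_megaqc": ["", "_validation"],
        "vcpipe": ["", "_staging", "_test"],
    },
}


def expand(prefix, suffixes):
    """Expand a prefix and its suffixes into full database names."""
    return [prefix + suffix for suffix in suffixes]


def vm_for_database(name):
    """Return the VM that hosts the database `name`, or raise KeyError."""
    for vm, groups in DATABASES_BY_VM.items():
        for prefix, suffixes in groups.items():
            if name in expand(prefix, suffixes):
                return vm
    raise KeyError(name)


if __name__ == "__main__":
    # Print every database name next to the VM it runs on.
    for vm, groups in DATABASES_BY_VM.items():
        for prefix, suffixes in groups.items():
            for db in expand(prefix, suffixes):
                print(f"{db}\t{vm}")
```

Spelling the names out this way makes it easier to see that one VM hosts several databases and that the VM names do not indicate which ones.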

Administration and access

All machines are administered by the TSD system administrators at USIT. Requests should be sent to TSD Drift (tsd-drift@usit.uio.no). In case of unexpected downtime with significant impact (for example, of long duration), the bioinformatic coordinator should contact the leader of TSD directly by phone or email to expedite ticket handling.

The bioinformatic group coordinator is the TSD administrator of the p22 TSD project, and is responsible for user management. New users outside the diagnostic bioinformatics group must be approved by the unit managers at AMG. The unit managers should also report when user accounts should be terminated. Additionally, the bioinformatic coordinator should periodically go through the list of users with the unit managers and remove any inactive user accounts.

Development servers

Bioinformaticians generally work on personal computers, but rely on development servers for heavier computations, such as running and testing variant calling pipelines, and for storing large data sets. The following servers and cloud services are used for development:

  • zed: Development and CI/testing server, a Hetzner Robot machine. Runs tests/jobs triggered by GitLab. Dedicated team members have admin access.
  • gitlab.com: Git server.
  • digitalocean.com: Mainly storage of large files and test runners.

DigitalOcean is used for storing various test and reference data, such as the reference genome, and is administered by the bioinformatic coordinator.

See also