
HTS Bioinf - ELLA core production and maintenance

Scope

ELLA: software for interpretation of genetic variants.

Supervisor: software module to monitor processes.

Supervisor UI: the web page used to access Supervisor functionality.

TSD: Tjenester for Sensitive Data (infrastructure at USIT, University of Oslo)

The procedure describes the daily duties for running ELLA in production:

  • monitor the infrastructure
  • handle errors
  • communicate with TSD and users
  • handle ad hoc situations
  • release new versions of ELLA

Responsibility

Responsible person: A core developer of ELLA.

Automated tests

Automated software tests are run on every commit to ELLA's GitLab repository. Below is a short summary of the different types of automated tests performed.

  • End-to-end tests : A complete app with frontend (browser) and backend is started. Several use-cases are executed through the browser simulating a user clicking and entering text.
  • Unit tests : Testing of smaller units (normally functions) of code.
  • API / Integration tests : A database instance and the backend are started. Tests use the backend API and/or database to test various scenarios, ensuring that larger components of code are working correctly. The tests are run twice: once against the current database schema, and once against a schema that has been migrated from baseline to current.

On duty

The core developers organize themselves so that one person is always on duty. If this is not possible for some reason, the leader should be informed (in advance if possible).

Location of the application

The application runs on TSD at the University of Oslo, on the p22-ella-01 VM. The URL is http://p22-ella-01:

Singularity images, ops scripts, and other files related to ELLA are located at /ess/p22/data/durable/production/ella. The directories and scripts mentioned later in this document are relative to this directory unless given as an absolute path.

General communication

General communication with users and TSD includes:

  • info on downtime and releases
  • password help
  • questions
  • answering emails sent to the ella-support mailing list

Handling TSD errors

When errors are detected, send an email to tsd-drift@usit.uio.no. Use the email link on http://www.uio.no/english/services/it/research/sensitive-data/contact/index.html and fill in the required fields. Put diag-bioinf and ella-support in CC so that updates are distributed to the relevant people.

If ELLA is down, or you know about upcoming downtime, contact the users using the following list of emails:

  • EGG
  • EKG
  • EHD
  • SAK
  • ELL
  • asglan, lretters, uxatjr, yngsej
  • ella-support (cc + reply to)

Handling other errors

When users report errors, try to determine the source of the error (user, application, system) and take appropriate action. Add ella-support as CC to any relevant emails.

If the error is (or is suspected to be) related to the VM p22-ella-01 and causes significant problems for users, ELLA can be started from the failover VM p22-ella-fo-01. Stop ELLA on p22-ella-01 before starting it on the failover VM.

Debug checklist:

  • Are there any reported issues from TSD?
  • Are there any Zabbix-related warnings?
  • Check if the VM load is unusually high
  • Check if the database server is experiencing unusually high load
  • Check response times of various resources
  • Check if activity is unusually high or if there are unusually many 500 responses
  • Inspect logs
  • Inspect the uiexception table (relevant if users report error messages shown as toasts in the ELLA UI)
  • Stop all other instances of ELLA
  • Check if any recent updates could cause the issues
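The VM load check in the list above can be scripted. The following is a minimal sketch, assuming a Linux VM where /proc/loadavg is available. The function name load_is_high and the threshold 4.0 are illustrative assumptions and are not part of ELLA's ops scripts.

```shell
# Hypothetical helper, not part of ELLA's ops scripts.
# Succeeds (exit status 0) when the 1-minute load average exceeds THRESHOLD.
# Usage: load_is_high THRESHOLD [LOADAVG_FILE]
load_is_high() {
    local threshold="$1"
    local file="${2:-/proc/loadavg}"
    # The first field of /proc/loadavg is the 1-minute load average.
    awk -v t="$threshold" '{ exit !($1 > t) }' "$file"
}

# Example: warn when the load is above 4.0 (an arbitrary example threshold).
if load_is_high 4.0 2>/dev/null; then
    echo "WARNING: 1-minute load average is above 4.0"
fi
```

A sensible threshold depends on the number of CPU cores on the VM, so the value should be chosen per machine rather than copied from this sketch.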

Supervisor (process manager)

ELLA is set up to be managed by Supervisor. The configuration is located in ops/supervisor/supervisor.cfg.

To start ELLA:

  • ssh into p22-ella-01

  • log in as the service user

  • Run

    cd /ess/p22/data/durable/production/ella/ops/
    ./run-supervisor.sh
    

This starts the Supervisor process as the service user. Go to http://p22-ella-01:9000 and check that everything looks correct.

Tail the log at ella-prod/logs/api.log and check that ELLA has started correctly and that requests are being handled.
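The log check can be wrapped in a small helper that also catches the case where the log file is missing or empty. This is only a sketch: the function show_recent_log is a hypothetical name and does not exist in the ops directory.

```shell
# Hypothetical helper: print the last N lines of a log file, and fail loudly
# if the file is missing or empty (which would suggest ELLA did not start).
# Usage: show_recent_log LOGFILE [N]
show_recent_log() {
    local logfile="$1"
    local n="${2:-20}"
    if [ ! -s "$logfile" ]; then
        echo "Log file missing or empty: $logfile" >&2
        return 1
    fi
    tail -n "$n" "$logfile"
}

# Example, relative to /ess/p22/data/durable/production/ella:
#   show_recent_log ella-prod/logs/api.log 50
# To follow the log continuously while requests come in:
#   tail -f ella-prod/logs/api.log
```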

Logs

All log files are located in ella-prod/logs.

Direct access to database

Running the script ops/prod-psql.sh gives you a psql prompt connected to the database as a read-only user.

Normally, changes to the database happen through migrations done in the ELLA upgrades. In rare cases, however, it might be necessary to modify or correct data directly in the production database.

All modifications to the production database should be written as scripts in tasks/YYYYDDMM_task_name/name_of_task.sql for tracking purposes. The code must be reviewed by another person, and all operations should be tested and confirmed in a transaction that is rolled back. When the script is validated and reviewed, it can be committed. For extensive changes where validation in the API or UI is required, the script should be tested against a staging database first.
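One way to test a script in a transaction that is rolled back is to wrap it before piping it to psql. The sketch below assumes the task script itself contains no BEGIN, COMMIT, or ROLLBACK statements. The function wrap_in_rollback is a hypothetical helper, and the psql connection arguments are placeholders.

```shell
# Hypothetical helper: emit a SQL script wrapped in a transaction that is
# always rolled back, so the statements can be tried without persisting.
# Usage: wrap_in_rollback SCRIPT.sql
wrap_in_rollback() {
    local script="$1"
    if [ ! -r "$script" ]; then
        echo "Cannot read $script" >&2
        return 1
    fi
    echo "BEGIN;"
    cat "$script"
    # Guard against a script that lacks a trailing newline.
    printf '\n'
    echo "ROLLBACK;"
}

# Example (connection arguments are placeholders; ops/prod-psql.sh connects
# as a read-only user, so a user with write access is needed for this test):
#   wrap_in_rollback tasks/YYYYDDMM_task_name/name_of_task.sql \
#       | psql -v ON_ERROR_STOP=1 <connection args>
```

Setting ON_ERROR_STOP makes psql stop at the first failing statement instead of continuing through the rest of the script.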

Change management and deployment

AMG-related config change requests, errors or similar are registered in Jira (https://ousamg.atlassian.net/projects/LA). The issue is discussed with the System responsible before work is started.

All other issues are stored alongside the source code in git (https://gitlab.com/alleles/ella).

Work is done in a separate branch, and the app is tested in a dedicated test environment. When approved by users and the System responsible, the changes are added to the main branch.

Before deployment, the source code is tagged and an application is created and transferred to TSD. The application is started in the staging environment, and if approved by the superuser and the System responsible, it can be deployed to production. The superuser verifies the changes.

Changes are documented by filling out the attachment "Endringskontroll for endringer av ELLA". Hotfixes (versions fixing important bugs in existing functionality, but not changing the functionality of the application itself) do not require "Endringskontroll" to be filled out. Hotfixes are denoted by an increase in the PATCH version number (versions are given as MAJOR.MINOR.PATCH). The fix will still be documented in the release notes of the application itself.
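The versioning rule can be expressed as a small check: a release is a hotfix when MAJOR and MINOR are unchanged and PATCH has increased. This is a sketch only. The function is_hotfix is a hypothetical helper, and the version numbers in the example are made up.

```shell
# Hypothetical helper: succeed when NEW differs from OLD only by a higher
# PATCH number (versions given as MAJOR.MINOR.PATCH).
# Usage: is_hotfix OLD NEW
is_hotfix() {
    local old="$1" new="$2"
    local old_major old_rest old_minor old_patch
    local new_major new_rest new_minor new_patch
    # Split each version on the dots using parameter expansion.
    old_major=${old%%.*}; old_rest=${old#*.}
    old_minor=${old_rest%%.*}; old_patch=${old_rest#*.}
    new_major=${new%%.*}; new_rest=${new#*.}
    new_minor=${new_rest%%.*}; new_patch=${new_rest#*.}
    [ "$old_major" = "$new_major" ] &&
        [ "$old_minor" = "$new_minor" ] &&
        [ "$new_patch" -gt "$old_patch" ]
}

# Example with made-up version numbers:
is_hotfix 1.16.2 1.16.3 && echo "hotfix: Endringskontroll not required"
is_hotfix 1.16.2 1.17.0 || echo "regular release: fill out Endringskontroll"
```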

Deploying a new version

Before release:

  1. Agree on a suitable time with core users to perform verification and acceptance tests (the verification test is done on staging, the acceptance test on prod before release)
  2. Notify users in advance by e-mail
  3. Add broadcast message using the cli (ella-cli broadcast new "...")
  4. Prepare and test all config changes on staging

Create release:

  1. Create MR to merge dev into master
  2. Tag master
  3. When CI completes, run make release from the ELLA repo root directory to create a release on GitLab
  4. Download the Singularity image from the release page on GitLab

Deploying:

  1. Upload Singularity image to TSD: archive/releases
  2. Stop ELLA from p22-ella-01:9000
  3. Backup database with ops/db-prod-dump.sh
  4. Update symlink ella-prod/ella.sif to new release image
  5. If a migration is required, run prod-cli.sh and then ella-cli database upgrade head
  6. If required, update usergroups and/or filterconfig using CLI
  7. Start ELLA from p22-ella-01:9000
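  8. Add broadcast message "Akseptansetest pågår, vennligst vent med å bruke ELLA som normal til du får beskjed." (in English: "Acceptance test in progress, please wait to use ELLA as normal until you are notified.")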
  9. Notify testers to start acceptance test
  10. When the acceptance test is OK, send e-mail to users and remove broadcast message
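The symlink update in the deployment steps above is easy to get wrong if the image path is mistyped, so it can help to verify that the image exists before switching. The following is a minimal sketch. The function update_release_symlink is a hypothetical helper, the image file name is a placeholder, and it is assumed that archive/releases is relative to /ess/p22/data/durable/production/ella.

```shell
# Hypothetical helper: point a symlink at a new release image, refusing to
# switch if the image file does not exist.
# Usage: update_release_symlink IMAGE LINK
update_release_symlink() {
    local image="$1"
    local link="$2"
    if [ ! -f "$image" ]; then
        echo "Release image not found: $image" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: do not follow an existing link that points to a directory.
    ln -sfn "$image" "$link"
}

# Example (run from the ella directory; the image name is a placeholder):
#   update_release_symlink "$PWD/archive/releases/<image>.sif" ella-prod/ella.sif
```

An absolute image path is used in the example because a relative symlink target is resolved relative to the directory containing the link, not the current working directory.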

Templates

  • Endringskontroll for endringer av ELLA.docx
  • MAL - Akseptansetest i produksjonsmiljø.xlsx
  • MAL - Verifisering av endring i testmiljø.xlsx

Other relevant documents