HTS Bioinf - ELLA core production and maintenance
Scope
ELLA: software for interpretation of genetic variants.
Supervisor: software module for monitoring processes.
Supervisor UI: the web page for accessing Supervisor functionality.
TSD: Tjenester for Sensitive Data (infrastructure at USIT, University of Oslo)
The procedure describes the daily duties for running ELLA in production:
- monitor the infrastructure
- handle errors
- communicate with TSD and users
- handle ad hoc situations
- release new versions of ELLA
Responsibility
Responsible person: A core developer of ELLA.
Automated tests
Automated software tests run on every commit to ELLA's GitLab repository. Below is a short summary of the types of automated tests performed.
- End-to-end tests: A complete app with frontend (browser) and backend is started. Several use cases are executed through the browser, simulating a user clicking and entering text.
- Unit tests: Testing of smaller units of code (normally functions).
- API / integration tests: A database instance and the backend are started. Tests use the backend API and/or the database to test various scenarios, ensuring that larger components of the code work correctly. The tests are run twice: once against the current database schema, and once against a schema that has been migrated from baseline to current.
On duty
The core developers arrange among themselves that one person is always on duty. If this is not possible for some reason, the leader should be informed (in advance if possible).
Location of the application
The application runs on TSD at the University of Oslo, on the p22-ella-01 VM.
The URL is http://p22-ella-01:
Singularity images, ops scripts, and other files related to ELLA are located at /ess/p22/data/durable/production/ella. The directories and scripts mentioned later in this document are relative to this directory unless given as absolute paths.
General communication
General communication with users and TSD
- info on downtime and releases
- password help
- questions
- answer the emails that are sent to the ella-support mailing list
Handling TSD errors
When errors are detected, send an email to tsd-drift@usit.uio.no. Use the email link on http://www.uio.no/english/services/it/research/sensitive-data/contact/index.html and fill in the required fields. Put diag-bioinf and ella-support in CC so that updates are distributed to the relevant people.
If ELLA is down, or you know about upcoming downtime, contact the users using the following list of emails:
- EGG
- EKG
- EHD
- SAK
- ELL
- asglan, lretters, uxatjr, yngsej
- ella-support (cc + reply to)
Handling other errors
When users report errors, try to identify the source of the error (user, application, system) and take appropriate action. Add ella-support as CC to any relevant emails.
If the error is (or is suspected to be) related to the VM p22-ella-01 and causes significant problems for users, ELLA can be started from the failover VM p22-ella-fo-01. Stop ELLA on p22-ella-01 before starting it on the failover VM.
Debug checklist:
- Are there any reported issues from TSD?
- Are there any Zabbix related warnings?
- Check if VM load is unusually high
- Check if the database server is experiencing unusually high load
- Check response times of various resources
- Check if activity is unusually high or if there are unusually many 500 responses
- Inspect logs
- Inspect the uiexception table (relevant if users report error messages shown as toasts in the ELLA UI)
- Stop all other instances of ELLA
- Check if any recent updates could cause the issues
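The check for unusually many 500 responses can be approximated with a small script. This is only a sketch: the log line layout below is an assumption (the real format of the ELLA logs may differ), and the sample lines and request paths are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout: a quoted request followed by a three-digit status code.
# Adjust the pattern to the actual log format before relying on the numbers.
STATUS_PATTERN = re.compile(r'" (\d{3}) ')


def count_statuses(lines: Iterable[str]) -> Counter:
    """Count HTTP status codes found in access-log style lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def server_error_share(counts: Counter) -> float:
    """Return the fraction of counted requests that ended in a 5xx status."""
    total = sum(counts.values())
    errors = sum(n for status, n in counts.items() if status.startswith("5"))
    return errors / total if total else 0.0


# Made-up sample lines; lines without a status code are ignored.
sample = [
    '10.0.0.1 "GET /example/a HTTP/1.1" 200 512',
    '10.0.0.2 "POST /example/b HTTP/1.1" 500 87',
    '10.0.0.1 "GET /example/a HTTP/1.1" 200 498',
    "unrelated line without a status code",
]
counts = count_statuses(sample)
share = server_error_share(counts)
```

Comparing the share for the last hour against a normal day gives a rough indication of whether the error rate is out of the ordinary.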
Supervisor (process manager)
ELLA is set up to be managed by Supervisor. The configuration is located in ops/supervisor/supervisor.cfg.
To start ELLA:
- ssh into p22-ella-01
- Log in as the service user
- Run
This starts the Supervisor process as the service user. Open http://p22-ella-01:9000 and check that everything looks correct.
Then tail the log at ella-prod/logs/api.log and check that ELLA has started correctly and that requests are being handled.
Logs
All log files are located in ella-prod/logs.
Direct access to database
Running the script ops/prod-psql.sh gives you a psql prompt to the database as a read-only user.
Normally, changes to the database happen through migrations performed during ELLA upgrades. In rare cases, however, it may be necessary to modify or correct data directly in the production database.
All modifications to the production database should be written as scripts in tasks/YYYYDDMM_task_name/name_of_task.sql for tracking purposes. The code must be reviewed by another person, and all operations should be tested and confirmed in a transaction that is rolled back. When the script is validated and reviewed, it can be committed. For extensive changes where validation in the API or UI is required, the script should be tested against a staging database first.
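The "test in a transaction that is rolled back" step can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs anywhere; the production database is accessed through psql, and the table, columns, and the dry_run helper are hypothetical.

```python
import sqlite3

# sqlite3 is only a stand-in; table and column names are hypothetical.
# isolation_level=None disables implicit transactions so BEGIN/ROLLBACK are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE example_item (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO example_item (status) VALUES ('old'), ('old')")


def dry_run(connection, statement, check_query):
    """Run statement inside a transaction, read back the result, then roll back."""
    connection.execute("BEGIN")
    try:
        connection.execute(statement)
        return connection.execute(check_query).fetchall()
    finally:
        # Always undo the change, whether the statement succeeded or failed.
        connection.execute("ROLLBACK")


check = "SELECT id, status FROM example_item ORDER BY id"
inside = dry_run(conn, "UPDATE example_item SET status = 'fixed' WHERE id = 1", check)
after = conn.execute(check).fetchall()
```

Here inside holds the rows as they looked within the transaction, while after shows that the stored data is unchanged once the rollback has run. The same pattern applies in a psql session: open a transaction, run the script, inspect the result, and roll back.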
Change management and deployment
AMG-related config change requests, errors, and similar issues are registered in Jira (https://ousamg.atlassian.net/projects/LA). The issue is discussed with the System responsible before work is started.
All other issues are tracked alongside the source code, which is stored in Git (https://gitlab.com/alleles/ella).
Work is done in a separate branch and the app is tested in a dedicated test environment. When approved by users and system responsible the changes are added to the main branch.
Before deployment the source code is tagged and an application is created and transferred to TSD. The application is started in the staging environment and if approved by superuser and System responsible the application can be deployed to production. The superuser verifies the changes.
Changes are documented by filling out the attachment "Endringskontroll for endringer av ELLA". Hotfixes (versions that fix important bugs in existing functionality without changing the functionality of the application itself) do not require "Endringskontroll". A hotfix is denoted by an increase in the PATCH version number (versions are given as MAJOR.MINOR.PATCH). The fix is still documented in the release notes of the application itself.
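The hotfix rule can be expressed as a small check on version strings. This is a sketch under the assumption that versions are plain MAJOR.MINOR.PATCH strings, optionally prefixed with "v"; the function names are hypothetical and are not part of ELLA or its tooling.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into integers."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_hotfix(previous: str, new: str) -> bool:
    """True when MAJOR and MINOR are unchanged and only PATCH has increased."""
    prev_major, prev_minor, prev_patch = parse_version(previous)
    new_major, new_minor, new_patch = parse_version(new)
    return (new_major, new_minor) == (prev_major, prev_minor) and new_patch > prev_patch


def needs_change_control(previous: str, new: str) -> bool:
    """Hotfixes skip the "Endringskontroll" form; all other releases need it."""
    return not is_hotfix(previous, new)
```

For example, going from 1.4.0 to 1.4.1 counts as a hotfix, while going from 1.4.1 to 1.5.0 does not and therefore requires the form. The version numbers here are illustrative only.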
Deploying a new version
Before release:
- Agree on a suitable time with core users to perform verification and acceptance tests (the verification test is done on staging, the acceptance test on prod before release)
- Notify users in advance by e-mail
- Add a broadcast message using the CLI (ella-cli broadcast new "...")
- Prepare and test all config changes on staging
Create release:
- Create an MR to merge dev into master
- Tag master
- When CI completes, run make release from the ELLA repo root directory to create a release on GitLab
- Download the Singularity image from the release page on GitLab
Deploying:
- Upload the Singularity image to TSD: archive/releases
- Stop ELLA from p22-ella-01:9000
- Back up the database with ops/db-prod-dump.sh
- Update the symlink ella-prod/ella.sif to point to the new release image
- If migration is required, run prod-cli.sh + ella-cli database upgrade head
- If required, update usergroups and/or filterconfig using the CLI
- Start ELLA from p22-ella-01:9000
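- Add the broadcast message "Akseptansetest pågår, vennligst vent med å bruke ELLA som normal til du får beskjed." (in English: "Acceptance test in progress, please wait to use ELLA as normal until you are notified.")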
- Notify testers to start acceptance test
- When the acceptance test is OK, send e-mail to users and remove broadcast message
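The symlink update in the steps above can be done in several ways. The sketch below shows one option: create a temporary link and rename it over the existing one. It runs in a throwaway directory with made-up file names; the switch_symlink helper is hypothetical and not part of the ops scripts.

```python
import os
import tempfile


def switch_symlink(link_path: str, new_target: str) -> None:
    """Point link_path at new_target by creating a temp link and renaming it."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted earlier run
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)  # rename over the old link in one step


# Demonstration in a temporary directory; the image names are made up.
with tempfile.TemporaryDirectory() as workdir:
    old_image = os.path.join(workdir, "ella-old.sif")
    new_image = os.path.join(workdir, "ella-new.sif")
    for path in (old_image, new_image):
        open(path, "w").close()
    link = os.path.join(workdir, "ella.sif")
    os.symlink(old_image, link)
    before = os.path.basename(os.readlink(link))
    switch_symlink(link, new_image)
    current = os.path.basename(os.readlink(link))
```

Renaming a temporary link over the existing one means the link never points to a missing file, which is a small safeguard even though ELLA is stopped during this step.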
Templates
- Endringskontroll for endringer av ELLA.docx
- MAL - Akseptansetest i produksjonsmiljø.xlsx
- MAL - Verifisering av endring i testmiljø.xlsx
Other relevant documents
- HTS Bioinf - ELLA daily operations
- HTS Bioinf - Execution and monitoring of pipeline
- HTS Bioinf - Process for updates to production pipelines
- ELLA-brukerveiledning og resultathåndtering EKG