HTS bioinf - Genomic coordinates liftover

Scope

This document summarizes the implementation and the outputs of the genomic coordinate liftover from the hg19 to the hg38 genome build.

Reference genome coordinates mapping

Mapping between reference genome assemblies (e.g., GRCh37/hg19 to GRCh38/hg38) converts genomic coordinates while accounting for several types of reference changes:

  • Sequence additions and refinements
  • Representation of complex and repetitive regions
  • Correction of assembly errors
  • Chromosomal rearrangements and alternate loci

Our implementation uses the CrossMap tool with custom post-processing to ensure high-quality coordinate conversion.

Implementation

The coordinate liftover process follows these key steps:

  1. Variant positions are extracted from the source VCF file (either SNVs or structural variants)
  2. Coordinates are converted using CrossMap and the UCSC chain file
  3. Multi-mapped cases are resolved through a region-based secondary analysis
  4. The original VCF file is annotated with the target genome coordinates and a mapping quality indicator
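
Step 1 can be illustrated with a minimal sketch that turns a VCF data line into a BED-style interval suitable as liftover input. The function name `vcf_record_to_bed` and the choice to carry the source coordinates as a key are hypothetical and not taken from the pipeline; the sketch only shows the 1-based (VCF) to 0-based, half-open (BED) conversion and the use of INFO/END for structural variants.

```python
def vcf_record_to_bed(line):
    """Turn one VCF data line into a (chrom, start0, end, key) BED interval."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]

    # Parse INFO into a dict; flag-style entries get the value True.
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value if value else True

    # End is INFO/END for structural variants, else the last REF base.
    end = int(info_map["END"]) if "END" in info_map else pos + len(ref) - 1

    # BED is 0-based, half-open; VCF POS is 1-based, inclusive.
    # The key keeps the source coordinates so results can be joined back.
    return chrom, pos - 1, end, f"{chrom}:{pos}-{end}"
```

Carrying the source coordinates as a fourth column is one way to match converted intervals back to the original VCF records after the conversion step.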

Liftover annotations

Liftover annotations (explained in Table 1) are shown in the Region section in ELLA, as illustrated in Figures 1 and 2.

Figure 1

Figure 1: Display of hg38 Liftover annotations in ELLA Region section. When hg38_map is not FAILED, the links will be active and the target genome coordinates non-empty.

Figure 2

Figure 2: Display of hg38 Liftover annotations in ELLA Region section. When hg38_map is FAILED, there will be no content in the gnomAD v4 link or coordinates.

Notes

  • The vertical header HG38 LIFTOVER contains the link to the UCSC LiftOver tool
  • The vertical header GNOMAD v4 contains the link to the respective region in the gnomAD v4 browser (GRCh38 build)
  • The gnomAD v4 link will be inactive if the liftover failed

Table 1: Hg38 liftover annotations

Field        Description
hg38_chr     Target chromosome in hg38 reference
hg38_start   Starting position in hg38 (1-based)
hg38_end     End position in hg38
hg38_coord   Complete coordinate representation
hg38_map     Mapping quality indicator

If the mapping status is FAILED (hg38_map=FAILED), only the hg38_map tag is added.
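
The rule above can be sketched as follows. This is an illustration, not the pipeline's code: the function names are hypothetical, the tags are assumed to be written as KEY=VALUE entries in the VCF INFO column, and the `chr:start-end` layout used for hg38_coord is an assumption, since the exact format is not specified here.

```python
def liftover_info_tags(status, chrom=None, start=None, end=None):
    """Return the hg38_* entries for one variant as KEY=VALUE strings."""
    if status == "FAILED":
        # Failed liftover: only the status tag is written.
        return ["hg38_map=FAILED"]
    return [
        f"hg38_chr={chrom}",
        f"hg38_start={start}",
        f"hg38_end={end}",
        # Assumed layout for hg38_coord; the real format may differ.
        f"hg38_coord={chrom}:{start}-{end}",
        f"hg38_map={status}",
    ]


def annotate_info(info, tags):
    """Append tags to an existing VCF INFO string ('.' means empty)."""
    existing = [] if info in ("", ".") else info.split(";")
    return ";".join(existing + tags)
```

For example, `annotate_info("DP=10", liftover_info_tags("FAILED"))` yields `DP=10;hg38_map=FAILED`, leaving the coordinate tags absent.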

Mapping quality indicator

The hg38_map field classifies coordinate conversion confidence as follows:

UNIQUE: The source coordinates mapped unambiguously to a single location in the target reference genome. This represents the highest confidence level.

REGION: The source coordinates initially mapped to multiple potential locations, but region-based analysis determined a most likely mapping. These coordinates are generally reliable but may warrant additional confirmation.

FAILED: The source coordinates could not be reliably mapped to the target reference genome. This may occur due to significant reference differences, complex regions, or deleted sequences. These variants require manual review before clinical interpretation for the target genome.
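
The three categories can be expressed as a small decision function. The sketch below is a guess at the logic, not the actual implementation: in particular, the region-based analysis is modeled here as "keep the one candidate that falls inside the target interval of a lifted surrounding window", which is an assumption made for illustration. The name `classify_mapping` and its arguments are hypothetical.

```python
def classify_mapping(candidates, region=None):
    """Return (status, chosen) for one variant.

    candidates: list of (chrom, start, end) hits in the target genome.
    region: optional (chrom, start, end) target interval obtained by lifting
            a wider window around the variant (hypothetical secondary step).
    """
    if len(candidates) == 1:
        return "UNIQUE", candidates[0]

    if len(candidates) > 1 and region is not None:
        r_chrom, r_start, r_end = region
        inside = [
            c for c in candidates
            if c[0] == r_chrom and c[1] >= r_start and c[2] <= r_end
        ]
        # Accept only if the region singles out exactly one candidate.
        if len(inside) == 1:
            return "REGION", inside[0]

    # No hit at all, or an ambiguity the region could not resolve.
    return "FAILED", None
```

In this sketch, an unresolved multi-mapping falls through to FAILED, so ambiguous coordinates are never reported as if they were reliable.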

Structural variant considerations

For structural variants, the liftover process includes specialized handling:

  • END positions are extracted from INFO fields
  • For variants with SVLEN but no END, the appropriate end position is calculated
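
A minimal sketch of the end-position logic is shown below. It assumes the common VCF convention for symbolic structural variants, where POS is the base before the event, so that END = POS + |SVLEN| for deletions, duplications and inversions, and END = POS for insertions. Whether the pipeline applies exactly these rules is not stated here; the function name `sv_end` is hypothetical.

```python
def sv_end(pos, info):
    """Return the end position of a structural variant from POS and an INFO dict."""
    # An explicit END always takes precedence.
    if "END" in info:
        return int(info["END"])

    if "SVLEN" in info:
        svlen = abs(int(info["SVLEN"]))  # deletions often carry a negative SVLEN
        # Assumption: insertions occupy a single reference position.
        if info.get("SVTYPE") == "INS":
            return pos
        # Assumption: POS is the base before the event, so END = POS + |SVLEN|.
        return pos + svlen

    # No length information: treat the variant as a single position.
    return pos
```

For example, a deletion at position 100 with `SVLEN=-50` and no END gets an end position of 150 under these assumptions.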