Genome Aggregation Database (gnomAD) Predicted Constraint Metrics (LOEUF, pLI, and Z-scores)

Track collection: Genome Aggregation Database (gnomAD) Genome and Exome Variants

Description

With the gnomAD v4.1 data release, the v4 Pre-Release track has been replaced with the gnomAD v4.1 track. The v4.1 release includes a fix for the allele number issue. The v4.1 track shows variants from 807,162 individuals, including 730,947 exomes and 76,215 genomes. This includes the 76,156 genomes from the gnomAD v3.1.2 release as well as new exome data from 416,555 UK Biobank individuals. For more detailed information on gnomAD v4.1, see the related blog post.

The gnomAD v3.1 track shows variants from 76,156 whole genomes (and no exomes), all mapped to the GRCh38/hg38 reference sequence. 4,454 genomes were added to the number of genomes in the previous v3 release. For more detailed information on gnomAD v3.1, see the related blog post.

The gnomAD v3.1.1 track contains the same underlying data as v3.1, but with minor corrections to the VEP annotations and dbSNP rsIDs. On the UCSC side, we have now included the mitochondrial chromosome data that was released as part of gnomAD v3.1 (but after the UCSC version of the track was released). For more information about gnomAD v3.1.1, please see the related changelog.

gnomAD Genome Mutational Constraint is based on v3.1.2 and is available only on hg38. It shows the reduced variation caused by purifying natural selection. This is similar to negative selection on loss-of-function (LoF) for genes, but can be calculated for non-coding regions too. Positive values are red and reflect stronger mutation constraint (and less variation), indicating higher natural selection pressure in a region. Negative values are green and reflect lower mutation constraint (and more variation), indicating less selection pressure and less functional effect. Briefly, for any 1kbp window in the genome, a model based on trinucleotide sequence context, base-level methylation, and regional genomic features predicts the expected number of mutations and compares this number to the observed number of mutations using a Z-score (see preprint in the Reference section for details). The chrX scores were added as received from the authors. Because no de novo mutation data are available on chrX (for estimating the effects of regional genomic features on mutation rates), these scores are more speculative than the ones on the autosomes.

The gnomAD Predicted Constraint Metrics track contains metrics of pathogenicity per-gene as predicted for gnomAD v2.1.1 and identifies genes subject to strong selection against various classes of mutation. This includes data on both the gene and transcript level.

The gnomAD v2 tracks show variants from 125,748 exomes and 15,708 whole genomes, all mapped to the GRCh37/hg19 reference sequence and lifted to the GRCh38/hg38 assembly. The data originate from 141,456 unrelated individuals sequenced as part of various population-genetic and disease-specific studies collected by the Genome Aggregation Database (gnomAD), release 2.1.1. Raw data from all studies have been reprocessed through a unified pipeline and jointly variant-called to increase consistency across projects. For more information on the processing pipeline and population annotations, see the following blog post and the 2.1.1 README.

gnomAD v2 data are based on the GRCh37/hg19 assembly. These tracks display the GRCh38/hg38 lift-over provided by gnomAD on their downloads site.

On hg38 only, a subtrack "Gnomad mutational constraint" aka "Genome non-coding constraint of haploinsufficient variation (Gnocchi)" captures the depletion of variation caused by purifying natural selection. This is similar to negative selection on loss-of-function (LoF) for genes, but can be calculated for non-coding regions, too. Briefly, for any 1kbp window in the genome, a model based on trinucleotide sequence context, base-level methylation, and regional genomic features predicts the expected number of mutations and compares this number to the observed number of mutations using a Z-score (see Chen et al 2024 in the Reference section for details). The chrX scores were added as received from the authors. Because there are no mutations available for chrX, these scores are more speculative than the ones on the autosomes.

For questions on the gnomAD data, also see the gnomAD FAQ.

More details on the Variant type(s) can be found on the Sequence Ontology page.


Subtracks:

Gene LoF: gnomAD Predicted Loss of Function Constraint Metrics By Gene (LOEUF and pLI) v2.1.1
Gene Missense: gnomAD Predicted Missense Constraint Metrics By Gene (Z-scores) v2.1.1
Transcript LoF v2: gnomAD Predicted Loss of Function Constraint Metrics By Transcript (LOEUF and pLI) v2.1.1
Transcript LoF v4: gnomAD Predicted Loss of Function Constraint Metrics By Transcript (LOEUF and pLI) v4
Transcript LoF v4.1: gnomAD Predicted Loss of Function Constraint Metrics By Transcript (LOEUF and pLI) v4.1
Transcript Missense v2: gnomAD Predicted Missense Constraint Metrics By Transcript (Z-scores) v2.1.1
Transcript Missense v4: gnomAD Predicted Missense Constraint Metrics By Transcript (Z-scores) v4
Transcript Missense v4.1: gnomAD Predicted Missense Constraint Metrics By Transcript (Z-scores) v4.1

Source data version: Release v4.1 (April 19, 2024), Release v4 (November 2023), Release 2.1.1 (March 6, 2019)
Assembly: Human Dec. 2013 (GRCh38/hg38)

new Note: September 30, 2024

Description

The Genome Aggregation Database (gnomAD) - Predicted Constraint Metrics track set contains metrics of pathogenicity per-gene as predicted for gnomAD v2.1.1, v4.0, or v4.1 and identifies genes subject to strong selection against various classes of mutation.

This track includes several subtracks of constraint metrics calculated at gene (canonical transcript) and transcript level. For more information see the following blog post. The metrics include:

Observed and expected variant counts per transcript/gene
Observed/Expected ratio (O/E)
Z-scores of the observed counts compared to expected
Probability of loss of function intolerance (pLI), for predicted loss-of-function (pLoF) variation only

Display Conventions and Configuration

There are two "groups" of tracks in this set, and three gnomAD versions (v2.1.1, v4.0, and v4.1):

Gene/Transcript LoF Constraint tracks: Predicted constraint metrics at the whole gene level or whole transcript level for three different types of variation: missense, synonymous, and predicted loss of function. The Gene Constraint track displays metrics for a canonical transcript per gene defined as the longest isoform. The Transcript Constraint track displays metrics for all transcript isoforms. Items on both tracks are shaded according to the pLI score, with outlier items shaded in grey.

Please note there is no gene-level track available for v4.0 and v4.1.
Gene/Transcript Missense Constraint tracks: The missense constraint tracks are built similarly to the LoF constraint tracks, but the items displayed are based on missense Z-scores. All items are colored black, and individual Z-scores can be seen on mouseover.

All tracks follow the general configuration settings for bigBed tracks. Mouseover on the Gene/Transcript Constraint tracks shows the pLI score and the loss of function observed/expected upper bound fraction (LOEUF), while mouseover on the Regional Constraint track shows only the missense O/E ratio. Clicking on items in any track brings up a table of constraint metrics.

Clicking the grey box to the left of the track, or right-clicking and choosing the Configure option, brings up the interface for filtering items based on their pLI score, or labeling the items based on their Ensembl identifier and/or Gene Name.

Methods

Please see the gnomAD browser help page and FAQ for further explanation of the topics below.

Observed and Expected Variant Counts

Observed count: The number of unique single-nucleotide variants in each transcript/gene with 123 or fewer alternative alleles (MAF < 0.1%).

Expected count: A depth-corrected probability prediction model that takes into account sequence context, coverage, and methylation was used to predict expected variant counts. For more information please see Lek et al., 2016.

Variants found in exons with a median depth < 1 were removed from both counts.

The O/E constraint score is the ratio of the observed/expected variants in that gene. Each item in this track shows the O/E ratio for three different types of variation: missense, synonymous, and loss-of-function. The O/E ratio is a continuous measurement of how tolerant a gene or transcript is to a certain class of variation. When a gene has a low O/E value, it is under stronger selection for that class of variation than a gene with a higher O/E value. Because counts depend on gene size and sample size, the precision of the values varies considerably from one gene to the next. Therefore, the 90% confidence interval (CI) is displayed along with the O/E ratio to assist interpretation of the scores.
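To illustrate why the confidence interval matters, the following minimal sketch computes an O/E ratio and an approximate 90% interval. It assumes the observed count is Poisson-distributed around expected × ratio and reads the interval off a grid of candidate ratios. The function name `oe_with_ci`, the grid limits, and the grid-scan approach itself are illustrative assumptions, not a description of the gnomAD pipeline.

```python
import math

def oe_with_ci(observed, expected, alpha=0.10, max_ratio=2.0, step=0.001):
    """Return (oe, lower, upper) for one observed/expected pair.

    Assumption: observed ~ Poisson(expected * ratio). The bounds are the
    alpha/2 and 1 - alpha/2 points of the normalized likelihood over a
    grid of candidate ratios in (0, max_ratio].
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    n_steps = int(round(max_ratio / step))
    ratios = [i * step for i in range(1, n_steps + 1)]
    # Poisson log-likelihood of the observed count at each candidate ratio
    log_lik = [
        observed * math.log(expected * r) - expected * r - math.lgamma(observed + 1)
        for r in ratios
    ]
    peak = max(log_lik)  # subtract the peak to avoid underflow
    weights = [math.exp(v - peak) for v in log_lik]
    total = sum(weights)
    lower = upper = None
    running = 0.0
    for r, w in zip(ratios, weights):
        running += w / total
        if lower is None and running >= alpha / 2:
            lower = r
        if running >= 1 - alpha / 2:
            upper = r
            break
    return observed / expected, lower, upper

# Two hypothetical genes with the same O/E but different sizes
small_gene = oe_with_ci(observed=2, expected=10.0)
large_gene = oe_with_ci(observed=20, expected=100.0)
```

Both hypothetical genes have the same O/E ratio, but the interval for the larger gene is narrower, which is the reason for reporting the CI alongside the ratio.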

When evaluating how constrained a gene is, it is essential to consider the CI when using O/E. In research and clinical interpretation of Mendelian cases, pLI > 0.9 has been widely used for filtering. Accordingly, the gnomAD team suggests using the upper bound of the O/E confidence interval, LOEUF < 0.35, as a threshold if needed.
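A threshold of this kind can be applied to downloaded records with a simple filter. The sketch below uses the two cutoffs quoted above; the record layout (dictionaries with `name`, `loeuf`, and `pli` keys) and the gene names are hypothetical and do not reflect the actual column names in the gnomAD files.

```python
LOEUF_MAX = 0.35  # upper bound of the LoF O/E confidence interval
PLI_MIN = 0.9     # widely used pLI cutoff

def constrained_by_loeuf(records, threshold=LOEUF_MAX):
    """Keep records whose LOEUF is present and below the threshold."""
    return [r for r in records if r["loeuf"] is not None and r["loeuf"] < threshold]

def constrained_by_pli(records, threshold=PLI_MIN):
    """Keep records whose pLI is present and above the threshold."""
    return [r for r in records if r["pli"] is not None and r["pli"] > threshold]

# Hypothetical records for illustration only
records = [
    {"name": "GENE_A", "loeuf": 0.12, "pli": 0.99},
    {"name": "GENE_B", "loeuf": 0.80, "pli": 0.01},
    {"name": "GENE_C", "loeuf": None, "pli": None},  # no score available
]
loeuf_hits = [r["name"] for r in constrained_by_loeuf(records)]
pli_hits = [r["name"] for r in constrained_by_pli(records)]
```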

Please see the Methods section below for more information about how the scores were calculated.

pLI and Z-scores

The pLI and Z-scores of the deviation of observed variant counts relative to the expected number are intended to measure how constrained or intolerant a gene or transcript is to a specific type of variation. Genes or transcripts that are particularly depleted of a specific class of variation (as observed in the gnomAD data set) are considered intolerant of that specific type of variation. Z-scores are available for the missense and synonymous categories, and pLI scores are available for loss-of-function variation.

Missense and Synonymous: Positive Z-scores indicate more constraint (fewer observed variants than expected), and negative scores indicate less constraint (more observed variants than expected). A greater Z-score indicates more intolerance to the class of variation. Z-scores were generated by a sequence-context-based mutational model that predicted the number of expected rare (< 1% MAF) variants per transcript. The square root of the chi-squared value of the deviation of observed counts from expected counts was multiplied by -1 if the observed count was greater than the expected and vice versa. For the synonymous score, each Z-score was corrected by dividing by the standard deviation of all synonymous Z-scores between -5 and 5. For the missense scores, a mirrored distribution of all Z-scores between -5 and 0 was created, and then all missense Z-scores were corrected by dividing by the standard deviation of the Z-score of the mirror distribution.
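The Z-score steps described above can be sketched as follows. This is a minimal illustration, assuming the chi-squared value is (observed - expected)^2 / expected, that the range limits of -5 and 5 are exclusive, and that a population standard deviation is used; none of these details are confirmed by the text, and the function names are hypothetical.

```python
import math
import statistics

def raw_z(observed, expected):
    """Signed square root of the chi-squared deviation.

    Positive when fewer variants are observed than expected (more constraint),
    negative when more are observed than expected.
    """
    chi_sq = (observed - expected) ** 2 / expected
    sign = -1.0 if observed > expected else 1.0
    return sign * math.sqrt(chi_sq)

def correct_synonymous(z_scores):
    """Divide every score by the SD of the scores lying between -5 and 5."""
    core = [z for z in z_scores if -5 < z < 5]
    sd = statistics.pstdev(core)
    return [z / sd for z in z_scores]

def correct_missense(z_scores):
    """Divide every score by the SD of a mirrored distribution.

    The mirror is built from scores between -5 and 0 plus their negations,
    so the scaling is driven by the unconstrained half of the distribution.
    """
    negative = [z for z in z_scores if -5 < z < 0]
    mirrored = negative + [-z for z in negative]
    sd = statistics.pstdev(mirrored)
    return [z / sd for z in z_scores]
```

For example, `raw_z(5, 20)` is positive (fewer variants than expected) and `raw_z(30, 20)` is negative.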

Loss-of-function: pLI closer to 1 indicates that the gene or transcript cannot tolerate protein truncating variation (nonsense, splice acceptor and splice donor variation). The gnomAD team recommends using transcripts with pLI >= 0.9 as the set of transcripts extremely intolerant to truncating variants. pLI is based on the idea that transcripts can be classified into three categories:

null: heterozygous or homozygous protein truncating variation is completely tolerated
recessive: heterozygous variants are tolerated but homozygous variants are not
haploinsufficient: heterozygous variants are not tolerated

An expectation-maximization algorithm was then used to assign a probability of belonging in each class to each gene or transcript. pLI is the probability of belonging in the haploinsufficient class.
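The following is a minimal sketch of how such an expectation-maximization fit could look, not the gnomAD implementation. It assumes each class has a fixed expected O/E ratio, that observed pLoF counts are Poisson-distributed around expected × ratio, and that only the class mixing proportions are estimated. The ratios in `CLASS_OE`, the function names, and the toy gene list are all assumptions made for illustration.

```python
import math

# Assumed expected O/E per class (illustrative values, not taken from the text above)
CLASS_OE = {"null": 1.0, "recessive": 0.463, "haploinsufficient": 0.089}

def poisson_pmf(k, mu):
    """Poisson probability of k events given mean mu, computed in log space."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def class_posteriors(likelihoods, weights):
    """Posterior probability of each class for one gene (E-step for one item)."""
    joint = {c: weights[c] * likelihoods[c] for c in weights}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}

def estimate_pli(genes, n_iter=50):
    """genes: list of (observed, expected) pLoF counts. Returns one pLI per gene."""
    weights = {c: 1.0 / len(CLASS_OE) for c in CLASS_OE}
    likelihoods = [
        {c: poisson_pmf(obs, exp * ratio) for c, ratio in CLASS_OE.items()}
        for obs, exp in genes
    ]
    for _ in range(n_iter):
        # E-step: class membership probabilities under the current mixing weights
        posteriors = [class_posteriors(lik, weights) for lik in likelihoods]
        # M-step: mixing weights become the average membership probability
        weights = {c: sum(p[c] for p in posteriors) / len(posteriors) for c in CLASS_OE}
    # pLI is the posterior probability of the haploinsufficient class
    return [class_posteriors(lik, weights)["haploinsufficient"] for lik in likelihoods]

# Hypothetical (observed, expected) counts
toy_genes = [(0, 50.0), (50, 50.0), (23, 50.0), (2, 40.0), (45, 40.0), (20, 45.0)]
pli_values = estimate_pli(toy_genes)
```

In this toy data set, the gene with no observed pLoF variants against 50 expected receives a pLI near 1, while the gene with observed equal to expected receives a pLI near 0.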

Please see Samocha et al., 2014 and Lek et al., 2016 for further discussion of these metrics.

Transcripts Included

For version 2.1.1 only, the GENCODE transcripts were filtered according to the following criteria:

Must have methionine at start of coding sequence
Must have stop codon at end of coding sequence
Coding sequence length must be divisible by 3
Must have at least one observed variant when removing exons with median depth < 1
Must have reasonable number of missense and synonymous variants as determined by a Z-score cutoff
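The three sequence-based criteria in the list above can be checked directly on a coding sequence, as in the sketch below. It assumes the standard genetic code (ATG start, TAA/TAG/TGA stops) and a DNA string containing only the coding sequence; the function name is hypothetical, and the depth-based and Z-score-based criteria are not covered because they require gnomAD coverage and count data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def passes_cds_checks(cds):
    """Return True if the coding sequence starts with ATG (methionine),
    ends with a stop codon, and has a length divisible by 3."""
    cds = cds.upper()
    return (
        len(cds) % 3 == 0
        and cds.startswith("ATG")
        and cds[-3:] in STOP_CODONS
    )
```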

For version v2.1.1, the gnomAD gene/transcript data is based on hg19. In order to map transcripts and genes to the hg38 genome the following steps were taken:

Transcript track: The gnomAD ENST identifiers were matched against all GENCODE versions between V20 and V44 where possible, giving coordinate priority to the most recent models. In total 74550/80950 transcripts were mapped.
Genes track: The gnomAD ENSG identifiers were matched against all GENCODE versions between V20 and V44 where possible, giving coordinate priority to the most recent models. This mapped 19221/19704 genes. Mapping of the remaining genes was attempted with the same strategy, but matching on gene symbols instead of ENSG identifiers. In total 19567/19704 genes were mapped.
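The matching strategy described above can be sketched as a lookup that walks the annotation versions from newest to oldest. The data layout (a dictionary keyed by GENCODE version number, each holding identifier-to-coordinate tables), the function name, and the identifiers in the example are hypothetical; the actual commands are in the makedoc.

```python
def map_identifiers(identifiers, tables_by_version):
    """Map each identifier to coordinates from the most recent version containing it.

    tables_by_version: {version_number: {identifier: (chrom, start, end)}}
    Returns (mapped, unmapped), where mapped[identifier] = (version, coords).
    """
    mapped = {}
    for version in sorted(tables_by_version, reverse=True):  # newest first
        table = tables_by_version[version]
        for ident in identifiers:
            if ident not in mapped and ident in table:
                mapped[ident] = (version, table[ident])
    unmapped = [ident for ident in identifiers if ident not in mapped]
    return mapped, unmapped

# Hypothetical identifiers and coordinates
tables = {
    20: {"ENSG_A": ("chr1", 100, 200), "ENSG_B": ("chr2", 5, 50)},
    44: {"ENSG_A": ("chr1", 110, 210)},
}
mapped, unmapped = map_identifiers(["ENSG_A", "ENSG_B", "ENSG_C"], tables)
```

For the gene track, the same function could be called a second time on the unmapped remainder with tables keyed by gene symbol instead of ENSG identifier.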

For version v4.0 and v4.1, the gnomAD transcript data is based on hg38. In order to map the transcripts to hg38, the transcript version numbers in the gnomAD download file were joined with GENCODE V39 and NCBI RefSeq coordinates available at UCSC.

UCSC Track Methods

Version based on gnomAD v2.1.1

Gene and Transcript Constraint tracks

Per gene and per transcript data were downloaded from the gnomAD Google Storage bucket:

gs://gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz
gs://gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_transcript.txt.bgz

These data were then joined to the Gencode set of genes/transcripts available at the UCSC Genome Browser (see previous section) and then transformed into a bigBed 12+5. For the full list of commands used to make this track please see the makedoc.

Version based on gnomAD v4.0

Gene and Transcript Constraint tracks

Per gene and per transcript data were downloaded from the gnomAD Google Storage bucket:

https://storage.googleapis.com/gcp-public-data--gnomad/release/4.0/constraint/gnomad.v4.0.constraint_metrics.tsv

These data were then joined to the Gencode/NCBI set of genes/transcripts available at the UCSC Genome Browser and then transformed into a bigBed 12+5. For the full list of commands used to make this track please see the makedoc.

Version based on gnomAD v4.1

Gene and Transcript Constraint tracks

Per gene and per transcript data were downloaded from the gnomAD Google Storage bucket:

https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/constraint/gnomad.v4.1.constraint_metrics.tsv

Data Access

The raw data can be explored interactively with the Table Browser, or the Data Integrator. For automated access, this track, like all others, is available via our API. However, for bulk processing, it is recommended to download the dataset. The genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text by our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/gnomAD/pLI/pliByTranscript.bb -chrom=chr6 -start=0 -end=1000000 stdout
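For scripted access, the command above can be wrapped in a small helper, as in the sketch below. It assumes the bigBedToBed binary is installed and on the PATH, and it parses only the first four standard BED columns (chrom, start, end, name); the remaining columns are kept as raw strings because their layout is defined by the track's own format description. The function names are hypothetical.

```python
import subprocess

def parse_bed_line(line):
    """Split one tab-separated BED line into the standard leading fields."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "name": fields[3],
        "extra": fields[4:],  # remaining columns, left unparsed
    }

def fetch_region(url, chrom, start, end, tool="bigBedToBed"):
    """Run bigBedToBed for one region and return the parsed records.

    Requires the bigBedToBed binary; raises CalledProcessError on failure.
    """
    cmd = [tool, url, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return [parse_bed_line(line) for line in result.stdout.splitlines() if line.strip()]
```

A call such as `fetch_region("http://hgdownload.soe.ucsc.edu/gbdb/hg38/gnomAD/pLI/pliByTranscript.bb", "chr6", 0, 1000000)` mirrors the command shown above.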

Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.

More information about using and understanding the gnomAD data can be found in the gnomAD FAQ site.

Credits

Thanks to the Genome Aggregation Database Consortium for making these data available. The data are released under the ODC Open Database License (ODbL) as described here.

References

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug 18;536(7616):285-91. PMID: 27535533; PMC: PMC5018207

Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020 May;581(7809):434-443. PMID: 32461654; PMC: PMC7334197

Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Francioli LC, Khera AV, Lowther C, Gauthier LD, Wang H et al. A structural variation reference for medical and population genetics. Nature. 2020 May;581(7809):444-451. PMID: 32461652; PMC: PMC7334194

Cummings BB, Karczewski KJ, Kosmicki JA, Seaby EG, Watts NA, Singer-Berk M, Mudge JM, Karjalainen J, Satterstrom FK, O'Donnell-Luria AH et al. Transcript expression-aware annotation improves rare variant interpretation. Nature. 2020 May;581(7809):452-458. PMID: 32461655; PMC: PMC7334198