Description
This container track represents dosage sensitivity map data from Collins et al., 2022. It contains
two tracks, one corresponding to the probability of haploinsufficiency (pHaplo) and
one to the probability of triplosensitivity (pTriplo).
Rare copy-number variants (rCNVs) include deletions and duplications that occur
infrequently in the global human population and can confer substantial risk for
disease. Collins et al. aimed to quantify the properties of haploinsufficiency (i.e.,
deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout
the human genome. They analyzed rCNVs from nearly one million individuals to construct a
genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163
dosage-sensitive segments associated with at least one disorder. These segments were typically
gene-dense and often harbored dominant dosage-sensitive driver genes. An ensemble
machine learning model was then built to predict dosage sensitivity probabilities (pHaplo and
pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559
triplosensitive genes, including 648 that were uniquely triplosensitive.
Display Conventions and Configuration
Each track displays one item per gene (bed track), covering the entire gene locus wherever
a score was available. Clicking on an item provides a link to DECIPHER, which contains the sensitivity scores as well as
additional information. Mousing over an item displays the gene symbol, the ENSG ID for that gene,
and the respective sensitivity score for the track, rounded to two decimal places. Filters are
also available to set score thresholds for the items displayed in each track.
Coloring and Interpretation
Each of the tracks is colored based on standardized cutoffs for pHaplo and pTriplo as described by the
authors:
- pHaplo scores ≥0.86 indicate that the average effect sizes of deletions are as strong as
the loss-of-function of genes known to be constrained against protein truncating variants (average OR≥2.7)
(Karczewski et al., 2020).
- pHaplo scores ≥0.55 indicate an odds ratio ≥2.
- pTriplo scores ≥0.94 indicate that the average effect sizes of duplications are as strong as
the loss-of-function of genes known to be constrained against protein truncating variants (average OR≥2.7)
(Karczewski et al., 2020).
- pTriplo scores ≥0.68 indicate an odds ratio ≥2.
Applying these cutoffs defined 2,987 haploinsufficient (pHaplo≥0.86) and 1,559
triplosensitive (pTriplo≥0.94) genes with rCNV effect sizes comparable to loss-of-function
of gold-standard PTV-constrained genes.
See below for a summary of the color scheme:
- Dark red items - pHaplo ≥ 0.86
- Bright red items - pHaplo < 0.86
- Dark blue items - pTriplo ≥ 0.94
- Bright blue items - pTriplo < 0.94
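The color scheme can be expressed as a small lookup. The following is a minimal sketch using only the two cutoffs listed above; the function name `item_color` and the color labels as strings are illustrative assumptions, not part of the track configuration.

```python
# Cutoffs taken from the color scheme described above.
PHAPLO_CUTOFF = 0.86
PTRIPLO_CUTOFF = 0.94


def item_color(score: float, track: str) -> str:
    """Return the color label for a score on the 'pHaplo' or 'pTriplo' track.

    Hypothetical helper for illustration; the labels mirror the list above.
    """
    if track == "pHaplo":
        return "dark red" if score >= PHAPLO_CUTOFF else "bright red"
    if track == "pTriplo":
        return "dark blue" if score >= PTRIPLO_CUTOFF else "bright blue"
    raise ValueError(f"unknown track: {track!r}")
```

For example, `item_color(0.90, "pHaplo")` falls in the dark red class, while the same score on the pTriplo track falls in the bright blue class because it is below 0.94.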
Methods
The data were downloaded from Zenodo as a 3-column file containing
gene symbols, pHaplo scores, and pTriplo scores. Since the data were created using
GENCODEv19 models, the hg19 items were mapped using those coordinates by picking the earliest
transcription start site of all of the respective gene transcripts and the furthest
transcription end site. This leads to some gene boundaries that are not representative of a real
transcript, but since the data are gene locus annotations, this maximum coverage was used.
Finally, both scores were rounded to two decimal places for easier interpretation.
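The locus construction described above can be sketched as follows. This is an illustration, not the code used to build the track: it assumes transcripts are given as hypothetical `(gene_symbol, chrom, tx_start, tx_end)` tuples in genomic coordinates and that all transcripts of a gene lie on one chromosome. Taking the minimum start and maximum end yields the maximal span regardless of strand.

```python
def gene_spans(transcripts):
    """Collapse transcripts into one maximal span per gene.

    transcripts: iterable of (gene_symbol, chrom, tx_start, tx_end) tuples
    (a hypothetical input layout chosen for this example).
    Returns a dict mapping gene_symbol -> (chrom, start, end).
    """
    spans = {}
    for gene, chrom, start, end in transcripts:
        if gene not in spans:
            spans[gene] = (chrom, start, end)
        else:
            # Assumption: every transcript of a gene is on the same chromosome.
            known_chrom, known_start, known_end = spans[gene]
            spans[gene] = (known_chrom, min(known_start, start), max(known_end, end))
    return spans
```

The resulting span may not match any single transcript, which is the behavior noted in the paragraph above.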
For hg38, we attempted to assign updated gene positions from several datasets, since
gene symbols have been updated many times since GENCODEv19. A summary of the workflow
can be seen below, with each subsequent step being used only for genes where mapping failed:
- Gene symbols were mapped using MANE1.0. Fewer than 2,000 items failed mapping at this step.
- Mapping with GENCODEv45 was attempted.
- Mapping with GENCODEv20 was attempted. At this point, 448 items were not mapped.
- Finally, any missing items were lifted over from the hg19 track. 19/448 items failed
mapping due to their regions having been split from hg19 to hg38.
In summary, the hg19 track was mapped using the original GENCODEv19 mappings, and a series
of steps was taken to map the hg38 gene symbols with updated coordinates. 19 of 18,641 items
could not be mapped and are missing from the hg38 tracks.
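The fallback order of the hg38 workflow can be illustrated with a short sketch. It assumes each mapping source has already been loaded into a hypothetical dictionary from gene symbol to coordinates, and it treats the final liftOver step as just another lookup table. The actual commands are in the makeDoc.

```python
def map_genes(symbols, sources):
    """Assign each symbol the coordinates from the first source that has it.

    symbols: iterable of gene symbols.
    sources: ordered list of (source_name, lookup) pairs, where lookup is a
             dict of symbol -> coordinates (hypothetical, for illustration).
    Returns (mapped, unmapped): mapped is symbol -> (source_name, coordinates),
    unmapped is the list of symbols found in no source.
    """
    mapped = {}
    unmapped = []
    for symbol in symbols:
        for source_name, lookup in sources:
            if symbol in lookup:
                mapped[symbol] = (source_name, lookup[symbol])
                break
        else:
            # No source contained this symbol.
            unmapped.append(symbol)
    return mapped, unmapped
```

Called with sources in the order MANE1.0, GENCODEv45, GENCODEv20, and lifted hg19 coordinates, each later source is consulted only for symbols that the earlier ones could not place.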
The complete makeDoc, which includes all of the track creation steps, can be found online.
Data Access
The raw data can be explored interactively with the Table Browser, or
the Data Integrator. For automated access, this track, like all
others, is available via our API. However, for bulk
processing, it is recommended to download the dataset.
For automated download and analysis, the genome annotation is stored at UCSC in bigBed
files that can be downloaded from
our download server.
Individual regions or the whole genome annotation can be obtained using our tool
bigBedToBed which can be compiled from the source code or downloaded as a precompiled
binary for your system. Instructions for downloading source code and binaries can be found
here.
The tools can also be used to obtain features confined to a given range, e.g.,
bigBedToBed -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/dosageSensitivityCollins2022/pHaploDosageSensitivity.bb stdout
Please refer to our
Data Access FAQ
for more information.
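For scripted use, the bigBedToBed call shown above can be wrapped in a few lines of Python. This is a minimal sketch that assumes the bigBedToBed binary is installed and on the PATH; the helper names `bigbed_region_cmd` and `fetch_region` are hypothetical, and only the command-line options from the example above are used. Rows are returned as raw tab-separated fields, with no assumption about the column layout.

```python
import subprocess


def bigbed_region_cmd(url, chrom, start, end, out="stdout"):
    """Build the bigBedToBed argument list for one genomic range."""
    return [
        "bigBedToBed",
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        url,
        out,
    ]


def fetch_region(url, chrom, start, end):
    """Run bigBedToBed and return each output row as a list of fields.

    Requires the bigBedToBed binary and network access to the bigBed URL.
    """
    result = subprocess.run(
        bigbed_region_cmd(url, chrom, start, end),
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines() if line]
```

Calling `fetch_region` with the pHaploDosageSensitivity.bb URL above, `"chr1"`, `100000`, and `100500` runs the same query as the command-line example.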
Credits
Thanks to DECIPHER for their support and assistance with the data. We would also like to
thank Anna Benet-Pagès for suggesting and assisting in track development and interpretation.
References
Collins RL, Glessner JT, Porcu E, Lepamets M, Brandon R, Lauricella C, Han L, Morley T, Niestroj LM,
Ulirsch J et al.
A cross-disorder dosage sensitivity map of the human genome.
Cell. 2022 Aug 4;185(16):3041-3055.e25.
PMID: 35917817; PMC: PMC9742861