Description
This track shows annotations from The Gene Curation Coalition (GenCC).
The GenCC provides information pertaining to the validity of gene-disease relationships,
with a current focus on Mendelian diseases. Curated gene-disease relationships are submitted
by GenCC member organizations that currently provide online resources (e.g. ClinGen, DECIPHER,
Orphanet, etc.), as well as diagnostic laboratories that have committed to sharing their internal
curated gene-level knowledge (e.g. Ambry Genetics, Illumina, Invitae, etc.).
The GenCC aims to clarify overlap between gene curation efforts and develop
consistent terminology for validity, allelic requirement and mechanism
of disease. Each item on this track corresponds to a gene and contains
a large amount of information, such as the associated disease, evidence classification,
specific submission notes and identifiers from different databases. In cases where
multiple annotations exist for the same gene, multiple items are displayed.
Display Conventions and Configuration
Each item displayed represents a submission to the GenCC database. The displayed
name is a combination of the gene symbol and the disease's original submission ID.
This submission ID is either the OMIM#, MONDO# or Orphanet#. Clicking
on any item will display the complete metadata for that item, including
linkouts to the GenCC, NCBI, Ensembl, HGNC, GeneCards, Pombase (MONDO),
and Human Phenotype Ontology (HPO). Mousing over any item will display the
associated disease title, the classification title, and the mode of inheritance
title.
Items are colored according to the GenCC classification (validity) of the
evidence, using the color scheme in the table below.
For more information on this process, see the GenCC
validity terms FAQ. A filter for the track is also available
to display a subset of the items based on their classification.
| Color | Evidence classification |
|  | Definitive |
|  | Strong |
|  | Moderate |
|  | Supportive |
|  | Limited |
|  | Disputed Evidence |
|  | Refuted Evidence |
|  | No Known Disease Relationship |
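The classification filter mentioned above can be approximated offline once items have been exported. The following is a minimal sketch, assuming the items are already loaded as dictionaries; the `classification` key, the function name, and the sample records are hypothetical and do not reflect the actual field names in the track's data file.

```python
# Evidence classifications in the order listed in the table above.
CLASSIFICATIONS = [
    "Definitive",
    "Strong",
    "Moderate",
    "Supportive",
    "Limited",
    "Disputed Evidence",
    "Refuted Evidence",
    "No Known Disease Relationship",
]


def filter_by_classification(items, keep):
    """Return the items whose classification is in `keep`.

    `items` is an iterable of dicts; the "classification" key is a
    hypothetical field name chosen for this sketch.
    """
    unknown = set(keep) - set(CLASSIFICATIONS)
    if unknown:
        raise ValueError(f"Unknown classification(s): {sorted(unknown)}")
    wanted = set(keep)
    return [item for item in items if item["classification"] in wanted]


# Made-up example records, for illustration only.
example_items = [
    {"name": "GENE_A item", "classification": "Definitive"},
    {"name": "GENE_B item", "classification": "Limited"},
    {"name": "GENE_C item", "classification": "Strong"},
]
strong_or_better = filter_by_classification(example_items, ["Definitive", "Strong"])
print([item["name"] for item in strong_or_better])
```

Validating the requested labels against the known list catches misspelled classifications early instead of silently returning an empty result.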
Limitations: Most entries include NM_ accessions as well as ENST and ENSG identifiers.
The original file contains no coordinates, and two genes, SLCO1B7 and ATXN8, could not be
mapped to the hg38 genome. This means that the hg38 track has 2 fewer items
than the GenCC download file. For hg19, one additional
gene, KCNJ18, could not be mapped. In addition, the GenCC data in the Genome
Browser does not include OMIM data due to licensing restrictions. For more
information, see the Methods section below.
Data Access
The source data can be explored in the
GenCC database. The source files can also be found on the GenCC downloads page.
The GenCC data on the UCSC Genome Browser can be explored interactively with the
Table Browser or the
Data Integrator.
For automated download and analysis, the genome annotation is stored at UCSC in bigBed
files that can be downloaded from
our download server.
The data may also be queried programmatically using our
REST API.
The file for this track may also be explored locally using our tool bigBedToBed,
which can be compiled from the source code or downloaded as a precompiled
binary for your system. Instructions for downloading source code and binaries can be found
here.
The tool can also be used to obtain features confined to a given range, e.g.,
bigBedToBed -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/genCC.bb stdout
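The same range can also be requested through the REST API. The following is a minimal Python sketch that builds a `getData/track` request URL for the region used in the bigBedToBed example. The track name `genCC` is an assumption inferred from the bigBed file name and should be checked against the API's track listing. The helper function names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome, track, chrom, start, end):
    """Build a getData/track URL for one genomic range."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    # Parameters are joined with semicolons, as in the UCSC API examples.
    query = ";".join(f"{key}={value}" for key, value in params)
    return f"{API_BASE}?{query}"


def fetch_track_items(url):
    """Download and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Same range as the bigBedToBed example; "genCC" is an assumed track name.
url = build_track_url("hg38", "genCC", "chr1", 100000, 100500)
print(url)
```

Calling `fetch_track_items(url)` would return the decoded JSON response. It is kept separate from URL construction so that the request can be inspected before any network access takes place.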
Methods
The data were downloaded from the GenCC downloads page in TSV format. Manual
curation was performed on the file to remove newline characters and tab characters present in
the submission notes; in total, fewer than 20 manual edits were made.
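Stray tabs and newlines inside a free-text field break a tab-separated file because each one is read as a column or row boundary. The edits described above were made by hand; the following is only a sketch of what an equivalent programmatic cleanup of a single notes value could look like. The function name and the sample text are hypothetical.

```python
import re


def clean_notes(text):
    """Collapse tabs, newlines, and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


# Made-up notes value containing an embedded newline, tab, and CRLF.
raw = "Reported in two families.\n\tSegregation data\r\nare limited. "
print(clean_notes(raw))
```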
The track was first built on hg38 by associating the gene symbols with the NCBI MANE 1.0
release transcripts. These coordinates were added to the items as well as the NM_ accession,
ENST ID and ENSG ID. For items where there was no gene symbol match in MANE (~130), the gene
symbols were queried against the GENCODE v40 comprehensive set. Where multiple
transcript matches were found, the earliest transcription start and the latest end site
among the transcripts were used to encompass the entire gene. Two genes, SLCO1B7 and ATXN8, could not
be mapped for hg38, resulting in two missing submissions in the Genome
Browser when compared to the raw file. Lastly, the items were colored according to their
evidence classification as seen on the GenCC database.
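The lookup order and the coordinate merging described above can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the track: the function names, the dictionary-based lookup tables, and the coordinates are all hypothetical, and rejecting transcripts on different chromosomes is a choice made for this sketch.

```python
def gene_span(transcripts):
    """Return (chrom, start, end) covering all transcripts of one gene.

    Each transcript is a (chrom, start, end) tuple. The span runs from
    the earliest start to the latest end.
    """
    chroms = {chrom for chrom, _, _ in transcripts}
    if len(chroms) != 1:
        raise ValueError("Transcripts must all be on one chromosome")
    start = min(s for _, s, _ in transcripts)
    end = max(e for _, _, e in transcripts)
    return chroms.pop(), start, end


def locate_gene(symbol, mane, gencode):
    """Look up a symbol in MANE first, then fall back to GENCODE.

    Returns None when the symbol is found in neither set.
    """
    if symbol in mane:
        return mane[symbol]
    if symbol in gencode:
        return gene_span(gencode[symbol])
    return None


# Made-up coordinates, for illustration only.
mane = {"GENE_A": ("chr1", 1000, 5000)}
gencode = {"GENE_B": [("chr2", 200, 900), ("chr2", 150, 700), ("chr2", 300, 1200)]}

print(locate_gene("GENE_A", mane, gencode))
print(locate_gene("GENE_B", mane, gencode))
print(locate_gene("GENE_C", mane, gencode))
```

A symbol found in neither table returns `None`, which corresponds to the unmapped genes noted above.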
For hg19, the hg38 NM_ accessions were used to convert the item coordinates according to the
latest hg19 RefSeq release. For items that failed to convert, the gene symbols were queried
against the GENCODE v40 hg19 lift comprehensive set. One additional gene symbol, KCNJ18, failed to map in
hg19, leading to 3 fewer items on this track when compared to the raw file.
For both assemblies, GenCC OMIM data is excluded due to licensing restrictions.
For complete documentation of the processing of these tracks, read the
GenCC MakeDoc.
Credits
Thanks to the entire GenCC
committee for creating these annotations and making them available.
References
DiStefano MT, Goehringer S, Babb L, Alkuraya FS, Amberger J, Amin M, Austin-Tse C, Balzotti M, Berg
JS, Birney E et al.
The Gene Curation Coalition: A global effort to harmonize gene-disease evidence resources.
Genet Med. 2022 May 4.
PMID: 35507016