GERP Track Settings
 
GERP scores for mammalian alignments

Assembly: Mouse July 2007 (NCBI37/mm9)
Data last updated at UCSC: 2011-04-14 17:10:02

Description

Genomic Evolutionary Rate Profiling (GERP) is a method for producing position-specific estimates of evolutionary constraint using maximum likelihood evolutionary rate estimation. It also discovers "constrained elements" where multiple positions combine to give a signal that is indicative of a putative functional element; this track shows the position-specific scores only, not the element predictions.

Constraint intensity at each individual alignment position is quantified in terms of a "rejected substitutions" (RS) score, defined as the number of substitutions expected under neutrality minus the number of substitutions "observed" at the position. This concept was described, and a first implementation of GERP was presented, in Cooper et al. (2005). GERP++, as described in Davydov et al. (2010), uses a more rigorous set of algorithms to calculate site-specific RS scores and to discover evolutionarily constrained elements.
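As a compact restatement of the definition above, the RS score at a single alignment position can be written as follows. The symbols are chosen here for illustration and are not taken from the GERP publications.

```latex
\[
  \mathrm{RS} \;=\; N_{\text{neutral}} \;-\; N_{\text{observed}}
\]
```

Here N_neutral is the number of substitutions expected at the position under neutrality and N_observed is the number of substitutions "observed" there, so a position with fewer substitutions than expected receives a positive score.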

Sites are scored independently. Positive scores represent a substitution deficit (i.e., fewer substitutions than the average neutral site) and thus indicate that a site may be under evolutionary constraint. Negative scores indicate that a site is probably evolving neutrally; negative scores should not be interpreted as evidence of accelerated rates of evolution because of too many strong confounders, such as alignment uncertainty or rate variance. Positive scores scale with the level of constraint, such that the greater the score, the greater the level of evolutionary constraint inferred to be acting on that site.

We applied GERP, as implemented in the GERP++ software package, to quantify the level of evolutionary constraint acting on each site in mm9, based on an alignment of 22 mammals to mm9 with a maximum phylogenetic scope of 4.14 substitutions per neutral site. Gaps in the alignment are treated as missing data, which means that the number of substitutions per neutral site will be less than 4.14 in sites where one or more species has a gap. Thus, RS scores range from a maximum of 4.14 down to a below-zero minimum, which we cap at -8.28. RS scores will vary with alignment depth and level of sequence conservation. A score of 0 indicates that the alignment was too shallow at that position to get a meaningful estimate of constraint. Should classification into "constrained" and "unconstrained" sites be desired, a threshold may be chosen above which sites are considered "constrained". In practice, we find that an RS score threshold of 2 provides high sensitivity while still strongly enriching for truly constrained sites.
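The score range and threshold described above can be turned into a small classification helper. The following Python sketch is illustrative only: the function names and label strings are hypothetical, and treating "above the threshold" as a strict inequality is an assumption.

```python
# Values stated in the track description for the mm9 alignment.
MAX_RS = 4.14                 # maximum phylogenetic scope (substitutions per neutral site)
MIN_RS = -8.28                # lower cap applied to RS scores
CONSTRAINED_THRESHOLD = 2.0   # suggested threshold for calling a site "constrained"


def cap_rs(score):
    """Apply the lower cap described above (hypothetical helper)."""
    return max(score, MIN_RS)


def classify_rs(score, threshold=CONSTRAINED_THRESHOLD):
    """Label a site from its RS score (hypothetical labels).

    A score of exactly 0 is reported where the alignment was too shallow
    to estimate constraint, so it is kept separate from real estimates.
    Assumption: "above the threshold" is interpreted as strictly greater.
    """
    if score == 0:
        return "no_estimate"
    if score > threshold:
        return "constrained"
    return "unconstrained"


if __name__ == "__main__":
    for s in (3.1, 0.0, -1.5, -12.0):
        print(s, cap_rs(s), classify_rs(s))
```

A different threshold can be passed as the second argument when higher specificity or sensitivity is wanted.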

Methods

Given a multiple sequence alignment and a phylogenetic tree with branch lengths representing the neutral rate between the species within that alignment, GERP++ quantifies constraint intensity at each individual position in terms of rejected substitutions, the difference between the neutral rate and the estimated evolutionary rate at the position. GERP++ begins with a pre-defined neutral tree relating the genomes present within the alignment that supplies both the total neutral rate across the entire tree and the relative length of each individual branch. For each alignment column, we estimate a scaling factor, applied uniformly to all branches of the tree, that maximizes the probability of the observed nucleotides in the alignment column. The product of the scaling factor and the neutral rate defines the 'observed' rate of evolution at each position. GERP++ uses the HKY85 model of evolution with the transition/transversion ratio set to 2.0 and nucleotide frequencies estimated from the multiple alignment.
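The relationship between the scaling factor, the neutral rate and the RS score described above can be sketched in a few lines of Python. This is not GERP++ code: the function and parameter names are hypothetical, and the maximum-likelihood estimation of the scaling factor under HKY85 is assumed to have been done elsewhere.

```python
def rejected_substitutions(neutral_rate, scale):
    """Return the RS score for one alignment column.

    neutral_rate: neutral rate for the column, in substitutions per site.
        Because gaps are treated as missing data, this may be smaller
        than the total rate of the full tree.
    scale: the estimated scaling factor applied uniformly to all branches.
        Assumed here to be supplied by a separate likelihood fit.
    """
    if neutral_rate < 0 or scale < 0:
        raise ValueError("neutral_rate and scale must be non-negative")
    observed_rate = scale * neutral_rate   # the 'observed' rate at this position
    return neutral_rate - observed_rate    # rejected substitutions


if __name__ == "__main__":
    # A fully conserved column (scale 0) rejects the whole neutral rate.
    print(rejected_substitutions(4.14, 0.0))
    # A column evolving at the neutral rate (scale 1) rejects nothing.
    print(rejected_substitutions(4.14, 1.0))
```

Under this formulation a scaling factor below 1 yields a positive score, and a scaling factor above 1 yields a negative score.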

To generate RS scores for each position in the mouse genome, we used GERP++ to analyze the TBA alignment of mm9 to 22 other mammalian species (the most distant mammalian species being platypus) spanning over 2.5 billion positions (see the description for the 'Conservation' track for details of this alignment). The alignment was compressed to remove gaps in the mouse sequence, and GERP++ scores were computed for every position with at least 3 ungapped species present. Importantly, the mouse sequence was removed from the alignment and not included in either the neutral rate estimation or the site-specific "observed" estimates, and therefore is not included in the RS score. This is consistent with the published work on GERP, and is done to eliminate the confounding influence of deleterious derived alleles segregating in the mouse population that are present in the reference sequence. The phylogenetic tree used was the generally accepted topology. Neutral branch lengths were estimated from 4-fold degenerate sites in the alignment.
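The column selection described above (drop columns that are gaps in the reference, then score only columns with at least 3 ungapped species) can be illustrated with a toy Python sketch. The data layout, function name and sequence names are hypothetical. Two assumptions are made: "-" is the only gap character, and the ungapped count is taken over the non-reference species because the reference is excluded from scoring.

```python
GAP = "-"  # assumption: a single gap character


def scorable_reference_positions(alignment, reference, min_ungapped=3):
    """Return one boolean per reference base: True if the column is scorable.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    reference: name of the reference sequence within `alignment`.
    """
    ref_seq = alignment[reference]
    # The reference sequence itself is left out of the count.
    others = [seq for name, seq in alignment.items() if name != reference]
    if any(len(seq) != len(ref_seq) for seq in others):
        raise ValueError("all aligned sequences must have the same length")

    flags = []
    for col, ref_base in enumerate(ref_seq):
        if ref_base == GAP:
            continue  # compress out columns that are gaps in the reference
        ungapped = sum(1 for seq in others if seq[col] != GAP)
        flags.append(ungapped >= min_ungapped)
    return flags


if __name__ == "__main__":
    toy = {
        "reference": "AC-GT",
        "species1":  "AC-GT",
        "species2":  "A-TG-",
        "species3":  "ACTG-",
        "species4":  "-CTGT",
    }
    print(scorable_reference_positions(toy, "reference"))
```

In the toy alignment the third column is dropped because the reference has a gap there, and the last reference base is not scorable because only two other species are ungapped at that column.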

Credits

The RS scores were generated by David Goode, Dept. of Genetics, Stanford University. GERP++ was developed by Eugene Davydov and Serafim Batzoglou, Dept. of Computer Science, Stanford University; Arend Sidow, Depts. of Pathology and Genetics, Stanford University; and Gregory Cooper, HudsonAlpha Institute for Biotechnology, Huntsville, AL.

References

Cooper GM, Stone EA, Asimenos G, NISC Comparative Sequencing Program, Green ED, Batzoglou S, Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005 Jul;15(7):901-13. PMID: 15965027; PMC: PMC1172034

Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput Biol. 2010 Dec 2;6(12):e1001025. PMID: 21152010; PMC: PMC2996323

For more information on using GERP to detect putatively functional genetic variation:

Cooper GM, Goode DL, Ng SB, Sidow A, Bamshad MJ, Shendure J, Nickerson DA. Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nat Methods. 2010 Apr;7(4):250-1. PMID: 20354513; PMC: PMC3145250

Goode DL, Cooper GM, Schmutz J, Dickson M, Gonzales E, Tsai M, Karra K, Davydov E, Batzoglou S, Myers RM et al. Evolutionary constraint facilitates interpretation of genetic variation in resequenced human genomes. Genome Res. 2010 Mar;20(3):301-10. PMID: 20067941; PMC: PMC2840986