Description
These tracks contain cDNA and gene alignments from other vertebrate species
in the UCSC Genome Browser, produced by
the TransMap cross-species alignment algorithm.
For closer evolutionary distances, the alignments are created using
syntenically filtered LASTZ or BLASTZ alignment chains, resulting
in a prediction of the orthologous genes in human. For more distant
organisms, reciprocal best alignments are used.
TransMap maps genes and related annotations in one species to another
using synteny-filtered pairwise genome alignments (chains and nets) to
determine the most likely orthologs. For example, for the mRNA TransMap track
on the human assembly, more than 400,000 mRNAs from 25 vertebrate species were
aligned at high stringency to the native assembly using BLAT. The alignments
were then mapped to the human assembly using the chain and net alignments
produced using BLASTZ, which has higher sensitivity than BLAT for diverged
organisms.
Compared to translated BLAT, TransMap finds fewer paralogs and aligns more UTR
bases.
Display Conventions and Configuration
This track follows the display conventions for
PSL alignment tracks.
This track may also be configured to display codon coloring, a feature that
allows the user to quickly compare cDNAs against the genomic sequence. For more
information about this option, click
here.
Several types of alignment gap may also be colored;
for more information, click
here.
Methods
- Source transcript alignments were obtained from vertebrate organisms
in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank
mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes,
were used as available.
- For all vertebrate assemblies that had BLASTZ alignment chains and
nets to the human (hg19) genome, a subset of the alignment chains was
selected as follows:
- For organisms whose branch distance was no more than 0.5
(as computed by phyloFit, see Conservation track description for details),
syntenic filtering was used. Reciprocal best nets were used if available;
otherwise, nets were selected with the netFilter -syn command.
The chains corresponding to the selected nets were used for mapping.
- For more distant species, where the determination of synteny is difficult,
the full set of chains was used for mapping. This allows for more genes to
map at the expense of some mapping to paralogous regions. The
post-alignment filtering step removes some of the duplications.
- The pslMap program was used to do a base-level projection of
the source transcript alignments via the selected chains
to the human genome, resulting in pairwise alignments of the source transcripts to
the genome.
- The resulting alignments were filtered with pslCDnaFilter
with a global near-best criterion of 0.5% in finished genomes
(human and mouse) and 1.0% in other genomes. Alignments
where less than 20% of the transcript mapped were discarded.
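
The chain selection and post-alignment filtering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pslCDnaFilter implementation: the Alignment record, the score field, the definition of coverage as aligned bases divided by transcript length, and the order of the two filters are all hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one mapped transcript alignment."""
    query: str          # source transcript identifier
    score: float        # assumed alignment score, higher is better
    aligned_bases: int  # transcript bases that mapped
    query_size: int     # total transcript length


def chain_strategy(branch_distance: float, has_reciprocal_best: bool) -> str:
    """Pick the chain subset used for mapping, following the rules in the text."""
    if branch_distance <= 0.5:
        # Closer species: syntenic filtering.
        if has_reciprocal_best:
            return "reciprocal-best nets"
        return "netFilter -syn nets"
    # More distant species: synteny is hard to determine, use everything.
    return "all chains"


def filter_alignments(alignments: List[Alignment],
                      near_best_frac: float,
                      min_cover: float = 0.20) -> List[Alignment]:
    """Drop low-coverage alignments, then keep those near the best per query.

    near_best_frac is 0.005 for finished genomes and 0.01 otherwise.
    """
    by_query = defaultdict(list)
    for aln in alignments:
        # Discard alignments where less than min_cover of the transcript mapped.
        if aln.aligned_bases / aln.query_size >= min_cover:
            by_query[aln.query].append(aln)

    kept = []
    for group in by_query.values():
        best = max(a.score for a in group)
        # Assumed near-best rule: score within near_best_frac of the best score.
        threshold = best * (1.0 - near_best_frac)
        kept.extend(a for a in group if a.score >= threshold)
    return kept
```

With a near-best fraction of 0.005, an alignment scoring 996 survives next to a best score of 1000, while one scoring 990 is dropped.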
To ensure unique identifiers for each alignment, cDNA and gene accessions were
made unique by appending a suffix for each location in the source genome and
again for each mapped location in the destination genome. The format is:
accession.version-srcUniq.destUniq
Where srcUniq is a number added to make each source alignment unique, and
destUniq is added to give the subsequent TransMap alignments unique
identifiers.
For example, in the cow genome, there are two alignments of mRNA BC149621.1.
These are assigned the identifiers BC149621.1-1 and BC149621.1-2.
When these are mapped to the human genome, BC149621.1-1 maps to a single
location and is given the identifier BC149621.1-1.1. However, BC149621.1-2
maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note
that multiple TransMap mappings are usually the result of tandem duplications, where both
chains are identified as syntenic.
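
As an illustration of the identifier format, the following Python sketch splits a TransMap identifier into its parts. The function and type names are hypothetical, and the regular expression assumes that srcUniq and destUniq are always plain decimal numbers, as in the examples above.

```python
import re
from typing import NamedTuple, Optional


class TransMapId(NamedTuple):
    accession: str            # accession.version, e.g. "BC149621.1"
    src_uniq: int             # suffix added in the source genome
    dest_uniq: Optional[int]  # suffix added after mapping, if present


# accession.version, then "-srcUniq", then an optional ".destUniq"
_ID_RE = re.compile(r"^(?P<acc>.+)-(?P<src>\d+)(?:\.(?P<dest>\d+))?$")


def parse_transmap_id(ident: str) -> TransMapId:
    """Split an identifier such as 'BC149621.1-2.2' into its components."""
    m = _ID_RE.match(ident)
    if m is None:
        raise ValueError(f"not a TransMap-style identifier: {ident!r}")
    dest = m.group("dest")
    return TransMapId(m.group("acc"),
                      int(m.group("src")),
                      int(dest) if dest is not None else None)


print(parse_transmap_id("BC149621.1-2.2"))
```

A source-genome identifier such as BC149621.1-1, which has no destUniq part yet, parses with dest_uniq set to None.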
Data Access
The raw data for these tracks can be accessed interactively through the
Table Browser or the
Data Integrator.
For automated analysis, the annotations are stored in
bigPsl files (containing a
number of extra columns) and can be downloaded from our
download server,
or queried using our API. For more
information on accessing track data see our
Track Data Access FAQ.
The files are associated with these tracks in the following way:
- TransMap Ensembl - hg19.ensembl.transMapV5.bigPsl
- TransMap RefGene - hg19.refseq.transMapV5.bigPsl
- TransMap RNA - hg19.rna.transMapV5.bigPsl
- TransMap ESTs - hg19.est.transMapV5.bigPsl
Individual regions or the whole genome annotation can be obtained using our tool
bigBedToBed, which can be compiled from the source code or downloaded as
a precompiled binary for your system. Instructions for downloading source code and
binaries can be found
here.
The tool can also be used to obtain only features within a given range, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/transMap/V5/hg19.refseq.transMapV5.bigPsl -chrom=chr6 -start=0 -end=1000000 stdout
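
For scripted use, the same command can be assembled in Python. This is a minimal sketch: the helper name and dictionary are hypothetical, and it assumes that all four bigPsl files live in the same directory as the RefSeq file in the example above, which the text does not state explicitly. Running the command also requires bigBedToBed to be installed and on the PATH.

```python
from typing import List

# Track name to file name, as listed in the text.
TRACK_FILES = {
    "TransMap Ensembl": "hg19.ensembl.transMapV5.bigPsl",
    "TransMap RefGene": "hg19.refseq.transMapV5.bigPsl",
    "TransMap RNA": "hg19.rna.transMapV5.bigPsl",
    "TransMap ESTs": "hg19.est.transMapV5.bigPsl",
}

# Directory taken from the RefSeq example; assumed to hold the other files too.
BASE_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg19/transMap/V5/"


def bigbed_command(track: str, chrom: str, start: int, end: int,
                   out: str = "stdout") -> List[str]:
    """Build the bigBedToBed argument list for one track and genomic range."""
    url = BASE_URL + TRACK_FILES[track]
    return ["bigBedToBed", url,
            f"-chrom={chrom}", f"-start={start}", f"-end={end}", out]


cmd = bigbed_command("TransMap RefGene", "chr6", 0, 1000000)
print(" ".join(cmd))

# To actually run it (needs the bigBedToBed binary and network access):
#   import subprocess
#   result = subprocess.run(cmd, check=True, capture_output=True, text=True)
#   rows = result.stdout.splitlines()
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a chromosome name or output path contains unusual characters.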
Credits
This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data
submitted to the international public sequence databases by
scientists worldwide and annotations produced by the RefSeq,
Ensembl, and GENCODE annotation projects.
References
Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S,
Lau C et al.
Targeted discovery of novel human exons by comparative genomics.
Genome Res. 2007 Dec;17(12):1763-73.
PMID: 17989246; PMC: PMC2099585
Stanke M, Diekhans M, Baertsch R, Haussler D.
Using native and syntenically mapped cDNA alignments to improve de novo gene finding.
Bioinformatics. 2008 Mar 1;24(5):637-44.
PMID: 18218656
Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D.
Comparative genomics search for losses of long-established genes on the human lineage.
PLoS Comput Biol. 2007 Dec;3(12):e247.
PMID: 18085818; PMC: PMC2134963