ENC RNA-seq CSHL Long RNA-seq Track Settings
 
Long RNA-seq from ENCODE/Cold Spring Harbor Lab

Track collection: ENCODE RNA-seq

Subtracks are selected by localization and cell line.
Localization: Whole Cell, Chromatin, Cytosol, Nucleolus, Nucleoplasm, Nucleus
Cell Line:
GM12878 (Tier 1) 
H1-hESC (Tier 1) 
K562 (Tier 1) 
A549 (Tier 2) 
B cells CD20+ (Tier 2) 
HeLa-S3 (Tier 2) 
HepG2 (Tier 2) 
HUVEC (Tier 2) 
IMR90 (Tier 2) 
MCF-7 (Tier 2) 
Monocytes CD14+ (Tier 2) 
SK-N-SH (Tier 2) 
AG04450 
BJ 
CD34+ Mobilized 
HAoAF 
HAoEC 
HCH 
HFDPC 
HMEC 
HMEpC 
hMNC-PB 
hMSC-AT 
hMSC-BM 
hMSC-UC 
HOB 
HPC-PL 
HPIEpC 
HSaVEC 
HSMM 
HVMF 
HWP 
NHDF 
NHEK 
NHEM.f M2 
NHEM M2 
NHLF 
SkMC 
SK-N-SH RA 

Listed subtracks:

Cell Line  Localization  RNA Extract  View          Replicate rank  Track Name                                                              Restricted Until
GM12878    Whole Cell    PolyA+       Plus Signal   1st             GM12878 whole cell polyA+ RNA-seq Plus signal Rep 1 from ENCODE/CSHL    2011-06-29
GM12878    Whole Cell    PolyA+       Plus Signal   2nd             GM12878 whole cell polyA+ RNA-seq Plus signal Rep 2 from ENCODE/CSHL    2011-06-30
GM12878    Whole Cell    PolyA+       Minus Signal  1st             GM12878 whole cell polyA+ RNA-seq Minus signal Rep 1 from ENCODE/CSHL   2011-06-29
GM12878    Whole Cell    PolyA+       Minus Signal  2nd             GM12878 whole cell polyA+ RNA-seq Minus signal Rep 2 from ENCODE/CSHL   2011-06-30
GM12878    Whole Cell    PolyA+       Contigs       Pooled          GM12878 whole cell polyA+ RNA-seq Contigs Pooled from ENCODE/CSHL       2011-11-18
GM12878    Whole Cell    PolyA-       Plus Signal   1st             GM12878 whole cell polyA- RNA-seq Plus signal Rep 1 from ENCODE/CSHL    2011-10-11
GM12878    Whole Cell    PolyA-       Plus Signal   2nd             GM12878 whole cell polyA- RNA-seq Plus signal Rep 2 from ENCODE/CSHL    2011-06-29
GM12878    Whole Cell    PolyA-       Minus Signal  1st             GM12878 whole cell polyA- RNA-seq Minus signal Rep 1 from ENCODE/CSHL   2011-10-11
GM12878    Whole Cell    PolyA-       Minus Signal  2nd             GM12878 whole cell polyA- RNA-seq Minus signal Rep 2 from ENCODE/CSHL   2011-06-29
GM12878    Whole Cell    PolyA-       Contigs       Pooled          GM12878 whole cell polyA- RNA-seq Contigs Pooled from ENCODE/CSHL       2011-11-18

Assembly: Human Feb. 2009 (GRCh37/hg19)

Description

These tracks were generated by the ENCODE Consortium. They contain information about human RNAs greater than 200 nucleotides in length that were obtained as short reads from the Illumina GAIIx platform. Data are available from biological replicates of several cell lines. In addition to profiling Poly-A+ and Poly-A- RNA from whole cells, there are also data from various subcellular compartments. In many cases, there are Cap Analysis of Gene Expression (CAGE, see the RIKEN CAGE Loc track), Small RNA-seq (less than 200 nucleotides, see the CSHL Sm RNA-seq track) and Pair-End di-TAG-RNA (PET-RNA, see the GIS RNA PET track) datasets available from the same biological replicates.

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here.

To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide.

Color differences among the views are arbitrary. They provide a visual cue for distinguishing between the different cell types and compartments.

Contigs
The Contigs represent blocks of overlapping mapped reads from the pooled biological replicates. Specific column specifications can be found in the supplemental directory.
Signals
The Plus Signal and Minus Signal views show the density of mapped reads on the plus and minus strands (wiggle format), respectively.
Alignments
The Alignments view shows individual reads mapped from biological replicates to the genome and indicates where bases may mismatch. Every mapped read is displayed, i.e. uncollapsed. The alignment file follows the standard SAM format. See the SAM Format Specification for more information on the SAM/BAM file format.
Splice Junctions
The Splice Junctions view shows the subset of aligned reads that cross splice junctions. Specific column specifications can be found in the supplemental directory.
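
As a rough illustration of how the Signal and Contigs views relate to the underlying alignments, the sketch below computes per-strand read density and merges overlapping reads into blocks. The read tuples, coordinates and function names are hypothetical, and treating contigs as strand-specific is an assumption. This is not the pipeline used to generate the tracks, and it ignores spliced alignments.

```python
from collections import defaultdict

def strand_signal(reads):
    """Per-base read density for each strand.

    reads: iterable of (start, end, strand), 0-based half-open coordinates.
    Returns {strand: {position: depth}}.
    """
    signal = {"+": defaultdict(int), "-": defaultdict(int)}
    for start, end, strand in reads:
        for pos in range(start, end):
            signal[strand][pos] += 1
    return {s: dict(d) for s, d in signal.items()}

def contigs(reads):
    """Merge overlapping same-strand reads into (start, end, strand, n_reads) blocks."""
    blocks = []
    for strand in ("+", "-"):
        current = None
        for start, end, _ in sorted(r for r in reads if r[2] == strand):
            if current and start < current[1]:
                # Read overlaps the open block: extend it
                current[1] = max(current[1], end)
                current[3] += 1
            else:
                if current:
                    blocks.append(tuple(current))
                current = [start, end, strand, 1]
        if current:
            blocks.append(tuple(current))
    return blocks

# Toy reads pooled from two hypothetical replicates
reads = [(100, 176, "+"), (150, 226, "+"), (400, 476, "+"), (120, 196, "-")]
signal = strand_signal(reads)
blocks = contigs(reads)
```

Here the two overlapping plus-strand reads collapse into one block, while the plus-strand density is 2 wherever they overlap.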

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.

Additional views are available on the downloads page.

Methods

Cell Culture

Cells were grown according to the approved ENCODE cell culture protocols.

Library Preparation

The published cDNA sequencing protocol was used. This protocol generates directional libraries and reports the transcripts' strand of origin. Exogenous RNA spike-ins were added to each endogenous RNA isolate and carried through library construction and sequencing. The Illumina PhiX control library was also spiked-in at 1% to each completed human library just prior to cluster formation. Accompanying each RNA-seq dataset is a Protocol document available for download as a PDF. This document contains details about the RNA isolations and treatments, library construction, spike-ins as well as quality control figures for individual libraries. The spike-in sequence and the concentrations are available for download in the supplemental directory.

Sequencing and Mapping

The libraries were sequenced on the Illumina GAIIx platform as paired-ends for 76 or 101 cycles for each read. The average depth of sequencing was ~200 million reads (100 million paired-ends). The data were mapped against hg19 using Spliced Transcript Alignment and Reconstruction (STAR) written by Alex Dobin (CSHL). More information about STAR, including the parameters used for these data, is available from the Gingeras lab.

For each experiment, additional element data files are available for download. These elements were assessed for reproducibility using a nonparametric irreproducible detection rate (IDR) script. The IDR value for each element is included in the files so that end-users can apply it as a threshold. An IDR value of 0.1 means that the probability of detecting that element in a third experiment equivalent in depth to the sum of the bioreplicates is 90%. In addition, expression values for annotated genes, transcripts and exons were computed. Further explanation of these files is available for download in the supplemental directory.
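
A minimal sketch of using the IDR value as a threshold is shown below. The element records and the `idr` field name are hypothetical; the actual column layout of the element files is described in the supplemental directory, and the 0.1 cutoff is only the example value mentioned above, not a recommended setting.

```python
def filter_by_idr(elements, max_idr=0.1):
    """Keep elements whose IDR is at or below the chosen threshold."""
    return [e for e in elements if e["idr"] <= max_idr]

# Hypothetical elements; real files carry more columns
elements = [
    {"name": "contig_1", "idr": 0.01},
    {"name": "contig_2", "idr": 0.10},
    {"name": "contig_3", "idr": 0.35},
]

reproducible = filter_by_idr(elements)
names = [e["name"] for e in reproducible]
```

A stricter analysis would pass a smaller `max_idr`; a more permissive one would raise it.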

Verification

FPKM (fragments per kilobase of exon per million fragments mapped) values were calculated for annotated Gencode exons, and Spearman correlation coefficients were compared between biological replicates. In general, Rho values are greater than 0.90.
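
The verification step can be sketched as follows, assuming the usual definition FPKM = fragments × 10^9 / (exon length in bp × total mapped fragments) and Spearman's Rho computed as the Pearson correlation of ranks. The exon lengths and fragment counts are made up for illustration; this is not the script used for the published comparison.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)

def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical exons: (length in bp, fragments in rep 1, fragments in rep 2)
exons = [(1000, 50, 60), (500, 200, 180), (2000, 10, 14), (1500, 0, 1)]
total_rep1, total_rep2 = 1_000_000, 1_200_000

rep1 = [fpkm(f1, length, total_rep1) for length, f1, _ in exons]
rep2 = [fpkm(f2, length, total_rep2) for length, _, f2 in exons]
rho = spearman_rho(rep1, rep2)
```

Because Spearman's Rho depends only on rank order, replicates sequenced to different depths can still be compared directly.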

Release Notes

This is release 3 (Sept 2012) of this track for hg19. It has no new experiments, but has additional files for many experiments. The hMNC-CB experiment has been revoked. The doubly compressed spike-ins files have been uncompressed. The hMNC-PB experiment has been replaced with a version of improved depth. The current downloadable elements files (Transcripts, Genes and Exons) were generated using GENCODE V10, while the older datasets were generated using GENCODE V7. The "view" metadata will specify V7 or V10 for these files.

Errata

6/6/2013 - CSHL reports that one lane of reads is missing from the SK-N-SH-RA fastq read2 file (wgEncodeCshlLongRnaSeqSknshraCellPapFastqRd2Rep1.fastq.gz).

Credits

These data were generated and analyzed by the transcriptome group led by Tom Gingeras at Cold Spring Harbor Laboratory and the laboratory of Roderic Guigó at the Centre for Genomic Regulation (CRG) in Barcelona.

Contact: Carrie Davis

References

Parkhomchuk D, Borodina T, Amstislavskiy V, Banaru M, Hallen L, Krobitsch S, Lehrach H, Soldatov A. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res. 2009 Oct;37(18):e123.

Publications

Cheng C, Alexander R, Min R, Leng J, Yip KY, Rozowsky J, Yan KK, Dong X, Djebali S, Ruan Y et al. Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome Res. 2012 Sep;22(9):1658-67.

Deng X, Hiatt JB, Nguyen DK, Ercan S, Sturgill D, Hillier LW, Schlesinger F, Davis CA, Reinke VJ, Gingeras TR et al. Evidence for compensatory upregulation of expressed X-linked genes in mammals, Caenorhabditis elegans and Drosophila melanogaster. Nat Genet. 2011 Oct 23;43(12):1179-85.

Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG et al. The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression. Genome Res. 2012 Sep;22(9):1775-89.

Dong X, Greven MC, Kundaje A, Djebali S, Brown JB, Cheng C, Gingeras TR, Gerstein M, Guigó R, Birney E et al. Modeling gene expression using chromatin features in various cellular contexts. Genome Biol. 2012 Sep 5;13(9):R53.

ENCODE Project Consortium, Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.

Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, Gingeras TR, Oliver B. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 2011 Sep;21(9):1543-51.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.