ENC RNA-seq HAIB RNA-seq Track Settings
 
RNA-seq from ENCODE/HAIB

Track collection: ENCODE RNA-seq


Select subtracks by treatment and cell line:

Replicate: 1 2 3 4

Cell Line:
A549 (Tier 2)
SK-N-SH (Tier 2)
BE2 C
ECC-1
Jurkat
PANC-1
PFSK-1
T-47D
U87
List of subtracks:

Cell Line  Treatment         Replicate  View        Track Name                                                      Restricted Until
A549       DEX 1hr 100nM     1          Raw Signal  A549 DEX 1 hr 100 nM RNA-seq Raw Signal Rep 1 from ENCODE/HAIB  2011-10-21
A549       DEX 1hr 100pM     1          Raw Signal  A549 DEX 1 hr 100 pM RNA-seq Raw Signal Rep 1 from ENCODE/HAIB  2011-10-21
A549       DEX 1hr 1nM       1          Raw Signal  A549 DEX 1 hr 1 nM RNA-seq Raw Signal Rep 1 from ENCODE/HAIB    2011-10-21
A549       DEX 1hr 500pM     1          Raw Signal  A549 DEX 1 hr 500 pM RNA-seq Raw Signal Rep 1 from ENCODE/HAIB  2011-10-21
A549       DEX 1hr 5nM       1          Raw Signal  A549 DEX 1 hr 5 nM RNA-seq Raw Signal Rep 1 from ENCODE/HAIB    2011-10-21
A549       ETOH 1hr 0.02pct  1          Raw Signal  A549 ETOH 1 hr 0.02% RNA-seq Raw Signal Rep 1 from ENCODE/HAIB  2011-10-21

Assembly: Human Feb. 2009 (GRCh37/hg19)

Description

This track was produced as part of the ENCODE Project. RNA-seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly (Mortazavi et al., 2008). Biological replicates of ENCODE cell lines were grown on separate culture plates; total RNA was purified and polyA-selected twice. The mRNA extract was then fragmented by magnesium-catalyzed hydrolysis and reverse transcribed to cDNA by random priming, followed by amplification. The cDNA was sequenced on an Illumina Genome Analyzer (GAI or GAIIx).

The sequence reads were aligned to the NCBI Build37 (hg19) version of the human genome using the sequence alignment programs ELAND (Illumina) or Bowtie (Langmead et al., 2009). The first 10 bases of each read show a weak characteristic nucleotide bias of unknown origin. This RNA-seq protocol does not specify the coding strand. As a result, reads are ambiguous at loci where both strands are transcribed.

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks (cell lines, replicates and growth conditions) that display individually on the browser. Instructions for configuring multi-view tracks are here. The following views are in this track:

Alignments
The Alignments view shows reads mapped to the genome. See the Bowtie Manual for more information about Bowtie's SAM output (including tag definitions) and the SAM Format Specification for more information on the SAM/BAM file format.
The reads are named using the following convention:
Lane #:Tile #:X-coordinate:Y-coordinate
Raw Signal
Density graph of signal enrichment based on a normalized aligned read density (Reads Per Million, RPM). RPM is reported in the score field and is equal to the number of reads at that position divided by the total number of reads in millions. The Raw Signal view displays dense, continuous data as a graph, and the RPM measure helps compare the relative amount of a given transcript across multiple samples.
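
As an illustration of the read-naming convention given for the Alignments view (Lane #:Tile #:X-coordinate:Y-coordinate), the following Python sketch splits a read name into its four fields. It assumes each field is a plain integer with no prefix; the function name parse_read_name and the example read name are hypothetical and not taken from the data files.

```python
from typing import NamedTuple


class ReadName(NamedTuple):
    """The four fields of a read name, in the order the convention lists them."""
    lane: int
    tile: int
    x: int
    y: int


def parse_read_name(name: str) -> ReadName:
    """Split a 'lane:tile:x:y' read name into integer fields.

    Assumption: all four fields are plain integers separated by colons.
    """
    parts = name.split(":")
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 colon-separated fields, got {len(parts)}: {name!r}"
        )
    lane, tile, x, y = (int(p) for p in parts)
    return ReadName(lane, tile, x, y)


# Made-up read name, for illustration only.
print(parse_read_name("3:42:1187:954"))
```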

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.

Methods

Experimental Procedures

Cells were grown according to the approved ENCODE cell culture protocols. Cells were lysed in RLT buffer (Qiagen RNeasy kit) and processed on RNeasy midi columns according to the manufacturer's protocol, with the inclusion of the "on-column" DNase digestion step to remove residual genomic DNA. The mRNA was isolated from at least 10 µg of total RNA by two rounds of oligo(dT) selection (Dynabeads mRNA Purification Kit, Invitrogen). Alternatively, cells were lysed and mRNA was purified directly by two rounds of oligo(dT) selection (Dynabeads mRNA DIRECT Kit, Invitrogen). A quantity of 100 ng of mRNA was fragmented by magnesium-catalyzed hydrolysis and reverse transcribed to cDNA by random priming according to the protocol in Mortazavi et al. (2008). The cDNA was prepared for sequencing on the Genome Analyzer flowcell according to the protocol for the ChIPSeq DNA genomic DNA kit (Illumina). The sequencing libraries were size-selected around 225 bp and amplified with 15 rounds of PCR.

Libraries were sequenced with an Illumina Genome Analyzer I or an Illumina Genome Analyzer IIx according to the manufacturer's recommendations. Single end reads of 36 nt in length were obtained.

Data Processing and Analysis

FastQ files were made from qseq files generated by the Illumina pipeline (Casava 1.7). The Raw Signal files (bigWig) were generated from bedGraph files; the score at each position is the number of reads at that position divided by the total number of reads in millions.
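
The Raw Signal score described above can be sketched in a few lines of Python. This is only the arithmetic as stated in the text, not the pipeline's actual code; the function name reads_per_million and the example counts are hypothetical.

```python
def reads_per_million(reads_at_position: int, total_reads: int) -> float:
    """Return reads at a position divided by the total reads in millions.

    Assumption: total_reads is the total read count used to normalize
    the experiment, as described in the text.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_at_position / (total_reads / 1_000_000)


# Made-up numbers: 12 reads cover a position in an 8-million-read experiment.
print(reads_per_million(12, 8_000_000))  # prints 1.5
```

Because the score is scaled by library size, the same position can be compared between samples that were sequenced to different depths.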

Casava export files were aligned to the NCBI Build37 (hg19) version of the human genome with ELAND (Illumina), generating SAM files. FastQ files of experiments that were previously aligned to NCBI Build36 (hg18) were aligned to NCBI Build37 (hg19) using Bowtie (Langmead et al., 2009; parameters: -S -n 2 -k 11 -m 10 --best), also generating SAM files. SAM files were converted to BAM files with SAMtools (Li et al., 2009).
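
For readers who want to reproduce the Bowtie realignment step, the sketch below assembles the command lines from the parameters quoted above (-S -n 2 -k 11 -m 10 --best) and a SAM-to-BAM conversion with SAMtools. It only builds and prints the commands; nothing is executed. The index prefix and file names are hypothetical placeholders, and the exact SAMtools invocation used by the authors is not given in the text, so the samtools view call is an assumption.

```python
import shlex
from typing import List


def bowtie_command(index_prefix: str, fastq_path: str, sam_path: str) -> List[str]:
    """Build a Bowtie command using the parameters listed in the text."""
    return [
        "bowtie",
        "-S",         # write SAM output
        "-n", "2",    # allow up to 2 mismatches in the seed
        "-k", "11",   # report up to 11 valid alignments per read
        "-m", "10",   # suppress reads with more than 10 reportable alignments
        "--best",     # report alignments in best-to-worst order
        index_prefix, fastq_path, sam_path,
    ]


def sam_to_bam_command(sam_path: str, bam_path: str) -> List[str]:
    """Build a SAMtools command converting SAM to BAM (assumed invocation)."""
    # -b requests BAM output; -S marks the input as SAM, which older
    # SAMtools releases required.
    return ["samtools", "view", "-b", "-S", "-o", bam_path, sam_path]


# Hypothetical index prefix and file names, for illustration only.
print(shlex.join(bowtie_command("hg19", "rep1.fastq", "rep1.sam")))
print(shlex.join(sam_to_bam_command("rep1.sam", "rep1.bam")))
```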

Gene expression within GENCODE V7 (Harrow et al., 2006) gene models was estimated using Cufflinks v0.9.3 (Roberts et al., 2011). Estimates of transcript abundance were reported in Fragments Per Kilobase of exon per Million fragments mapped (FPKM). FPKM is calculated by dividing the total number of fragments that align to the gene model by the size of the spliced transcript (exons) in kilobases. This number is then divided by the total number of reads in millions for the experiment. FPKM is reported in the last column of the GTF (TranscriptGencV7) files.
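
The FPKM arithmetic described above can be written out as a short Python sketch. Cufflinks estimates abundances with its own statistical model, so this shows only the two divisions named in the text, not the Cufflinks implementation; the function name fpkm and the example numbers are hypothetical.

```python
def fpkm(fragments_in_gene: float, spliced_length_bp: int, total_reads: int) -> float:
    """Fragments per kilobase of exon per million reads, as described in the text."""
    if spliced_length_bp <= 0 or total_reads <= 0:
        raise ValueError("length and total_reads must be positive")
    length_kb = spliced_length_bp / 1000          # spliced transcript size in kilobases
    reads_in_millions = total_reads / 1_000_000   # total reads for the experiment, in millions
    return fragments_in_gene / length_kb / reads_in_millions


# Made-up numbers: 500 fragments on a 2,500 bp spliced transcript,
# in an experiment with 20 million reads.
print(fpkm(500, 2500, 20_000_000))  # prints 10.0
```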

Raw Data (fastQ), Raw Signal (bigWig), Alignments (BAM) and Transcript GENCODE V7 (GTF) files are available from the Downloads page.

Verification

  • The mapped data were visually inspected to verify the majority of the reads fell within known exons.
  • Biological replicates confirm expression measurements with r > 0.90.
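
A replicate-concordance check of the kind listed above could look like the following Python sketch. It assumes that r is the Pearson correlation coefficient and that it is computed on per-gene expression values; the text states neither detail. The function name pearson_r and the expression values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-gene expression values for two replicates.
rep1 = [0.0, 1.5, 10.0, 25.0, 120.0]
rep2 = [0.2, 1.1, 12.0, 22.0, 110.0]
r = pearson_r(rep1, rep2)
print(f"r = {r:.3f}; passes r > 0.90: {r > 0.90}")
```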

Release Notes

Update (May 2012): the labels of the Raw Signal subtracks have been updated because they were originally labeled as Signals instead of Raw Signals.

This is the first NCBI Build37 (hg19) release of this track (Feb 2012).
This release includes the 3 datasets (Jurkat, A549/DEX100nm, and A549/EtOH2pct) previously released on NCBI Build36 (hg18) and adds data for several more cell types and growth conditions in replicate. Four types of download files are available for each replicate including the Raw Data (fastQ), Transcript GENCODE V7 (GTF), Raw Signal (bigWig), and Alignments (BAM).

Credits

These data were produced by the laboratory of Dr. Richard Myers at the HudsonAlpha Institute for Biotechnology.

Contact: Dr. Florencia Pauli

References

Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9.

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9.

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008 Jul;5(7):621-8.

Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol. 2011;12(3):R22.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.