Description
This track contains chromatin interaction data from the University of Washington ENCODE
group generated using 5C (Chromatin Conformation Capture Carbon Copy).
The 5C method is used here to define short- and long-range interactions
between transcription start sites (TSS) and DNaseI hypersensitive sites (DHS) or other genomic features.
The 5C method is summarized below.
Transcription factors bind to promoter-associated proteins, bringing the associated DNA sequences into close proximity with each other. Cross-linking the DNA and proteins immobilizes these interactions and thus maintains their close proximity. Cleavage of the sample with a restriction endonuclease followed by ligation results in hybrid molecules in which a fragment with a regulatory element is physically associated with a fragment containing a TSS. The interactions are then detected by oligonucleotide-dependent, ligation-mediated assays, in which one set of primers is complementary to the end of fragments with a TSS and the second set of primers is complementary to fragments with a feature. Primers are designed to the forward strand of the feature and the reverse strand of the TSS so that ligation occurs only between a TSS and a feature, not between different features. Specific interactions are detected by massively parallel sequencing.
The data in this track come from two different experiment types, each focusing on targeted
regions:
Gene-targeted project
Analysis of DNase I hypersensitive sites reveals many genes with
multiple sites restricted to the cell type in which the protein
is observed to be expressed. These sites potentially identify regulatory sites for the gene.
This set of experiments attempts to observe interactions between these DHSs and transcription start sites
in 25 regions selected based on genes expressed in GM06990 (B-lymphocyte),
BJ (foreskin fibroblast), HepG2 (liver cancer cell line), or SK-N-SH_RA
(neuroblastoma cell line, SKNSH, differentiated with retinoic acid).
Myc project
Genome-wide association studies have identified SNPs linked to prostate, colon, and
breast cancer in the gene desert region upstream of the myc gene.
5C of HindIII fragments interacting with those containing RefSeq txStarts in
this region was performed in 5 cell types: GM12878 (B-lymphocyte),
CaCo2 (colon cancer cell line), LNCaP (prostate cancer cell line), MCF7
(breast cancer cell line), and K562 (erythroleukemia cell line).
File Conventions
The following types of data are available for download:
- Matrix
Interaction files are in a matrix format indicating interaction strength,
with "reverse primer name | genome version | reverse HindIII fragment coordinates" in
the top row and "forward primer name | genome version | forward primer fragment coordinates"
in the first column. The number of sequences mapped to each interaction fills the matrix.
To interpret the Matrix data, you must download the associated primer data file.
- Primer
Primer data files include the sequences of the primers used in the experiments and
sequences for control sites in the ENCODE pilot ENr313 gene desert region on chr16.
These files are available for download in the supplemental materials.
- Raw Data
Sequencing files are provided in FASTQ format.
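The matrix layout described above can be read with a short script. The following is a minimal sketch only: it assumes the file is tab-delimited and that each label uses "|" as the field separator, neither of which is confirmed here. The function names, primer names, genome label, and coordinates in the example are all made up for illustration.

```python
import csv
import io


def parse_label(label):
    """Split 'primer name | genome version | coordinates' into its three fields."""
    name, genome, coords = (part.strip() for part in label.split("|"))
    return name, genome, coords


def read_matrix(handle, delimiter="\t"):
    """Read a 5C interaction matrix (assumed tab-delimited).

    Returns the reverse-primer labels (top row), the forward-primer labels
    (first column), and a dict mapping (forward name, reverse name) to the
    number of sequences mapped to that interaction.
    """
    rows = [row for row in csv.reader(handle, delimiter=delimiter) if row]
    reverse_labels = [parse_label(cell) for cell in rows[0][1:]]
    forward_labels = []
    counts = {}
    for row in rows[1:]:
        forward = parse_label(row[0])
        forward_labels.append(forward)
        for reverse, value in zip(reverse_labels, row[1:]):
            counts[(forward[0], reverse[0])] = int(value)
    return reverse_labels, forward_labels, counts


# Hypothetical two-by-two matrix; not taken from a real data file.
example = (
    "\tR1|hg19|chr8:100-200\tR2|hg19|chr8:300-400\n"
    "F1|hg19|chr8:500-600\t12\t0\n"
    "F2|hg19|chr8:700-800\t3\t7\n"
)
reverse_labels, forward_labels, counts = read_matrix(io.StringIO(example))
print(counts[("F1", "R1")])
```

Keying the counts by primer name makes it straightforward to join them against the primer data file, which is needed to interpret the matrix.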
Methods
Cells were grown according to the approved
ENCODE cell culture protocols.
The isolated nuclei were formaldehyde cross-linked. The DNA isolated from the nuclei was cleaved with a restriction enzyme and ligated, and the cross-links were removed to create a 3C library (Dekker et al., 2002). Primers complementary to the TSS and feature were added, annealed and ligated to produce a 5C library (Dostie et al., 2006). The DNA fragments generated in the ligation-mediated reactions were partially digested with DNaseI, end-repaired and ligated to adapters before sequencing.
The sequencing reads generated were mapped to the predicted ligation products.
The number of sequences mapping to predicted junction fragments was tabulated from sequencing runs.
The number of times a sequence was detected for a given interaction between a TSS and feature indicates the relative strength of the interaction.
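The tabulation step can be illustrated with a small sketch. It assumes each read has already been assigned to a predicted junction, represented here as a (TSS primer, feature primer) pair. The function name and the primer names are hypothetical, and this is not the group's actual pipeline.

```python
from collections import Counter


def tabulate_interactions(mapped_reads):
    """Count how many reads were assigned to each (TSS, feature) junction."""
    counts = Counter()
    for tss, feature in mapped_reads:
        counts[(tss, feature)] += 1
    return counts


# Hypothetical read assignments, one tuple per mapped read.
reads = [
    ("TSS_A", "DHS_1"),
    ("TSS_A", "DHS_1"),
    ("TSS_A", "DHS_2"),
    ("TSS_B", "DHS_1"),
]
counts = tabulate_interactions(reads)

# The junction with the most reads is the strongest interaction in relative terms.
strongest = counts.most_common(1)[0]
print(strongest)
```

The counts are relative measures within an experiment; comparing them across experiments would need a normalization step that is not described here.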
Gene-targeted project
Forward primers were
designed to HindIII sites in a 230-415 kb sequence centered on the DNase I hypersensitive sites
of interest. Reverse primers were designed to HindIII sites for all transcription starts extending
1 Mb on either side of the region targeted by the forward primer set. Matrix files are labeled by
the coordinates of the region covered by the forward primer set. These experiments were done in a
multiplex manner with the forward and reverse primers for all 25 regions mixed together in a single
reaction. Two replicates were performed for each of the 4 cell lines across the 25 regions.
High-throughput sequencing was performed on an ABI SOLiD instrument collecting 50 bp reads.
The interaction files provided were generated by mapping all reads in the sequencing output, with no mismatch threshold applied.
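The relationship between the forward and reverse primer regions is simple coordinate arithmetic. The sketch below assumes the 1 Mb flank is measured from the ends of the forward primer region. The function name and the example coordinates are hypothetical.

```python
def reverse_primer_window(fwd_start, fwd_end, flank=1_000_000):
    """Return the (start, end) window searched for transcription starts.

    The window extends `flank` bases on either side of the forward primer
    region. The start is clamped at 0; the end is not clamped because the
    chromosome length is not known to this function.
    """
    if fwd_start > fwd_end:
        raise ValueError("fwd_start must not exceed fwd_end")
    return max(0, fwd_start - flank), fwd_end + flank


# Hypothetical 300 kb forward primer region.
window = reverse_primer_window(5_000_000, 5_300_000)
print(window)
```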
Myc project
Forward primers were designed
to HindIII fragments of a 4.29 Mb section of human chromosome 8 centered on the gene desert 5′ of the
myc gene. Reverse primers were designed to all HindIII fragments containing RefSeq txStarts in a 7.6 Mb
region extending > 2 Mb on either side of the forward primer set.
High-throughput sequencing was performed on an ABI SOLiD instrument collecting 50 bp reads.
The interaction files provided were generated by mapping all reads in the sequencing output, with no mismatch threshold applied.
Verification
Data were verified by sequencing biological replicates displaying a correlation coefficient > 0.9.
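A replicate comparison of this kind can be sketched as below. The type of correlation coefficient used is not stated, so Pearson correlation is an assumption. The function name and the count vectors are hypothetical; each vector stands for the interaction counts of one replicate, listed in the same interaction order.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical interaction counts for two biological replicates.
replicate_1 = [12, 0, 3, 7, 25]
replicate_2 = [10, 1, 4, 6, 30]
r = pearson(replicate_1, replicate_2)
print(r > 0.9)
```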
Credits
These data were generated by the University of Washington ENCODE Group.
Contact: Richard Sandstrom
References
Dekker J, Rippe K, Dekker M, Kleckner N.
Capturing chromosome conformation.
Science 2002 Feb 15;295(5558):1306-11.
Dostie J, Richmond TA, Arnaout RA, Selzer RR, Lee WL, Honan TA, Rubio ED, Krumm A, Lamb J, Nusbaum C et al.
Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements.
Genome Res 2006 Oct;16(10):1299-309.
Data Release Policy
Data users may freely use ENCODE data, but may not, without prior
consent, submit publications that use an unpublished ENCODE dataset until
nine months following the release of the dataset. This date is listed in
the Restricted Until column, above. The full data release policy
for ENCODE is available
here.