PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls
- 4 January 2009
- journal article
- research article
- Published by Springer Nature in Nature Biotechnology
- Vol. 27 (1), 66-75
- https://doi.org/10.1038/nbt.1518
Abstract
Repetitive sequences and chromatin accessibility can confound scoring of chromatin immunoprecipitation data generated by high-throughput sequencing. Using data sets they produce for human RNA polymerase II and the transcription factor STAT1, Rozowsky et al. compensate for these biases by correcting for 'mappability' and normalizing the data against an input-DNA control.

Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this open-chromatin signal, as revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.
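The second pass described in the abstract (comparing each putative site against the normalized control) can be illustrated with a minimal sketch. This is not the PeakSeq implementation: the binomial null model, the fixed `scale` factor, the threshold `alpha`, and the names `binomial_sf` and `score_regions` are all assumptions made for illustration, and multiple-testing correction is omitted for brevity.

```python
from math import comb


def binomial_sf(k, n, p):
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def score_regions(regions, scale, alpha=0.05):
    """Keep regions whose ChIP tag count exceeds the scaled control.

    regions: list of (name, chip_tags, control_tags) tuples (hypothetical layout).
    scale:   assumed normalization factor applied to control tag counts.
    alpha:   assumed significance threshold on the uncorrected p-value.
    Returns a list of (name, enrichment, p_value) for regions that pass.
    """
    kept = []
    for name, chip, control in regions:
        scaled_control = control * scale
        # Assumed null model: a tag is equally likely to come from the ChIP
        # sample or the scaled control, so chip ~ Binomial(n, 0.5).
        n = chip + round(scaled_control)
        p_value = binomial_sf(chip, n, 0.5) if n > 0 else 1.0
        enrichment = chip / scaled_control if scaled_control > 0 else float("inf")
        if p_value < alpha:
            kept.append((name, enrichment, p_value))
    return kept


# Two made-up candidate regions from a hypothetical first pass.
candidates = [("site_a", 40, 10), ("site_b", 12, 10)]
for name, enrichment, p_value in score_regions(candidates, scale=1.0):
    print(name, enrichment, p_value)
```

In this toy input, a region with four times as many ChIP tags as control tags passes the filter, while one with nearly equal counts does not. A real analysis would estimate `scale` from the data and correct the p-values for the number of regions tested.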