PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls

4 January 2009

journal article
research article
Published by Springer Nature in Nature Biotechnology

Vol. 27 (1), 66-75
https://doi.org/10.1038/nbt.1518

Abstract

Repetitive sequences and chromatin accessibility can confound scoring of chromatin immunoprecipitation data generated by high-throughput sequencing. Using data sets they produce for human RNA polymerase II and the transcription factor STAT1, Rozowsky et al. compensate for these biases by correcting for 'mappability' and normalizing the data against an input-DNA control.

Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and by demonstrating that using more than two replicates provides only a marginal gain in information.
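
The two-pass idea described in the abstract can be illustrated with a toy sketch. This is not the PeakSeq implementation: the abstract does not specify the statistical tests, so the Poisson null in the first pass, the ratio-based normalization of the control, and the binomial test in the second pass are all assumptions made for illustration, as are the function names (`poisson_sf`, `binom_sf`, `two_pass_score`) and the fixed-window representation of the genome.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam); assumed null model for pass 1."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p); assumed test for pass 2."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def two_pass_score(chip, control, mappable, alpha1=0.05, alpha2=0.05):
    """Toy two-pass scoring over fixed windows (hypothetical helper).

    chip, control: tag counts per window; mappable: mappable fraction per window.
    Returns (window index, enrichment, p-value) for windows passing both passes.
    """
    total_chip = sum(chip)
    total_mappable = sum(mappable)

    # Pass 1: flag putative sites, scaling the expected count by mappability.
    candidates = []
    for i, (k, m) in enumerate(zip(chip, mappable)):
        lam = total_chip * m / total_mappable
        if poisson_sf(k, lam) < alpha1:
            candidates.append(i)

    # Normalize the control to the ChIP sample using background windows only
    # (a simple ratio of sums; an assumption, not the published procedure).
    flagged = set(candidates)
    bg_chip = sum(k for i, k in enumerate(chip) if i not in flagged)
    bg_ctrl = sum(c for i, c in enumerate(control) if i not in flagged)
    scale = bg_chip / bg_ctrl if bg_ctrl else 1.0

    # Pass 2: keep candidates significantly enriched over the scaled control.
    results = []
    for i in candidates:
        k = chip[i]
        c = round(control[i] * scale)
        p = binom_sf(k, k + c)
        if p < alpha2:
            results.append((i, k / max(c, 1), p))
    return results

# Invented counts: windows 2 and 4 both have high ChIP signal, but window 4
# is matched by a control peak, as one might see in open chromatin.
chip = [5, 4, 40, 6, 30, 5]
control = [5, 5, 6, 4, 28, 5]
mappable = [1.0] * 6
hits = two_pass_score(chip, control, mappable)
print([i for i, _, _ in hits])
```

In this invented example both high-signal windows pass the first pass, but only the one without a matching control peak survives the comparison against the normalized control. A real analysis would also need a multiple-testing correction, which is omitted here.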

Cited by 527 articles