Factors Influencing the Identification of Transcription Factor Binding Sites by Cross-Species Comparison
- 13 September 2002
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 12 (10), 1523-1532
- https://doi.org/10.1101/gr.323602
Abstract
As the number of sequenced genomes has grown, the questions of which species are most useful and how many genomes are sufficient for comparison have become increasingly important for comparative genomics studies. We have systematically addressed these questions with respect to phylogenetic footprinting of transcription factor (TF) binding sites in the γ-proteobacteria, and have evaluated the statistical significance of our motif predictions. We used a study set of 166 Escherichia coli genes that have experimentally identified TF binding sites upstream of the gene, with orthologous data from nine additional γ-proteobacteria for phylogenetic footprinting. Just three species were sufficient for ∼74.0% of the motif predictions to correspond to the experimentally reported E. coli sites, and important characteristics to consider when choosing species were phylogenetic distance, genome size, and natural habitat. We also performed simulations using randomized data to determine the critical maximum a posteriori probability (MAP) values for statistical significance of our motif predictions (P = 0.05). Approximately 60% of motif predictions containing sites from just three species had average MAP values above these critical MAP values. The inclusion of a species very closely related to E. coli increased the number of statistically significant motif predictions, despite substantially increasing the critical MAP value. [Supplemental material is available online at http://www.genome.org. In addition, our motif predictions for the study set and the entire E. coli genome are available online at http://www.wadsworth.org/resnres/bioinfo/.]
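The idea of a critical MAP value at P = 0.05 can be illustrated with a generic empirical-threshold calculation. The sketch below is not the authors' procedure, which the abstract does not detail; it only assumes that a list of scores obtained from randomized data is available and takes the score exceeded by at most 5% of them as the cutoff. The function names `critical_value` and `fraction_significant` are hypothetical.

```python
def critical_value(null_scores, alpha=0.05):
    """Return an empirical cutoff from scores obtained on randomized data.

    At most a fraction `alpha` of the null scores lie strictly above the
    returned value. This is a generic empirical-quantile rule, assumed here
    for illustration only.
    """
    if not null_scores:
        raise ValueError("null_scores must not be empty")
    ranked = sorted(null_scores, reverse=True)   # highest score first
    allowed = int(alpha * len(ranked))           # null scores allowed above the cutoff
    return ranked[min(allowed, len(ranked) - 1)]


def fraction_significant(observed_scores, cutoff):
    """Fraction of observed scores lying strictly above the cutoff."""
    if not observed_scores:
        raise ValueError("observed_scores must not be empty")
    return sum(score > cutoff for score in observed_scores) / len(observed_scores)
```

Under this rule, a cutoff derived from a null set that includes very similar sequences would be expected to be higher, which is consistent with the abstract's remark that adding a closely related species raises the critical MAP value.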