Distinguishing Regulatory DNA From Neutral Sites
Open Access
- 1 January 2003
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 13 (1), 64-72
- https://doi.org/10.1101/gr.817703
Abstract
We explore several computational approaches to analyzing interspecies genomic sequence alignments, aiming to distinguish regulatory regions from neutrally evolving DNA. Human–mouse genomic alignments were collected for three sets of human regions: (1) experimentally defined gene regulatory regions, (2) well-characterized exons (coding sequences, as a positive control), and (3) interspersed repeats thought to have inserted before the human–mouse split (a good model for neutrally evolving DNA). Models that potentially could distinguish functional noncoding sequences from neutral DNA were evaluated on these three data sets, as well as bulk genome alignments. Our analyses show that discrimination based on frequencies of individual nucleotide pairs or gaps (i.e., of possible alignment columns) is only partially successful. In contrast, scoring procedures that include the alignment context, based on frequencies of short runs of alignment columns, dramatically improve separation between regulatory and neutral features. Such scoring functions should aid in the identification of putative regulatory regions throughout the human genome.
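The contrast the abstract draws, between scoring single alignment columns and scoring short runs of columns, can be illustrated with a small log-odds sketch. This is not the paper's scoring function: the abstract does not specify one, so the sketch assumes a plain log-ratio of smoothed run frequencies estimated from a regulatory and a neutral training set. The function names, the pseudocount, the 24-column alphabet (A, C, G, T and gap in each species, minus the gap-gap column), and the toy alignments are all hypothetical. Setting `k=1` scores individual columns; `k>1` adds alignment context.

```python
import math
from collections import Counter


def columns(human, mouse):
    """Pair two equal-length aligned strings into alignment columns,
    e.g. ('A', 'G') for a mismatch or ('C', '-') for a gap in mouse."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    return list(zip(human.upper(), mouse.upper()))


def run_counts(alignments, k):
    """Count every run of k consecutive alignment columns in a training set."""
    counts = Counter()
    for human, mouse in alignments:
        cols = columns(human, mouse)
        for i in range(len(cols) - k + 1):
            counts[tuple(cols[i:i + k])] += 1
    return counts


def log_odds_score(human, mouse, reg_counts, neu_counts, k,
                   pseudo=1.0, n_columns=24):
    """Sum, over all runs of k columns in the query alignment, of
    log(P_regulatory(run) / P_neutral(run)) with additive smoothing.

    Assumption: overlapping runs are scored independently, which is a
    simplification of a proper Markov model over alignment columns.
    """
    n_runs = n_columns ** k  # assumed number of possible runs of length k
    reg_total = sum(reg_counts.values()) + pseudo * n_runs
    neu_total = sum(neu_counts.values()) + pseudo * n_runs
    cols = columns(human, mouse)
    score = 0.0
    for i in range(len(cols) - k + 1):
        run = tuple(cols[i:i + k])
        p_reg = (reg_counts[run] + pseudo) / reg_total
        p_neu = (neu_counts[run] + pseudo) / neu_total
        score += math.log(p_reg / p_neu)
    return score


# Toy training data (made up for illustration, not taken from the paper).
regulatory = [("ACGTACGTAC", "ACGTACGTAC"), ("GGCATGCATG", "GGCATGCTTG")]
neutral = [("ACGTACGTAC", "AGGTTC-TAA"), ("GGCATGCATG", "GACTTG-ATC")]

K = 2  # run length; K = 1 reduces to scoring individual columns
reg_k = run_counts(regulatory, K)
neu_k = run_counts(neutral, K)

conserved = log_odds_score("ACGTAC", "ACGTAC", reg_k, neu_k, K)
divergent = log_odds_score("ACGTAC", "AGGTTC", reg_k, neu_k, K)
print(f"conserved query: {conserved:.2f}, divergent query: {divergent:.2f}")
```

On this toy data the well-conserved query scores above zero and the divergent one below zero. Real training sets would be the three classes of human–mouse alignments described in the abstract, and the choice of run length and smoothing would need to be tuned against them.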