Distribution and intensity of constraint in mammalian genomic sequence
- 17 June 2005
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 15 (7), 901-913
- https://doi.org/10.1101/gr.3577405
Abstract
Comparisons of orthologous genomic DNA sequences can be used to characterize regions that have been subject to purifying selection and are enriched for functional elements. Here we present the results of such an analysis on an alignment of sequences from 29 mammalian species. The alignment captures ∼3.9 neutral substitutions per site and spans ∼1.9 Mbp of the human genome. We identify constrained elements from 3 bp to over 1 kbp in length, covering ∼5.5% of the human locus. Our estimate for the total amount of nonexonic constraint experienced by this locus is roughly twice that for exonic constraint. Constrained elements tend to cluster, and we identify large constrained regions that correspond well with known functional elements. While constraint density inversely correlates with mobile element density, we also show the presence of unambiguously constrained elements overlapping mammalian ancestral repeats. In addition, we describe a number of elements in this region that have undergone intense purifying selection throughout mammalian evolution, and we show that these important elements are more numerous than previously thought. These results were obtained with Genomic Evolutionary Rate Profiling (GERP), a statistically rigorous and biologically transparent framework for constrained element identification. GERP identifies, at high resolution, regions that exhibit nucleotide substitution deficits and measures these deficits as “rejected substitutions.” Rejected substitutions reflect the intensity of past purifying selection and are used to rank and characterize constrained elements. We anticipate that GERP and the types of analyses it facilitates will provide further insights and improved annotation for the human genome as mammalian genome sequence data become richer.
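To make the notion of "rejected substitutions" concrete, the sketch below scores each alignment column as the expected neutral substitution count minus the observed count, then groups consecutive high-scoring columns into candidate elements. It is a minimal illustration under stated assumptions, not GERP's actual implementation: the observed per-column rate is taken as a given input (GERP estimates it by maximum likelihood on the phylogeny, which is omitted here), the neutral expectation is held constant at the alignment-wide ∼3.9 substitutions per site quoted above (in practice it depends on which species are aligned at each column), and the run-based element caller with its `min_rs_per_site` and `min_length` thresholds is a hypothetical simplification. The function names and the observed rates in the usage lines are likewise hypothetical.

```python
from typing import List, Tuple


def rejected_substitutions(neutral_rate: float, observed_rate: float) -> float:
    """Rejected substitutions (RS) for one column: expected neutral
    substitutions minus observed substitutions. Positive values indicate a
    substitution deficit, i.e. evidence of purifying selection."""
    return neutral_rate - observed_rate


def constrained_runs(
    rs_scores: List[float], min_rs_per_site: float, min_length: int
) -> List[Tuple[int, int, float]]:
    """Hypothetical element caller (not GERP's procedure): report maximal runs
    of consecutive columns whose RS exceeds min_rs_per_site and that are at
    least min_length columns long. Returns (start, end_exclusive, total_rs)."""
    runs = []
    start = None
    # A -inf sentinel guarantees that a run reaching the last column is closed.
    for i, score in enumerate(list(rs_scores) + [float("-inf")]):
        if score > min_rs_per_site:
            if start is None:
                start = i  # open a new run
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, i, sum(rs_scores[start:i])))
            start = None  # close the current run
    return runs


# Assumed constant neutral expectation (alignment-wide figure from the abstract).
NEUTRAL_RATE = 3.9
# Hypothetical per-column observed substitution rates, for illustration only.
observed = [3.5, 0.4, 0.2, 0.3, 0.1, 3.8, 4.0, 0.5, 0.6, 3.7]
rs = [rejected_substitutions(NEUTRAL_RATE, o) for o in observed]
for start, end, total in constrained_runs(rs, min_rs_per_site=2.0, min_length=3):
    print(f"candidate element: columns {start}-{end - 1}, RS = {total:.2f}")
```

With these made-up inputs, columns 1 to 4 form one candidate element, while the two-column run at columns 7 and 8 falls below the assumed minimum length. Summing RS over a run mirrors the abstract's use of rejected substitutions to rank elements by the intensity of past purifying selection.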