Identifying novel constrained elements by exploiting biased substitution patterns
Open Access
- 27 May 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 25 (12), i54-i62
- https://doi.org/10.1093/bioinformatics/btp190
Abstract
Motivation: Comparing the genomes from closely related species provides a powerful tool to identify functional elements in a reference genome. Many methods have been developed to identify conserved sequences across species; however, existing methods only model conservation as a decrease in the rate of mutation and have ignored selection acting on the pattern of mutations.
Results: We present a new approach that takes advantage of deeply sequenced clades to identify evolutionary selection by uncovering not only signatures of rate-based conservation but also substitution patterns characteristic of sequence undergoing natural selection. We describe a new statistical method for modeling biased nucleotide substitutions, a learning algorithm for inferring site-specific substitution biases directly from sequence alignments and a hidden Markov model for detecting constrained elements characterized by biased substitutions. We show that the new approach can identify significantly more degenerate constrained sequences than rate-based methods. Applying it to the ENCODE regions, we find that as much as 10.2% of these regions is under selection.
Availability: The algorithms are implemented in a Java software package, called SiPhy, freely available at http://www.broadinstitute.org/science/software/.
Contact: xhx@ics.uci.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
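The hidden Markov model step mentioned in the Results can be illustrated with a minimal sketch. This is not SiPhy's implementation: it assumes a hypothetical two-state model (neutral and constrained), per-column log-likelihoods that have already been computed under each state, and an illustrative self-transition probability `p_stay`. The function name and all parameter values are assumptions made for the example. It uses standard Viterbi decoding to label each alignment column.

```python
import math


def viterbi_two_state(loglik_neutral, loglik_constrained, p_stay=0.9):
    """Label each alignment column 'N' (neutral) or 'C' (constrained).

    Inputs are per-column log-likelihoods under each state (assumed to be
    precomputed elsewhere). p_stay is a hypothetical probability of
    remaining in the same state from one column to the next.
    """
    n = len(loglik_neutral)
    if n == 0:
        return []
    emissions = list(zip(loglik_neutral, loglik_constrained))
    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)

    # score[s]: best log probability of any path ending in state s
    # (0 = neutral, 1 = constrained), with a uniform start distribution.
    score = [math.log(0.5) + emissions[0][0],
             math.log(0.5) + emissions[0][1]]
    back = []  # back[t-1][s]: best predecessor of state s at column t

    for t in range(1, n):
        new_score = [0.0, 0.0]
        pointers = [0, 0]
        for s in (0, 1):
            cand = [score[r] + (log_stay if r == s else log_switch)
                    for r in (0, 1)]
            best_prev = 0 if cand[0] >= cand[1] else 1
            pointers[s] = best_prev
            new_score[s] = cand[best_prev] + emissions[t][s]
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return ["NC"[s] for s in path]


# Toy input: three middle columns fit the constrained state much better.
labels = viterbi_two_state([-1, -1, -5, -5, -5, -1],
                           [-5, -5, -1, -1, -1, -5])
print("".join(labels))
```

In this sketch the transition penalty keeps the decoder from switching state on weak single-column evidence, so a run of columns is called constrained only when the combined gain in likelihood outweighs the cost of entering and leaving the constrained state.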