PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions
Open Access
- 14 June 2011
- journal article
- conference paper
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (13), i275-i282
- https://doi.org/10.1093/bioinformatics/btr209
Abstract
Motivation: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multispecies nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models.

Results: We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds all other methods we compared in a previous study. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE, and as interest grows in long non-coding RNAs, often initially recognized by their lack of protein coding potential rather than conserved RNA secondary structures.

Availability and Implementation: The Objective Caml source code and executables for GNU/Linux and Mac OS X are freely available at http://compbio.mit.edu/PhyloCSF

Contact: mlin@mit.edu; manoli@mit.edu