Using Multiple Alignments to Improve Gene Prediction
- 1 March 2006
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 13 (2), 379-393
- https://doi.org/10.1089/cmb.2006.13.379
Abstract
The multiple species de novo gene prediction problem can be stated as follows: given an alignment of genomic sequences from two or more organisms, predict the location and structure of all protein-coding genes in one or more of the sequences. Here, we present a new system, N-SCAN (a.k.a. TWINSCAN 3.0), for addressing this problem. N-SCAN can model the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, and insertions and deletions. An implementation of N-SCAN was created and used to generate predictions for the entire human genome and the genome of the fruit fly Drosophila melanogaster. Analyses of the predictions reveal that N-SCAN's accuracy in both human and fly exceeds that of all previously published whole-genome de novo gene predictors.