Using native and syntenically mapped cDNA alignments to improve de novo gene finding
Open Access
- 24 January 2008
- journal article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (5), 637-644
- https://doi.org/10.1093/bioinformatics/btn013
Abstract
Computational annotation of protein-coding genes in genomic DNA is a widely used and essential tool for analyzing newly sequenced genomes. However, current methods are often inaccurate and perform poorly on certain types of genes. Including additional sources of evidence for the existence and structure of genes can improve the quality of gene predictions. For many eukaryotic genomes, expressed sequence tags (ESTs) are available as evidence for genes. Related genomes that have been sequenced, annotated and aligned to the target genome also provide evidence of the existence and structure of genes. We incorporate several different evidence sources into the gene finder AUGUSTUS: gene and transcript annotations from related species, syntenically mapped to the target genome using TransMap; evolutionary conservation of DNA; mRNAs and ESTs of the target species; and retroposed genes. The predictions include alternative splice variants where the evidence supports them. Using only ESTs, we predicted at least one splice form exactly correctly in 57% of human genes. When evidence from other species and human mRNAs is added, this number rises to 77%. Syntenic mapping is well suited to annotating genomes that are closely related to genomes that are already annotated or for which extensive transcript evidence is available. Native cDNA evidence is most helpful when the alignments are used as compound information, with all parts of one alignment considered together, rather than as independent positionwise information. AUGUSTUS is open source and available at http://augustus.gobics.de. The gene predictions for human can be browsed and downloaded at the UCSC Genome Browser (http://genome.ucsc.edu).
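The headline accuracy figures (57% with ESTs only, 77% with all evidence) are gene-level rates: a gene counts as correct if at least one of its splice forms is predicted exactly. The sketch below shows one way such a rate could be computed. It is an illustration under stated assumptions, not the paper's evaluation code: transcripts are assumed to be represented by their ordered coding-exon chains, the dictionary layout and the function name `gene_level_exact_rate` are hypothetical, and chromosome, strand and the matching of predictions to genes are ignored for brevity.

```python
from typing import Dict, List, Set, Tuple

# Assumption for illustration: a transcript is its ordered chain of
# coding exons, each given as (start, end) coordinates on one strand.
Exon = Tuple[int, int]
Transcript = Tuple[Exon, ...]


def gene_level_exact_rate(
    reference: Dict[str, List[Transcript]],
    predicted: List[Transcript],
) -> float:
    """Fraction of reference genes with at least one splice form predicted exactly.

    A gene counts as correct if any predicted transcript has exactly the
    same exon chain as any one of that gene's annotated transcripts.
    """
    if not reference:
        return 0.0
    predicted_set: Set[Transcript] = set(predicted)
    correct = sum(
        1
        for isoforms in reference.values()
        if any(tx in predicted_set for tx in isoforms)
    )
    return correct / len(reference)


# Toy example with made-up coordinates (hypothetical data).
reference = {
    "geneA": [((100, 200), (300, 400)), ((100, 200), (350, 400))],
    "geneB": [((1000, 1100),)],
}
predicted = [((100, 200), (350, 400)), ((1000, 1090),)]
print(gene_level_exact_rate(reference, predicted))  # 0.5
```

Here geneA is counted as correct because the prediction matches its second splice form exactly, while geneB is not, because a single wrong exon boundary (1090 instead of 1100) is enough to fail an exact-match criterion. This strictness is why a gene-level exact rate is a demanding measure.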