MetaGene: prokaryotic gene finding from environmental genome shotgun sequences
Open Access
- 5 October 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 34 (19), 5623-5630
- https://doi.org/10.1093/nar/gkl723
Abstract
Exhaustive gene identification is a fundamental goal in all metagenomics projects. However, most metagenomic sequences are unassembled anonymous fragments, and conventional gene-finding methods cannot be applied. We have developed a prokaryotic gene-finding program, MetaGene, which uses di-codon frequencies estimated from the GC content of a given sequence, together with various other measures. MetaGene can predict a whole range of prokaryotic genes from anonymous genomic sequences of a few hundred bases, with a sensitivity of 95% and a specificity of 90% for artificial shotgun sequences (700 bp fragments from 12 species). MetaGene has two sets of codon frequency interpolations, one for bacteria and one for archaea, and automatically selects the proper set for a given sequence using the domain classification method we propose. The domain classification correctly assigns domain information to more than 90% of the artificial shotgun sequences. Applied to the Sargasso Sea dataset, MetaGene predicted almost all of the annotated genes and a notable number of novel genes. MetaGene can be applied to a wide variety of metagenomic projects and expands the utility of metagenomics.
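The two quantities the abstract builds on, the GC content of a fragment and its di-codon (in-frame hexamer) frequencies, can be illustrated with a minimal Python sketch. This is not MetaGene's implementation: the function names `gc_content` and `dicodon_frequencies` are hypothetical, and the sketch counts di-codons directly from a sequence rather than reproducing the interpolation from GC content to di-codon frequencies that the abstract describes.

```python
from collections import Counter
from typing import Dict

VALID_BASES = set("ACGT")


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def dicodon_frequencies(orf: str) -> Dict[str, float]:
    """Relative frequencies of in-frame codon pairs (6-mers) in one reading frame.

    Illustrative assumption: the input is already in frame, and windows
    containing ambiguous bases are skipped.
    """
    orf = orf.upper()
    counts: Counter = Counter()
    # Advance one codon (3 bases) at a time so adjacent di-codons share a codon.
    for i in range(0, len(orf) - 5, 3):
        hexamer = orf[i:i + 6]
        if set(hexamer) <= VALID_BASES:
            counts[hexamer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hexamer: n / total for hexamer, n in counts.items()}
```

For a short in-frame fragment such as `ATGAAATAA`, `dicodon_frequencies` sees two overlapping di-codons, `ATGAAA` and `AAATAA`. A gene finder along the lines described would compare such observed frequencies against model frequencies chosen for the fragment's GC content; that scoring step is deliberately left out here because the abstract does not specify it.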