Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering
Open Access
- 10 April 2008
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 9 (1), 182
- https://doi.org/10.1186/1471-2105-9-182
Abstract
The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets contain organisms with varying GC composition, codon usage biases and other sequence characteristics, which makes gene identification challenging. The vast amount of sequence data also requires faster protein family classification tools.
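To make the point about composition concrete, here is a minimal Python sketch. It is not the paper's method. It computes GC content and codon counts for two hypothetical open reading frames that encode the same short peptide (Met-Ala-Ala-Ala) with different synonymous codons. The function names `gc_content` and `codon_counts` and both sequences are illustrative assumptions. A gene finder whose statistics were learned from one composition profile may score genes with the other profile less reliably, which is one reason mixed-organism samples complicate gene identification.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def codon_counts(orf: str) -> Counter:
    """Count codons in reading frame 0, ignoring a trailing partial codon."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return Counter(orf[i:i + 3] for i in range(0, usable, 3))


if __name__ == "__main__":
    # Hypothetical ORFs: both translate to M-A-A-A-stop, but they use
    # different synonymous alanine codons (GCG/GCC vs GCA/GCT).
    orfs = {
        "gc_rich": "ATGGCGGCCGCGTAA",
        "gc_poor": "ATGGCAGCTGCATAA",
    }
    for name, orf in orfs.items():
        print(name, f"GC={gc_content(orf):.3f}", dict(codon_counts(orf)))
    # gc_rich prints GC=0.667, gc_poor prints GC=0.467
```

Comparing such per-sequence statistics across many reads is a simple way to see how heterogeneous a metagenomic sample is before choosing or training a gene-finding model.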