Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering
- 22 August 2011
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 108 (35), 14637-14642
- https://doi.org/10.1073/pnas.1111435108
Abstract
High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods to the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples.
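The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the sample names, taxon and OTU labels, and read counts below are invented toy data, Bray-Curtis is assumed as the β-diversity measure (the abstract does not name one), and the agreement between the two matrices is summarized by a plain Pearson correlation over the pairwise distances, without any permutation-based significance test.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def abundance_matrix(assignments):
    """Turn {sample: [bin label per read]} into {sample: [count per bin]}."""
    counts = {s: Counter(labels) for s, labels in assignments.items()}
    bins = sorted({b for c in counts.values() for b in c})
    return {s: [c[b] for b in bins] for s, c in counts.items()}


def relative(row):
    """Convert raw counts to relative abundances."""
    total = sum(row)
    return [x / total for x in row]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def pairwise_distances(matrix):
    """Dissimilarity for every sample pair, in a fixed (sorted) pair order."""
    rel = {s: relative(row) for s, row in matrix.items()}
    return [bray_curtis(rel[a], rel[b]) for a, b in combinations(sorted(rel), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical toy data: each read is binned either by a classifier's taxon
# label (supervised) or by a similarity-based OTU cluster id (unsupervised).
supervised = {
    "soil_A": ["Bacillus"] * 6 + ["Pseudomonas"] * 4,
    "soil_B": ["Bacillus"] * 5 + ["Pseudomonas"] * 5,
    "gut_A": ["Bacteroides"] * 8 + ["Bacillus"] * 2,
}
unsupervised = {
    "soil_A": ["otu1"] * 3 + ["otu2"] * 3 + ["otu3"] * 4,
    "soil_B": ["otu1"] * 1 + ["otu2"] * 3 + ["otu3"] * 6,
    "gut_A": ["otu4"] * 8 + ["otu1"] * 2,
}

d_supervised = pairwise_distances(abundance_matrix(supervised))
d_unsupervised = pairwise_distances(abundance_matrix(unsupervised))
r = pearson(d_supervised, d_unsupervised)
print(f"correlation between the two distance sets: {r:.3f}")
```

The supervised matrix needs only a per-read label lookup, whereas producing the OTU labels in the unsupervised case presupposes the alignment and clustering steps that the abstract identifies as the computational bottleneck.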