Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering

22 August 2011

journal article
research article
Published in Proceedings of the National Academy of Sciences

Vol. 108 (35), 14637-14642
https://doi.org/10.1073/pnas.1111435108

Abstract

High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but even higher-throughput methods, up to the Illumina scale, now allow the creation of much larger datasets, with more samples and orders of magnitude more sequences, that swamp current analytic methods. We developed a method capable of handling these larger datasets by assigning sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions: β-diversity measures calculated from the matrices produced by the two approaches were significantly correlated with each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples.
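The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each read already carries a label (a genus name from some classifier for the supervised case, an OTU identifier from some clustering step for the unsupervised case), uses Bray-Curtis dissimilarity as one possible β-diversity measure, and reports only a plain Pearson correlation between the two sets of pairwise distances. All sample names, taxa, OTU labels, and counts are invented for illustration, and no significance test (for example, a permutation test) is included.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two Counter objects of read counts."""
    shared = sum(min(a[t], b[t]) for t in set(a) | set(b))
    return 1.0 - 2.0 * shared / (sum(a.values()) + sum(b.values()))


def pairwise(samples):
    """Distances for every sample pair, in a fixed (sorted-name) order."""
    names = sorted(samples)
    return [bray_curtis(samples[x], samples[y]) for x, y in combinations(names, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical reads binned by genus (taxonomy-supervised).
by_taxon = {
    "gut_1": Counter({"Bacteroides": 7, "Pseudomonas": 3}),
    "gut_2": Counter({"Bacteroides": 6, "Prevotella": 4}),
    "soil_1": Counter({"Bacillus": 5, "Streptomyces": 3, "Pseudomonas": 2}),
    "soil_2": Counter({"Bacillus": 4, "Streptomyces": 4, "Pseudomonas": 2}),
}

# The same hypothetical reads binned by OTU (taxonomy-unsupervised).
by_otu = {
    "gut_1": Counter({"otu5": 4, "otu6": 3, "otu4": 3}),
    "gut_2": Counter({"otu5": 3, "otu6": 3, "otu7": 4}),
    "soil_1": Counter({"otu1": 3, "otu2": 2, "otu3": 3, "otu4": 2}),
    "soil_2": Counter({"otu1": 1, "otu2": 2, "otu3": 5, "otu4": 2}),
}

taxon_dists = pairwise(by_taxon)
otu_dists = pairwise(by_otu)
r = pearson(taxon_dists, otu_dists)
print(f"correlation between the two distance sets: {r:.3f}")
```

The supervised path needs only a count per taxon for each sample, so no alignment or all-against-all sequence comparison appears anywhere in the sketch; that is the computational saving the abstract refers to.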

