phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data

Open Access

22 April 2013

journal article
research article
Published by Public Library of Science (PLoS) in PLOS ONE

Vol. 8 (4), e61217
https://doi.org/10.1371/journal.pone.0061217

Abstract

The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high-throughput microbiome census data. Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics, all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open-source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, as an example of best practices for reproducible research. The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.
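To make three of the operations named in the abstract concrete (filtering, agglomeration, and diversity analysis), the following is a minimal, language-neutral sketch in plain Python. It is not phyloseq's API and does not reproduce the authors' implementation. The toy count table, the genus labels, and the helper names `filter_taxa`, `agglomerate`, and `shannon` are all hypothetical and exist only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy census data: counts[taxon][sample] = number of reads.
counts = {
    "OTU1": {"S1": 10, "S2": 0},
    "OTU2": {"S1": 5, "S2": 5},
    "OTU3": {"S1": 0, "S2": 1},
}

# Hypothetical taxonomy: genus assigned to each taxon.
genus = {"OTU1": "Bacteroides", "OTU2": "Bacteroides", "OTU3": "Prevotella"}


def filter_taxa(table, min_total):
    """Filtering: keep taxa whose total count across samples is >= min_total."""
    return {t: row for t, row in table.items() if sum(row.values()) >= min_total}


def agglomerate(table, rank_of):
    """Agglomeration: sum the counts of all taxa that share a taxonomic label."""
    merged = defaultdict(lambda: defaultdict(int))
    for taxon, row in table.items():
        for sample, n in row.items():
            merged[rank_of[taxon]][sample] += n
    return {label: dict(row) for label, row in merged.items()}


def shannon(table, sample):
    """Diversity: Shannon index (natural log) of one sample's counts."""
    values = [row.get(sample, 0) for row in table.values()]
    total = sum(values)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing to the sum, so they are skipped.
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


if __name__ == "__main__":
    print(sorted(filter_taxa(counts, min_total=2)))
    print(agglomerate(counts, genus))
    print(round(shannon(counts, "S1"), 4))
```

Keeping the count table and the taxonomy as separate structures keyed by the same taxon identifiers loosely mirrors the idea of integrating different data types around one set of taxa. A real analysis would use the package's own data structures and functions rather than these stand-ins.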

Cited by 14631 articles