A Bioinformatician's Guide to Metagenomics

1 December 2008

journal article
review article
Published by American Society for Microbiology in Microbiology and Molecular Biology Reviews

Vol. 72 (4), 557-578
https://doi.org/10.1128/mmbr.00009-08

Abstract

As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for the best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe the chain of decisions accompanying a metagenomic project from the viewpoint of the bioinformatic analysis step by step. We guide the reader through a standard workflow for a metagenomic project beginning with presequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries, and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic data sets in contrast to genome projects. Different types of data analyses particular to metagenomes are then presented, including binning, dominant population analysis, and gene-centric analysis. Finally, data management issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.

This publication has 156 references indexed in Scilit.

Cited by 369 articles