Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit
- 1 April 2021
- journal article
- review article
- Published by Springer Nature in Nature Protocols
- Vol. 16 (4), 1785-1801
- https://doi.org/10.1038/s41596-020-00480-3
Abstract
Computational methods are key in microbiome research, and obtaining a quantitative and unbiased performance estimate is important for method developers and applied researchers. For meaningful comparisons between methods, to identify best practices and common use cases, and to reduce overhead in benchmarking, it is necessary to have standardized datasets, procedures and metrics for evaluation. In this tutorial, we describe emerging standards in computational meta-omics benchmarking derived and agreed upon by a larger community of researchers. Specifically, we outline recent efforts by the Critical Assessment of Metagenome Interpretation (CAMI) initiative, which supplies method developers and applied researchers with exhaustive quantitative data about software performance in realistic scenarios and organizes community-driven benchmarking challenges. We explain the most relevant evaluation metrics for assessing metagenome assembly, binning and profiling results, and provide step-by-step instructions on how to generate them. The instructions use simulated mouse gut metagenome data released in preparation for the second round of CAMI challenges and showcase the use of a repository of tool results for CAMI datasets. This tutorial will serve as a reference for the community and facilitate informative and reproducible benchmarking in microbiome research. This tutorial explains how to evaluate and benchmark metagenome assembly, binning and profiling methods using standards and software provided by the CAMI initiative.
Funding Information
- Saint Petersburg State University (PURE 51555639)
- Australian Research Council’s Discovery Projects funding scheme