PANGEA: pipeline for analysis of next generation amplicons
Open Access
- 25 February 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in The ISME Journal
- Vol. 4 (7), 852-861
- https://doi.org/10.1038/ismej.2010.16
Abstract
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, has been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OS X, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ² step, are joined into one program called the ‘backbone’.
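The abstract mentions a χ² step for comparing microbial communities but does not describe how it is computed. As an illustration only, the sketch below applies a standard Pearson χ² test of homogeneity to per-taxon read counts from two samples. It is written in Python rather than the Perl used by PANGEA, it is not taken from the PANGEA scripts, and the function name `chi_square_2xk`, the taxon labels and the counts are all hypothetical.

```python
from typing import Dict, Tuple


def chi_square_2xk(counts_a: Dict[str, int],
                   counts_b: Dict[str, int]) -> Tuple[float, int]:
    """Pearson chi-square statistic for a 2 x K table of taxon counts.

    Illustrative sketch only: this is a textbook test of homogeneity,
    not the procedure implemented in PANGEA.
    Returns (statistic, degrees_of_freedom).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one classified read")
    grand_total = total_a + total_b

    statistic = 0.0
    taxa_used = 0
    for taxon in sorted(set(counts_a) | set(counts_b)):
        obs_a = counts_a.get(taxon, 0)
        obs_b = counts_b.get(taxon, 0)
        column_total = obs_a + obs_b
        if column_total == 0:
            continue  # taxon absent from both samples carries no information
        taxa_used += 1
        # Expected counts if both samples shared the same taxon proportions.
        exp_a = column_total * total_a / grand_total
        exp_b = column_total * total_b / grand_total
        statistic += (obs_a - exp_a) ** 2 / exp_a
        statistic += (obs_b - exp_b) ** 2 / exp_b

    return statistic, max(taxa_used - 1, 0)


if __name__ == "__main__":
    # Made-up counts for two hypothetical samples, for demonstration only.
    sample_1 = {"taxon_A": 120, "taxon_B": 45, "taxon_C": 35}
    sample_2 = {"taxon_A": 80, "taxon_B": 90, "taxon_C": 30}
    stat, dof = chi_square_2xk(sample_1, sample_2)
    print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

A large statistic relative to the χ² distribution with the reported degrees of freedom would suggest that the two samples differ in taxon composition. In practice, taxa with very small expected counts are often pooled before running such a test.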