Short pyrosequencing reads suffice for accurate microbial community analysis

Top Cited Papers
Open Access
Abstract
Pyrosequencing technology allows us to characterize microbial communities using 16S ribosomal RNA (rRNA) sequences orders of magnitude faster and more cheaply than has previously been possible. However, results from different studies using pyrosequencing and traditional sequencing are often difficult to compare, because amplicons covering different regions of the rRNA might yield different conclusions. We used sequences from over 200 globally dispersed environments to test whether studies that used similar primers clustered together mistakenly, without regard to environment. We then tested whether primer choice affects sequence-based community analyses using UniFrac, our recently-developed method for comparing microbial communities. We performed three tests of primer effects. We tested whether different simulated amplicons generated the same UniFrac clustering results as near-full-length sequences for three recent large-scale studies of microbial communities in the mouse and human gut, and the Guerrero Negro microbial mat. We then repeated this analysis for short sequences (100-, 150-, 200- and 250-base reads) resembling those produced by pyrosequencing. The results show that sequencing effort is best focused on gathering more short sequences rather than fewer longer ones, provided that the primers are chosen wisely, and that community comparison methods such as UniFrac are surprisingly robust to variation in the region sequenced.