Viral Population Estimation Using Pyrosequencing
Open Access
- 9 May 2008
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 4 (5), e1000074
- https://doi.org/10.1371/journal.pcbi.1000074
Abstract
The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an expectation–maximization (EM) algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies.
The genetic diversity of viral populations is important for biomedical problems such as disease progression, vaccine design, and drug resistance, yet it is not generally well understood. In this paper, we use pyrosequencing, a novel DNA sequencing technique, to reconstruct viral populations. Pyrosequencing produces DNA sequences, called reads, in numbers much greater than standard DNA sequencing techniques. However, these reads are substantially shorter and more error-prone than those obtained from standard sequencing techniques. Therefore, pyrosequencing data requires new methods of analysis. Here, we develop mathematical and statistical tools for reconstructing viral populations using pyrosequencing. To this end, we show how to correct errors in the reads and assemble them into the different viral strains present in the population. We apply these methods to HIV-1 populations from drug-resistant patients and show that our techniques produce results quite close to accepted techniques at a lower cost and potentially higher resolution.
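The frequency-estimation step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes error-free reads, a fixed set of candidate haplotypes, and a simple model in which a read is either compatible with a haplotype (it matches exactly over the positions it covers) or not. The function names `is_compatible` and `em_haplotype_frequencies`, and the toy data, are hypothetical and chosen only for illustration.

```python
def is_compatible(read_start, read_seq, haplotype):
    """Return True if the read matches the haplotype exactly where it aligns.

    Assumption for this sketch: reads are already error-corrected and
    aligned, so compatibility is plain string equality on the covered span.
    """
    return haplotype[read_start:read_start + len(read_seq)] == read_seq


def em_haplotype_frequencies(compat, n_iter=200, tol=1e-9):
    """Estimate haplotype frequencies from a 0/1 read-by-haplotype matrix.

    compat[i][j] is 1 if read i is compatible with haplotype j, else 0.
    """
    n_haps = len(compat[0])
    freqs = [1.0 / n_haps] * n_haps  # start from uniform frequencies
    for _ in range(n_iter):
        # E-step: share each read among its compatible haplotypes in
        # proportion to the current frequency estimates.
        counts = [0.0] * n_haps
        for row in compat:
            total = sum(f * c for f, c in zip(freqs, row))
            if total == 0.0:
                continue  # read explained by no haplotype; ignored here
            for j, c in enumerate(row):
                counts[j] += freqs[j] * c / total
        norm = sum(counts)
        if norm == 0.0:
            raise ValueError("no read is compatible with any haplotype")
        # M-step: new frequencies are the normalized expected read counts.
        new_freqs = [x / norm for x in counts]
        delta = max(abs(a - b) for a, b in zip(freqs, new_freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Toy data (hypothetical): two haplotypes and six aligned reads.
haplotypes = ["AAAA", "AATT"]
reads = [(2, "AA")] * 3 + [(2, "TT")] + [(0, "AA")] * 2
compat = [[int(is_compatible(s, r, h)) for h in haplotypes] for s, r in reads]
freqs = em_haplotype_frequencies(compat)
print([round(f, 3) for f in freqs])  # approximately [0.75, 0.25]
```

In the toy data, the two reads starting at position 0 fit both haplotypes, so the EM iteration divides them according to the evidence from the four unambiguous reads. A realistic model would also account for sequencing error and for uneven read coverage, which this sketch leaves out.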