Abstract
Drug resistance is a major problem in the treatment of AIDS because the very high mutation rate of human immunodeficiency virus (HIV) leads to rapid development of resistance to new drugs. Identification of mutations associated with drug resistance is critical for both individualized treatment selection and new drug design. We have performed an automated mutation analysis of HIV Type 1 (HIV-1) protease and reverse transcriptase (RT) from approximately 40,000 AIDS patient plasma samples sequenced by Specialty Laboratories Inc. from 1999 to mid-2002. This data set provides a nearly complete mutagenesis of HIV protease and enables the calculation of statistically significant $K_a/K_s$ values for each individual amino acid mutation in protease and RT. Positive selection (i.e., a $K_a/K_s$ ratio > 1, indicating increased reproductive fitness) detected 19 of 23 known drug-resistant mutation positions in protease and 20 of 34 such positions in RT. We also discovered 163 new amino acid mutations in HIV protease and RT that are strong candidates for drug resistance or fitness. Our results match available independent data on protease mutations associated with specific drug treatments and on mutations with positive reproductive fitness, with high statistical significance (the P values for the observed matches occurring by random chance are $10^{-5.2}$ and $10^{-16.6}$, respectively). Our mutation analysis provides a valuable resource for AIDS research and will be available to academic researchers upon publication at http://www.bioinformatics.ucla.edu/HIV. Our data indicate that positive selection mapping can yield powerful insights from high-throughput sequencing of rapidly mutating pathogens.