A statistical model for HIV-1 sequence classification using the subtype analyser (STAR)

Abstract
Motivation: HIV-1 antiretroviral drug resistance testing produces large numbers of HIV-1 protease and reverse transcriptase sequences. These provide an excellent resource for studying the incidence, spread and clinical significance of HIV-1 subtypes. We have produced a program, Subtype Analyser (STAR), that rapidly and accurately subtypes HIV-1. Here we have determined a robust and statistically validated model for subtype assignment.
Results: We have significantly extended our HIV-1 subtyping tool (STAR) so that each query sequence, when evaluated against subtype profile alignments, returns a discriminating score based on the ratio of subtype positive to subtype negative amino acid positions. These scores were transformed into a Z-score distribution and evaluated. Of the 141 sequences used to define the subtype alignments, 98% were correctly reclassified. Inclusion of additional recombination detection within STAR increased the detection of known recombinant sequences to 95%.
Availability: STAR is available as compiled (Linux Fedora 3) or source code from http://pgv19.virol.ucl.ac.uk/download/star_linux.tar
Contact: p.kellam@ucl.ac.uk
Supplementary Information: http://pgv19.virol.ucl.ac.uk/download/star_supplement
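The scoring idea described in the Results can be illustrated with a minimal sketch. It is not the STAR implementation. It assumes that a subtype profile is a list of residue sets, one per aligned position, that a position counts as positive when the query residue occurs in the profile at that position and negative otherwise, that a pseudocount of 1 keeps the ratio finite, and that Z-scores are taken across the subtype scores of a single query. The names `ratio_score`, `z_scores` and `PROFILES`, and the toy four-residue profiles, are hypothetical.

```python
from statistics import mean, pstdev

# Toy profiles (hypothetical): residues observed at each aligned position.
PROFILES = {
    "A": [{"P"}, {"Q"}, {"I", "V"}, {"T"}],
    "B": [{"P"}, {"Q"}, {"V"}, {"S"}],
    "C": [{"P"}, {"R"}, {"L"}, {"S"}],
}


def ratio_score(query, profile):
    """Ratio of positive to negative positions, with a pseudocount of 1.

    The pseudocount is an assumption of this sketch; it avoids division
    by zero when the query matches the profile at every position.
    """
    positive = sum(1 for aa, allowed in zip(query, profile) if aa in allowed)
    negative = min(len(query), len(profile)) - positive
    return (positive + 1) / (negative + 1)


def z_scores(query, profiles):
    """Standardise the ratio scores of one query across all subtype profiles."""
    scores = {name: ratio_score(query, prof) for name, prof in profiles.items()}
    mu = mean(scores.values())
    sd = pstdev(scores.values())
    if sd == 0:
        # All profiles score equally: no subtype is discriminated.
        return {name: 0.0 for name in scores}
    return {name: (s - mu) / sd for name, s in scores.items()}


if __name__ == "__main__":
    z = z_scores("PQIT", PROFILES)
    best = max(z, key=z.get)
    print(best, {name: round(value, 2) for name, value in z.items()})
```

In this sketch the query is assigned to the profile with the highest Z-score. A real classifier would also need a significance threshold and a rule for unassigned or recombinant sequences, neither of which is modelled here.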