Bayesian approach to discovering pathogenic SNPs in conserved protein domains

Abstract
The success rate of association studies can be improved by selecting better genetic markers for genotyping or by providing better leads for identifying pathogenic single nucleotide polymorphisms (SNPs) within regions of linkage disequilibrium that show positive disease associations. We have developed a novel algorithm to predict pathogenic single amino acid changes, either nonsynonymous SNPs (nsSNPs) or missense mutations, in conserved protein domains. Using a Bayesian framework, we found that the probability of a microbial missense mutation causing a significant change in phenotype depended on how much the single amino acid substitution altered several phylogenetic, biochemical, and structural features. We tested our model on pathogenic allelic variants (missense mutations or nsSNPs) listed in OMIM, treating the other nsSNPs in the same genes (from dbSNP) as nonpathogenic variants. Our model predicted pathogenic variants with a 10% false‐positive rate (i.e., 90% specificity). The high specificity of our prediction algorithm should make it valuable in genetic association studies aimed at identifying pathogenic SNPs. Hum Mutat 24:178–184, 2004.
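To make the Bayesian reasoning and the false‐positive rate concrete, the sketch below combines per‐feature evidence into a posterior probability that a substitution is pathogenic, then measures the false‐positive rate as the fraction of nonpathogenic variants scored above a threshold. It assumes a naive Bayes model (features conditionally independent given the class), which may differ from the authors' actual formulation. The feature names, bins, likelihood values, and the helpers `posterior_pathogenic` and `false_positive_rate` are all hypothetical placeholders, not estimates or code from this study.

```python
import math
from typing import Dict, Sequence, Tuple

# HYPOTHETICAL likelihood table: feature -> bin -> (P(bin | pathogenic), P(bin | neutral)).
# These numbers are illustrative placeholders, not values from the paper.
LIKELIHOODS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "conservation_change": {"high": (0.6, 0.2), "low": (0.4, 0.8)},
    "hydrophobicity_change": {"high": (0.5, 0.3), "low": (0.5, 0.7)},
    "buried_site": {"yes": (0.7, 0.4), "no": (0.3, 0.6)},
}


def posterior_pathogenic(observed: Dict[str, str], prior: float) -> float:
    """Return P(pathogenic | features) under a naive Bayes (independence) assumption."""
    # Start from the prior odds on the log scale to avoid underflow.
    log_odds = math.log(prior / (1.0 - prior))
    for feature, bin_name in observed.items():
        p_path, p_neut = LIKELIHOODS[feature][bin_name]
        # Each feature contributes its log likelihood ratio.
        log_odds += math.log(p_path / p_neut)
    # Convert posterior log odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


def false_positive_rate(neutral_scores: Sequence[float], threshold: float) -> float:
    """Fraction of nonpathogenic variants whose score reaches the threshold."""
    flagged = sum(1 for s in neutral_scores if s >= threshold)
    return flagged / len(neutral_scores)


if __name__ == "__main__":
    strong = {"conservation_change": "high", "hydrophobicity_change": "high", "buried_site": "yes"}
    weak = {"conservation_change": "low", "hydrophobicity_change": "low", "buried_site": "no"}
    print(posterior_pathogenic(strong, prior=0.5))
    print(posterior_pathogenic(weak, prior=0.5))
    print(false_positive_rate([0.1, 0.2, 0.95, 0.3], threshold=0.9))
```

Working on the log‐odds scale keeps the update additive: each feature simply adds its log likelihood ratio, so a substitution that changes many features in the "pathogenic" direction accumulates evidence. In practice the threshold would be tuned on the nonpathogenic set until the false‐positive rate reaches a target such as 10%.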