Abstract
Present-day associations between haplotypes at a candidate locus and phenotypes exist when phenotypically important mutations occurred at some point during the evolution of the current array of genetic variation. A cladistic statistical design can be defined that focuses power by using the evolutionary history of the candidate DNA region. This paper shows how cladistic methodology can be used to analyze case/control data, a common sampling design in genetic/disease association studies. A worked example is presented of the associations of sporadic early- and late-onset forms of Alzheimer's disease with the 19q13.2 chromosomal region, which includes the loci for apoproteins E, CI, and CII. This analysis confirms earlier reports of a strong association of the ApoE epsilon 4 allele with Alzheimer's disease but indicates that it is premature to consider this association causal, particularly for early-onset cases. Associations were also found with the epsilon 2 allele, as previously reported, and with the 1 allele at the ApoCI locus. However, this analysis indicates that it is inappropriate, both statistically and medically, to use single markers as risk predictors when haplotype data are available, even when the mutation leading to the marker is identified as having a strong phenotypic association.