Abstract
With the discovery of single nucleotide polymorphisms (SNPs) along the genome, genotyping of large samples of biallelic multilocus genetic phenotypes for (fine) mapping of disease genes or for population studies has become standard practice. A genetic trait, however, is mainly caused by an underlying defective haplotype, and populations are best characterized by their haplotype frequencies. To find disease-predisposing genes by an association or haplotype-sharing algorithm, it is therefore essential to infer, from the phase-unknown genetic phenotypes in a sample drawn from a population, both the haplotype frequencies in the population and the underlying haplotype pairs in the sample. We estimate haplotype frequencies and haplotype pairs by maximum likelihood using a well-known expectation-maximization (EM) algorithm, which we adapt to a large number (up to 30) of biallelic loci (SNPs) and extend to include nuclear family information, if available, in the analysis. Parents are treated as an independent sample from the population. Their genotyped offspring reduce the number of potential haplotype pairs for both parents, which results in a higher accuracy of the estimation and may also reduce computation time. In a series of simulations, our approach of including nuclear family information was tested against both the EM algorithm without nuclear family information and an alternative approach using GENEHUNTER for the haplotyping of the families, using the locus-by-locus allele counts of the sample. Our new approach is more precise in haplotyping when the number of heterozygous loci is high, whereas for a moderate number of heterozygous positions in the sample all three approaches gave the same perfect results. Hum Mutat 17:289–295, 2001.
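The EM estimation described in the abstract can be illustrated with a minimal Python sketch for unrelated individuals. This is not the authors' implementation: it omits the nuclear family information and makes no attempt to scale to 30 loci. The function names (`compatible_pairs`, `em_haplotype_frequencies`), the 0/1/2 genotype coding (copies of allele 1 per locus), the uniform starting frequencies, and the stopping rule are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with a phase-unknown genotype.

    genotype: tuple with one entry per biallelic locus, coded (by assumption)
    as 0, 1 or 2 copies of allele 1. An entry of 1 marks a heterozygous locus.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous loci fixed; het loci filled below
    if not het:
        h = tuple(base)
        return [(h, h)]
    pairs = []
    # Fix the phase of the first heterozygous locus so that each unordered
    # pair is listed once: k heterozygous loci give 2**(k - 1) pairs.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = base[:], base[:]
        for locus, b in zip(het, (0,) + bits):
            h1[locus] = b
            h2[locus] = 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM from unrelated, unphased genotypes."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # assumed uniform start
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: posterior probability of each compatible pair under
        # random mating. The factor 2 for non-identical pairs is constant
        # within one genotype, so it cancels in the normalisation.
        for pairs in pair_lists:
            weights = [freq[a] * freq[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the 2N chromosomes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        delta = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if delta < tol:
            break
    return freq


if __name__ == "__main__":
    sample = [(0, 0), (0, 0), (2, 2), (1, 1)]  # toy data, two loci
    for hap, f in sorted(em_haplotype_frequencies(sample).items()):
        print(hap, round(f, 4))
```

In the toy sample only the double heterozygote `(1, 1)` is ambiguous. The three unambiguous individuals pull its resolution toward the pair `(0, 0)` / `(1, 1)`. Family information would enter such a sketch by discarding, before the E-step, those parental pairs that cannot produce the observed offspring genotypes.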