Algorithms for inferring haplotypes

14 September 2004

journal article
review article
Published by Wiley in Genetic Epidemiology

Vol. 27 (4), 334-347
https://doi.org/10.1002/gepi.20024

Abstract

Haplotype phase in diploid organisms provides valuable information on human evolutionary history and may lead to more efficient strategies for identifying genetic variants that increase susceptibility to human diseases. Molecular haplotyping methods are labor-intensive, low-throughput, and very costly. By contrast, algorithms based on formal statistical theory have been shown to be very effective and cost-efficient for haplotype reconstruction. This review covers 1) population-based haplotype inference methods: Clark's algorithm, the expectation-maximization (EM) algorithm, coalescence-based algorithms (pseudo-Gibbs sampler and perfect/imperfect phylogeny), and the partition-ligation algorithm implemented with a fully Bayesian model (Haplotyper) or with EM (PLEM); 2) family-based haplotype inference methods; 3) the handling of genotype scoring uncertainties (i.e., genotyping errors and raw two-dimensional genotype scatterplots) in inferring haplotypes; and 4) haplotype inference methods for pooled DNA samples. The advantages and limitations of each algorithm are discussed. Using simulations based on empirical data on the G6PD gene and the TNFRSF5 gene, I demonstrate that the algorithms differ in their sensitivity to the level of population diversity and to the genotyping error rate. Future statistical algorithms for haplotype reconstruction will increasingly draw on ideas from combinatorial mathematics, graphical models, and machine learning, and they will have a profound impact on population genetics and genetic epidemiology with the advent of the human HapMap.

This publication has 74 references indexed in Scilit.

Cited by 145 articles