EFFICIENT RECONSTRUCTION OF HAPLOTYPE STRUCTURE VIA PERFECT PHYLOGENY

1 April 2003

journal article
research article
Published by World Scientific Pub Co Pte Ltd in Journal of Bioinformatics and Computational Biology

Vol. 1 (1), 1-20
https://doi.org/10.1142/s0219720003000174

Abstract

Each person's genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person's genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population. Since experimental determination of a person's genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site, and no recombination occurred at the given region. We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals assuming a perfect phylogeny. Using a reduction to 2-SAT we extend this algorithm to handle constraints that apply when we have genotypes from both parents and child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny. Our algorithms have been tested on real data and give biologically meaningful results. Our webserver (http://www.cs.columbia.edu/compbio/hap/) is publicly available for predicting haplotypes from genotype data and partitioning genotype data into blocks.

Keywords

This publication has 11 references indexed in Scilit:

Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21
Science, 2001
High-resolution haplotype structure in the human genome
Nature Genetics, 2001
Inference of Haplotypes from Samples of Diploid Populations: Complexity and Algorithms
Journal of Computational Biology, 2001
Accuracy of Haplotype Frequency Estimation for Biallelic Loci, via the Expectation-Maximization Algorithm for Unphased Diploid Genotype Data
American Journal of Human Genetics, 2000
Approximate Max-Flow Min-(Multi)Cut Theorems and Their Applications
SIAM Journal on Computing, 1996
HAPLO: A Program Using the EM Algorithm to Estimate the Frequencies of Multi-site Haplotypes
Journal of Heredity, 1995
Efficient algorithms for inferring evolutionary trees
Networks, 1991
An Almost Linear-Time Algorithm for Graph Realization
Mathematics of Operations Research, 1988
A linear-time algorithm for testing the truth of certain quantified boolean formulas
Information Processing Letters, 1979
On the Complexity of Timetable and Multicommodity Flow Problems
SIAM Journal on Computing, 1976

Cited by 85 articles