EFFICIENT RECONSTRUCTION OF HAPLOTYPE STRUCTURE VIA PERFECT PHYLOGENY
- 1 April 2003
- journal article
- research article
- Published by World Scientific Pub Co Pte Ltd in Journal of Bioinformatics and Computational Biology
- Vol. 1 (1), 1-20
- https://doi.org/10.1142/s0219720003000174
Abstract
Each person's genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person's genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population. Since experimental determination of a person's genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site, and no recombination occurred at the given region. We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals assuming a perfect phylogeny. Using a reduction to 2-SAT we extend this algorithm to handle constraints that apply when we have genotypes from both parents and child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny. Our algorithms have been tested on real data and give biologically meaningful results. Our webserver (http://www.cs.columbia.edu/compbio/hap/) is publicly available for predicting haplotypes from genotype data and partitioning genotype data into blocks.Keywords
This publication has 11 references indexed in Scilit:
- Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21Science, 2001
- High-resolution haplotype structure in the human genomeNature Genetics, 2001
- Inference of Haplotypes from Samples of Diploid Populations: Complexity and AlgorithmsJournal of Computational Biology, 2001
- Accuracy of Haplotype Frequency Estimation for Biallelic Loci, via the Expectation-Maximization Algorithm for Unphased Diploid Genotype DataAmerican Journal of Human Genetics, 2000
- Approximate Max-Flow Min-(Multi)Cut Theorems and Their ApplicationsSIAM Journal on Computing, 1996
- HAPLO: A Program Using the EM Algorithm to Estimate the Frequencies of Multi-site HaplotypesJournal of Heredity, 1995
- Efficient algorithms for inferring evolutionary treesNetworks, 1991
- An Almost Linear-Time Algorithm for Graph RealizationMathematics of Operations Research, 1988
- A linear-time algorithm for testing the truth of certain quantified boolean formulasInformation Processing Letters, 1979
- On the Complexity of Timetable and Multicommodity Flow ProblemsSIAM Journal on Computing, 1976