Inference of Haplotypes from Samples of Diploid Populations: Complexity and Algorithms
- 1 June 2001
- journal article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 8 (3), 305-323
- https://doi.org/10.1089/10665270152530863
Abstract
The next phase of human genomics will involve large-scale screens of populations for significant DNA polymorphisms, notably single nucleotide polymorphisms (SNPs). Dense human SNP maps are currently under construction. However, the utility of those maps and screens will be limited by the fact that humans are diploid and it is presently difficult to get separate data on the two "copies." Hence, genotype (blended) SNP data will be collected, and the desired haplotype (partitioned) data must then be (partially) inferred. A particular nondeterministic inference algorithm was proposed and studied by Clark (1990) and extensively used by Clark et al. (1998). In this paper, we more closely examine that inference method and the question of whether we can obtain an efficient, deterministic variant to optimize the obtained inferences. We show that the problem is NP-hard and, in fact, Max-SNP complete; that the reduction creates problem instances conforming to a severe restriction believed to hold in real data (Clark, 1990); and that even if we first use a natural exponential-time operation, the remaining optimization problem is NP-hard. However, we also develop, implement, and test an approach based on that operation and (integer) linear programming. The approach works quickly and correctly on simulated data.Keywords
This publication has 10 references indexed in Scilit:
- A New Statistical Method for Haplotype Reconstruction from Population DataAmerican Journal of Human Genetics, 2001
- Apolipoprotein E Variation at the Sequence Haplotype Level: Implications for the Origin and Maintenance of a Major Human PolymorphismAmerican Journal of Human Genetics, 2000
- A Good SNP May Be Hard to FindScience, 1999
- Drug Firms to Create Public Database of Genetic MutationsScience, 1999
- Linkage disequilibrium mapping of complex disease: fantasy or reality?Current Opinion in Biotechnology, 1998
- Haplotype Structure and Population Genetic Inferences from Nucleotide-Sequence Variation in Human Lipoprotein LipaseAmerican Journal of Human Genetics, 1998
- It's raining SNPs, hallelujah?Nature Genetics, 1998
- DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase geneNature Genetics, 1998
- The Future of Genetic Studies of Complex Human DiseasesScience, 1996
- On Two Problems in the Generation of Program Test PathsIEEE Transactions on Software Engineering, 1976