Abstract
The next phase of human genomics will involve large-scale screens of populations for significant DNA polymorphisms, notably single nucleotide polymorphisms (SNPs). Dense human SNP maps are currently under construction. However, the utility of those maps and screens will be limited by the fact that humans are diploid, and it is presently difficult to obtain separate data on the two "copies" of each chromosome. Hence, genotype (blended) SNP data will be collected, and the desired haplotype (partitioned) data must then be (partially) inferred. A particular nondeterministic inference algorithm was proposed and studied by Clark (1990) and used extensively by Clark et al. (1998). In this paper, we examine that inference method more closely, along with the question of whether an efficient, deterministic variant can be obtained that optimizes the resulting inferences. We show that the problem is NP-hard and, in fact, Max-SNP complete; that the reduction creates problem instances conforming to a severe restriction believed to hold in real data (Clark, 1990); and that the remaining optimization problem is NP-hard even after a natural exponential-time operation is first applied. However, we also develop, implement, and test an approach based on that operation and (integer) linear programming. The approach works quickly and correctly on simulated data.
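As an illustrative aside (ours, not code from the paper), the following minimal Python sketch shows why genotype data is "blended" and how a greedy simplification of Clark's (1990) inference rule proceeds. The names `blend`, `complement`, and `clark_inference` are hypothetical, introduced here only for exposition; the encoding (0 = homozygous 0, 1 = homozygous 1, 2 = heterozygous) is a common convention, not necessarily the paper's.

```python
def blend(h1, h2):
    """Two haplotypes (0/1 vectors) blend into a genotype:
    0 = both 0, 1 = both 1, 2 = heterozygous (phase information is lost)."""
    return [a if a == b else 2 for a, b in zip(h1, h2)]

def complement(haplo, geno):
    """If haplo is consistent with geno, return the unique mate haplotype
    that blends with haplo to give geno; otherwise return None."""
    mate = []
    for h, g in zip(haplo, geno):
        if g == 2:
            mate.append(1 - h)   # heterozygous site: mate takes the other allele
        elif g == h:
            mate.append(h)       # homozygous site: both haplotypes agree
        else:
            return None          # haplo conflicts with geno
    return tuple(mate)

def clark_inference(genotypes):
    """Greedy sketch of Clark's rule: seed with haplotypes from unambiguous
    genotypes (at most one heterozygous site), then repeatedly use a known
    haplotype to resolve an ambiguous genotype and add its inferred mate."""
    known, unresolved = set(), []
    for g in genotypes:
        if g.count(2) <= 1:
            h1 = tuple(0 if x == 2 else x for x in g)
            known.update([h1, complement(h1, g)])
        else:
            unresolved.append(g)
    progress = True
    while unresolved and progress:
        progress = False
        for g in list(unresolved):
            for h in list(known):
                mate = complement(h, g)
                if mate is not None:
                    known.add(mate)
                    unresolved.remove(g)
                    progress = True
                    break
    return known, unresolved   # unresolved = genotypes left unexplained

genos = [[0, 2, 1, 0], [0, 1, 1, 2], [0, 2, 1, 2]]
known, orphans = clark_inference(genos)
print(sorted(known), orphans)
```

Note that the order in which known haplotypes are tried against an ambiguous genotype can change which haplotypes are inferred and how many genotypes remain unresolved; this order dependence is precisely the nondeterminism that the paper's optimization question (maximizing the inferences obtained) addresses.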