Inferring missing genotypes in large SNP panels using fast nearest-neighbor searches over sliding windows

Open Access

1 July 2007

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 23 (13), i401-i407
https://doi.org/10.1093/bioinformatics/btm220

Abstract

Motivation: Typical high-throughput genotyping techniques produce numerous missing calls that confound subsequent analyses, such as disease association studies. Common remedies for this problem include removing affected markers and/or samples or, otherwise, imputing the missing data. On small marker sets imputation is frequently based on a vote of the K-nearest-neighbor (KNN) haplotypes, but this technique is neither practical nor justifiable for large datasets.

Results: We describe a data structure that supports efficient KNN queries over arbitrarily sized, sliding haplotype windows, and evaluate its use for genotype imputation. The performance of our method enables exhaustive exploration over all window sizes and known sites in large (150K, 8.3M) SNP panels. We also compare the accuracy and performance of our methods with competing imputation approaches.

Availability: A free open source software package, NPUTE, is available at http://compgen.unc.edu/software, for non-commercial uses.

Contact: mcmillan@cs.unc.edu
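
To make the idea of a windowed KNN vote concrete, here is a minimal brute-force sketch in Python. It is not the data structure described in the paper and not NPUTE's code; it only illustrates imputing one missing call from the nearest haplotypes inside a window centered on the site. The function names, the "?" missing-call symbol, the mismatch-count distance, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter

MISSING = "?"  # assumed symbol for a missing call (illustrative only)


def mismatches(a, b):
    """Count positions where two equal-length windows disagree.

    Positions where either window has a missing call are skipped.
    """
    return sum(
        1 for x, y in zip(a, b)
        if x != MISSING and y != MISSING and x != y
    )


def impute_site(haplotypes, sample, site, half_window, k=1):
    """Guess the allele at haplotypes[sample][site] by a KNN vote.

    Neighbors are ranked by mismatch count over a window of up to
    `half_window` markers on each side of `site`, with the site itself
    excluded from the distance. This is a naive scan over all samples,
    not an efficient query structure.
    """
    target = haplotypes[sample]
    lo = max(0, site - half_window)
    hi = min(len(target), site + half_window + 1)
    target_window = target[lo:site] + target[site + 1:hi]

    candidates = []
    for idx, other in enumerate(haplotypes):
        # Skip the sample itself and neighbors that cannot cast a vote.
        if idx == sample or other[site] == MISSING:
            continue
        other_window = other[lo:site] + other[site + 1:hi]
        candidates.append((mismatches(target_window, other_window), idx))

    candidates.sort()  # nearest first; ties broken by sample index
    votes = Counter(haplotypes[idx][site] for _, idx in candidates[:k])
    if not votes:
        return MISSING  # no informative neighbor available
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    panel = ["00110", "00110", "11001", "00?10"]
    print(impute_site(panel, sample=3, site=2, half_window=2, k=1))
```

Each call scans every sample and recomputes window distances from scratch, so repeating it for every missing site and every candidate window size grows quickly on large panels. That cost is the practical obstacle the abstract refers to.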
