Inferring missing genotypes in large SNP panels using fast nearest-neighbor searches over sliding windows
Open Access
- 1 July 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (13), i401-i407
- https://doi.org/10.1093/bioinformatics/btm220
Abstract
Motivation: Typical high-throughput genotyping techniques produce numerous missing calls that confound subsequent analyses, such as disease association studies. Common remedies for this problem include removing affected markers and/or samples or, otherwise, imputing the missing data. On small marker sets, imputation is frequently based on a vote of the K-nearest-neighbor (KNN) haplotypes, but this technique is neither practical nor justifiable for large datasets.
Results: We describe a data structure that supports efficient KNN queries over arbitrarily sized, sliding haplotype windows, and evaluate its use for genotype imputation. The performance of our method enables exhaustive exploration over all window sizes and known sites in large SNP panels (150K and 8.3M SNPs). We also compare the accuracy and performance of our methods with competing imputation approaches.
Availability: A free, open-source software package, NPUTE, is available for non-commercial use at http://compgen.unc.edu/software.
Contact: mcmillan@cs.unc.edu
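The abstract does not spell out NPUTE's data structure, so the following is only a minimal pure-Python sketch of the general idea of KNN imputation over sliding windows, not the authors' implementation. It makes several assumptions: haplotypes are lists of 0/1 alleles with `None` for a missing call; the distance between two haplotypes is the number of mismatching flanking sites where both are called; and the window extends a hypothetical `half_width` sites on each side of the target site. Pairwise mismatch totals are kept in a running matrix that is updated as the window slides (add the entering site, subtract the leaving one), so each step costs O(n²) instead of re-scanning the whole window. The same pass also predicts every known call from its neighbors (with that call masked), which gives a leave-one-out accuracy for choosing the window size. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Hap = List[Optional[int]]  # one haplotype: allele per site, None = missing call


def column_mismatch(haps: List[Hap], c: int) -> List[List[int]]:
    """n x n 0/1 matrix: 1 where both haplotypes are called at site c and differ."""
    n = len(haps)
    m = [[0] * n for _ in range(n)]
    for a in range(n):
        x = haps[a][c]
        if x is None:
            continue
        for b in range(a + 1, n):
            y = haps[b][c]
            if y is not None and y != x:
                m[a][b] = m[b][a] = 1
    return m


def accumulate(dist, delta, sign):
    """Add (sign=+1) or remove (sign=-1) one site's mismatches from the running totals."""
    for a, row in enumerate(delta):
        for b, v in enumerate(row):
            dist[a][b] += sign * v


def knn_sliding_window(haps: List[Hap], half_width: int, k: int = 1) -> Tuple[List[Hap], float]:
    """Impute missing calls by a K-nearest-neighbor vote over `half_width` flanking
    sites on each side, and return leave-one-out accuracy on the known calls."""
    n, sites = len(haps), len(haps[0])
    out = [list(h) for h in haps]  # copy; input is left unchanged
    dist = [[0] * n for _ in range(n)]
    # Prime the running totals with sites [0, half_width], the window of site 0.
    for c in range(min(half_width + 1, sites)):
        accumulate(dist, column_mismatch(haps, c), +1)
    correct = total = 0
    for j in range(sites):
        center = column_mismatch(haps, j)
        called = [b for b in range(n) if haps[b][j] is not None]
        for a in range(n):
            # Flanking-only distance: drop site j itself, which masks a's own call.
            cands = sorted((dist[a][b] - center[a][b], b) for b in called if b != a)
            if not cands:
                continue
            votes = Counter(haps[b][j] for _, b in cands[:k])
            guess = votes.most_common(1)[0][0]
            if haps[a][j] is None:
                out[a][j] = guess
            else:
                total += 1
                correct += guess == haps[a][j]
        # Slide the window one site to the right.
        if j - half_width >= 0:
            accumulate(dist, column_mismatch(haps, j - half_width), -1)
        if j + half_width + 1 < sites:
            accumulate(dist, column_mismatch(haps, j + half_width + 1), +1)
    return out, (correct / total if total else 0.0)


def best_half_width(haps: List[Hap], widths: List[int], k: int = 1) -> int:
    """Pick the window half-width with the highest leave-one-out accuracy."""
    return max(widths, key=lambda w: knn_sliding_window(haps, w, k)[1])


haps = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, None, 1],
]
imputed, acc = knn_sliding_window(haps, half_width=1, k=1)
print(imputed[3], round(acc, 3))
```

In this toy panel the missing call in the last haplotype is filled from the third haplotype, which is identical to it at the flanking sites. Ties in distance are broken by haplotype index here purely for determinism; raw mismatch counts are not normalized by the number of jointly called sites, which a real implementation would likely need to handle.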