Efficient selection of tagging single-nucleotide polymorphisms in multiple populations
- 6 May 2006
- journal article
- Published by Springer Nature in Human Genetics
- Vol. 120 (1), 58-68
- https://doi.org/10.1007/s00439-006-0182-5
Abstract
Common genetic polymorphism may explain a portion of the heritable risk for common diseases, so considerable effort has been devoted to finding and typing common single-nucleotide polymorphisms (SNPs) in the human genome. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), suggesting that only a subset of all SNPs (known as tagging SNPs, or tagSNPs) need to be genotyped for disease association studies. Based on the genetic differences that exist among human populations, most tagSNP sets are defined in a single population and applied only in populations that are closely related. To improve the efficiency of multi-population analyses, we have developed an algorithm called MultiPop-TagSelect that finds a near-minimal union of population-specific tagSNP sets across an arbitrary number of populations. We present this approach as an extension of LD-select, a tagSNP selection method that uses a greedy algorithm to group SNPs into bins based on their pairwise association patterns, although the MultiPop-TagSelect algorithm could be used with any SNP tagging approach that allows choices between nearly equivalent SNPs. We evaluate the algorithm by considering tagSNP selection in candidate-gene resequencing data and lower-density whole-chromosome data. Our analysis reveals that an exhaustive search is often intractable, while the developed algorithm can quickly and reliably find near-optimal solutions even for difficult tagSNP selection problems. Using populations of African, Asian, and European ancestry, we also show that an optimal multi-population set of tagSNPs can be substantially smaller (up to 44%) than a typical set obtained through independent or sequential selection.
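The core problem described in the abstract, choosing one tagSNP per bin in each population so that the union across populations is small, can be illustrated with a generic greedy hitting-set heuristic. The sketch below is not the MultiPop-TagSelect algorithm itself, whose details are not given here; it only assumes that each population has already been binned (for example by an LD-select-style method) and that every bin offers a set of nearly equivalent candidate tagSNPs. The function name, population labels, and rs identifiers are hypothetical.

```python
from collections import Counter


def greedy_multipop_tags(bins_by_pop):
    """Pick a small set of SNPs that tags every bin in every population.

    bins_by_pop maps a population label to a list of bins; each bin is a
    set of interchangeable candidate tagSNP ids. This is a plain greedy
    hitting-set heuristic, used here only as an illustration.
    """
    # Flatten to (population, bin index, candidates) records.
    uncovered = [
        (pop, i, frozenset(cands))
        for pop, bins in bins_by_pop.items()
        for i, cands in enumerate(bins)
    ]
    if any(not cands for _, _, cands in uncovered):
        raise ValueError("every bin needs at least one candidate tagSNP")

    chosen = []
    while uncovered:
        # Count how many still-untagged bins each candidate SNP would tag.
        counts = Counter(snp for _, _, cands in uncovered for snp in cands)
        # Take the SNP tagging the most bins; ties are broken by name
        # so that the result is deterministic.
        best = min(counts, key=lambda s: (-counts[s], s))
        chosen.append(best)
        uncovered = [b for b in uncovered if best not in b[2]]
    return chosen


# Hypothetical toy input: three populations, two bins each.
example_bins = {
    "AFR": [{"rs1", "rs2"}, {"rs3"}],
    "EUR": [{"rs2", "rs4"}, {"rs3", "rs5"}],
    "ASN": [{"rs2"}, {"rs5", "rs6"}],
}
tags = greedy_multipop_tags(example_bins)
print(tags)
```

In this toy input, selecting tags independently per population could easily pick a different SNP from each bin, whereas the greedy pass prefers SNPs such as `rs2` that are valid candidates in several populations at once. A greedy heuristic gives no optimality guarantee, which is consistent with the abstract's framing of the goal as a near-minimal rather than provably minimal union.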