Single-nucleotide polymorphisms in genes relating to homocysteine metabolism: how applicable are public SNP databases to a typical European population?

Abstract
To facilitate association studies of complex diseases characterized by hyperhomocysteinemia, we collected structural and frequency data on single-nucleotide polymorphisms (SNPs) in 24 genes relating to homocysteine metabolism. First, we scanned approximately 1.2 Mbp of sequence in the NCBI SNP database (dbSNP) build 110 and detected 1353 putative SNPs, with an average in silico genic density of 1:683. Out of 112 putative SNPs in coding regions (cSNPs), we selected a subset of 42 cSNPs and assessed the applicability of the NCBI dbSNP to the Czech population, a typical representative of European Caucasians, by determining the frequency of the putative cSNPs experimentally by PCR-RFLP or ARMS-PCR in at least 110 control Czech chromosomes. As only 25 of the 42 analyzed cSNPs met the criterion of ≥1% frequency, the positive predictive value of the NCBI data set for our population reached 60%, which is similar to the values reported in other studies. The correlation of SNP frequencies between Czechs and other Caucasians (obtained from NCBI and/or the literature) was stronger (r² = 0.90 for 20 cSNPs) than that between Czechs and general NCBI database entries (r² = 0.73 for 27 cSNPs). Moreover, the frequencies of all 20 putative cSNPs for which data in Caucasians were available fell congruently below or above the 1% frequency criterion in both Czechs and other Caucasians. In summary, our study shows that the NCBI dbSNP is a useful tool for selecting cSNPs for genetic studies of hyperhomocysteinemia in European populations, although experimental validation of SNPs should be performed, especially if the cSNP entry lacks any frequency data in Caucasians.
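
The summary statistics quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names are hypothetical, r² is assumed here to be the squared Pearson correlation coefficient, and the allele frequencies in the usage example are invented placeholders rather than data from the study. Only the counts 25 and 42 and the 1% criterion come from the abstract.

```python
def positive_predictive_value(n_confirmed, n_tested):
    """Fraction of tested database cSNPs confirmed as polymorphic."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_confirmed / n_tested


def is_polymorphic(frequency, threshold=0.01):
    """Apply the >=1% minor-allele frequency criterion from the abstract."""
    return frequency >= threshold


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length frequency lists.

    Assumption: the abstract's r(2) is taken to be the squared Pearson r.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


def congruent(freq_a, freq_b, threshold=0.01):
    """True if both frequencies fall on the same side of the threshold."""
    return is_polymorphic(freq_a, threshold) == is_polymorphic(freq_b, threshold)


# Counts taken from the abstract: 25 of 42 tested cSNPs reached >=1%.
ppv = positive_predictive_value(25, 42)
print(f"PPV = {ppv:.1%}")  # 59.5%, reported as 60% in the abstract

# Hypothetical placeholder frequencies, NOT data from the study.
czech = [0.000, 0.020, 0.150, 0.310, 0.450]
other = [0.004, 0.030, 0.120, 0.350, 0.400]
print(f"r^2 = {r_squared(czech, other):.2f}")
print(all(congruent(a, b) for a, b in zip(czech, other)))
```

The congruence check mirrors the abstract's observation that every cSNP with Caucasian frequency data fell on the same side of the 1% criterion in both groups; with real data, the two lists would hold per-cSNP frequencies in matching order.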