Detecting polymorphic regions in Arabidopsis thaliana with resequencing microarrays
Open Access
- 6 March 2008
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 18 (6), 918-929
- https://doi.org/10.1101/gr.070169.107
Abstract
Whole-genome oligonucleotide resequencing arrays have allowed the comprehensive discovery of single nucleotide polymorphisms (SNPs) in eukaryotic genomes of moderate to large size. With this technology, the detection rate for isolated SNPs is typically high. However, it is greatly reduced when other polymorphisms are located near a SNP as multiple mismatches inhibit hybridization to arrayed oligonucleotides. Contiguous tracts of suppressed hybridization therefore typify polymorphic regions (PRs) such as clusters of SNPs or deletions. We developed a machine learning method, designated margin-based prediction of polymorphic regions (mPPR), to predict PRs from resequencing array data. Conceptually similar to hidden Markov models, the method is trained with discriminative learning techniques related to support vector machines, and accurately identifies even very short polymorphic tracts ([…] Arabidopsis thaliana. Nonredundantly, 27% of the genome was included within the boundaries of PRs predicted at high specificity (≈97%). The resulting data set provides a fine-scale view of polymorphic sequences in A. thaliana; patterns of polymorphism not apparent in SNP data were readily detected, especially for noncoding regions. Our predictions provide a valuable resource for evolutionary genetic and functional studies in A. thaliana, and our method is applicable to similar data sets in other species. More broadly, our computational approach can be applied to other segmentation tasks related to the analysis of genomic variation.
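The segmentation idea in the abstract (contiguous tracts of suppressed hybridization marking polymorphic regions) can be illustrated with a minimal two-state Viterbi-style sketch. This is not mPPR: the abstract describes a discriminatively trained model, whereas the sketch below uses hand-set values. The function name `segment_polymorphic`, the parameters `mean_conserved`, `mean_polymorphic`, and `switch_penalty`, and the assumption that the input is a normalized per-position hybridization signal are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def segment_polymorphic(signal: List[float],
                        mean_conserved: float = 1.0,
                        mean_polymorphic: float = 0.2,
                        switch_penalty: float = 1.0) -> List[Tuple[int, int]]:
    """Label each position conserved (0) or polymorphic (1) by dynamic
    programming and return half-open (start, end) polymorphic intervals.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    if not signal:
        return []
    means = (mean_conserved, mean_polymorphic)
    # cost[s]: lowest total cost of any labelling that ends in state s
    cost = [(signal[0] - m) ** 2 for m in means]
    back = []  # back[i][s]: best previous state when position i+1 is in state s
    for x in signal[1:]:
        new_cost, ptr = [], []
        for s in (0, 1):
            stay = cost[s]
            switch = cost[1 - s] + switch_penalty
            if stay <= switch:
                prev, best = s, stay
            else:
                prev, best = 1 - s, switch
            # squared distance to the state's expected signal is the emission cost
            new_cost.append(best + (x - means[s]) ** 2)
            ptr.append(prev)
        cost = new_cost
        back.append(ptr)
    # trace the lowest-cost path backwards
    state = 0 if cost[0] <= cost[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    # collapse runs of state 1 into intervals
    regions: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(path)))
    return regions


if __name__ == "__main__":
    toy_signal = [1.0, 1.0, 1.0, 0.2, 0.2, 0.2, 0.2, 1.0, 1.0, 1.0]
    print(segment_polymorphic(toy_signal))
```

In this toy setting the switch penalty plays the role of a transition cost: raising it suppresses short calls, so a single low-signal position is absorbed into the surrounding conserved state, while a sustained dip is reported as one region.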