A species-generalized probabilistic model-based definition of CpG islands
- 24 September 2009
- journal article
- research article
- Published by Springer Nature in Mammalian Genome
- Vol. 20 (9-10), 674-680
- https://doi.org/10.1007/s00335-009-9222-5
Abstract
The DNA of most vertebrates is depleted in CpG dinucleotides, the target for DNA methylation. The remaining CpGs tend to cluster in regions referred to as CpG islands (CGI). CGI have been useful for marking functionally relevant epigenetic loci in genome studies. For example, CGI are enriched in the promoters of vertebrate genes and are thought to play an important role in regulation. Currently, CGI are defined algorithmically as regions with an observed-to-expected ratio (O/E) of CpG greater than 0.6, G+C content greater than 0.5, and usually, but not necessarily, a length greater than a certain threshold. Here we find that the current definition leaves out important CpG clusters associated with epigenetic marks, relevant to development and disease, and does not apply at all to nonvertebrate genomes. We propose an alternative hidden Markov model-based approach that solves these problems. We fit our model to genomes from 30 species, and the results support a new epigenomic view toward the development of DNA methylation in species diversity and evolution. The O/E of CpG in islands and nonislands segregated closely phylogenetically and showed substantial loss in both groups in animals of greater complexity, while maintaining a nearly constant difference in CpG O/E between island and nonisland compartments. Lists of CGI for some species are available at http://www.rafalab.org.
This publication has 17 references indexed in Scilit:
- DNA methylation is widespread and associated with differential gene expression in castes of the honeybee, Apis mellifera. Proceedings of the National Academy of Sciences, 2009
- The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nature Genetics, 2009
- DNA methylation profile of tissue-dependent and differentially methylated regions (T-DMRs) in mouse promoter regions demonstrating tissue-specific gene expression. Genome Research, 2008
- Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature, 2008
- CG dinucleotide clustering is a species-specific property of the genome. Nucleic Acids Research, 2007
- Phenotypic plasticity and the epigenetics of human disease. Nature, 2007
- The Human Genome Browser at UCSC. Genome Research, 2002
- Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proceedings of the National Academy of Sciences, 2002
- A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989
- CpG Islands in vertebrate genomes. Journal of Molecular Biology, 1987
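The conventional threshold-based definition quoted in the abstract (CpG O/E greater than 0.6, G+C content greater than 0.5, optional minimum length) can be illustrated with a minimal Python sketch. This is not the paper's hidden Markov model; it only shows the rule the authors argue against. The function names `cpg_stats` and `meets_conventional_definition` are hypothetical, the O/E formula used is the commonly cited one (CpG count times sequence length, divided by the product of C and G counts), and the `min_length` default of 200 bp is an assumption, since the abstract does not state a length.

```python
def cpg_stats(seq):
    """Return (G+C content, CpG observed-to-expected ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so this is exact
    gc_content = (c_count + g_count) / n
    # Expected CpG count if C and G occurred independently at their observed frequencies.
    expected = c_count * g_count / n
    oe = cpg_count / expected if expected > 0 else 0.0
    return gc_content, oe


def meets_conventional_definition(seq, min_oe=0.6, min_gc=0.5, min_length=200):
    """Apply the threshold rule from the abstract to one candidate region.

    min_length=200 is an assumed default, not a value given in the abstract;
    pass min_length=0 to skip the length check.
    """
    if len(seq) < min_length:
        return False
    gc_content, oe = cpg_stats(seq)
    return oe > min_oe and gc_content > min_gc


if __name__ == "__main__":
    region = "CGCGCGCGCG"
    print(cpg_stats(region))
    print(meets_conventional_definition(region, min_length=0))
```

Because the rule applies fixed cutoffs, a genome whose overall CpG O/E sits far from the vertebrate range would have almost everything or almost nothing pass, which is consistent with the abstract's point that the definition does not transfer to nonvertebrate genomes.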