Over- and under-representation of short oligonucleotides in DNA sequences.
- 15 February 1992
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 89 (4), 1358-1362
- https://doi.org/10.1073/pnas.89.4.1358
Abstract
Strand-symmetric relative abundance functionals for di-, tri-, and tetranucleotides are introduced and applied to sequences encompassing a broad phylogenetic range to discern tendencies and anomalies in the occurrences of these short oligonucleotides within and between genomic sequences. For dinucleotides, TA is almost universally under-represented, with the exception of vertebrate mitochondrial genomes, and CG is strongly under-represented in vertebrates and in mitochondrial genomes. The traditional methylation/deamination/mutation hypothesis for the rarity of CG does not adequately account for the observed deficiencies in certain sequences, notably the mitochondrial genomes, yeast, and Neurospora crassa, which lack the standard CpG methylase. Homodinucleotides (AA.TT, CC.GG) and larger homooligonucleotides are over-represented in many organisms, perhaps due to polymerase slippage events. For trinucleotides, GCA.TGC tends to be under-represented in phage, human viral, and eukaryotic sequences, and CTA.TAG is strongly under-represented in many prokaryotic, eukaryotic, and viral sequences. The CCA.TGG triplet is ubiquitously over-represented in human viral and eukaryotic sequences. Among the tetranucleotides, several four-base-pair palindromes tend to be under-represented in phage sequences, probably as a means of restriction avoidance. The tetranucleotide CTAG is observed to be rare in virtually all bacterial genomes and some phage genomes. Explanations for these over- and under-representations in terms of DNA/RNA structures and regulatory mechanisms are considered.
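The abstract does not state the formula for the dinucleotide functional. A minimal sketch, assuming the commonly used observed/expected odds-ratio form (the frequency of dinucleotide XY divided by the product of the frequencies of X and Y, with all frequencies pooled over the sequence and its reverse complement), could look like the following. The function names are hypothetical and are not taken from the paper.

```python
from collections import Counter

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def symmetric_dinucleotide_odds(seq):
    """Strand-symmetric odds ratio f(XY) / (f(X) * f(Y)) for each dinucleotide.

    Assumption: counts are pooled over the given strand and its reverse
    complement, so XY and its reverse complement get the same value.
    Dinucleotides containing a base that never occurs are omitted.
    """
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        # Count single bases, ignoring ambiguity codes such as N.
        mono.update(c for c in strand if c in BASES)
        # Count overlapping dinucleotides made only of A, C, G, T.
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair[0] in BASES and pair[1] in BASES:
                di[pair] += 1

    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_mono == 0 or n_di == 0:
        return {}

    odds = {}
    for x in BASES:
        for y in BASES:
            fx, fy = mono[x] / n_mono, mono[y] / n_mono
            if fx == 0 or fy == 0:
                continue  # ratio undefined when a base is absent
            odds[x + y] = (di[x + y] / n_di) / (fx * fy)
    return odds


if __name__ == "__main__":
    example = "ATGCGCTAGCTAGGCTTAACCGGATATCGA"  # arbitrary toy sequence
    for pair, value in sorted(symmetric_dinucleotide_odds(example).items()):
        print(pair, round(value, 3))
```

Under this assumed definition, values well below 1 indicate under-representation (as the abstract reports for TA and, in vertebrates, CG) and values well above 1 indicate over-representation. Any cutoff for calling a deviation significant would need to come from the paper itself.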