Dipeptide frequencies in proteins and the CpG deficiency in vertebrate DNA

Abstract
Analysis of vertebrate protein sequences totalling 4040 residues shows that amino acids with a high proportion of codons ending in C occur with significantly reduced frequency before amino acids whose codons start with G. This effect is not seen in “control” bacterial protein sequences. The implied shortage of XXC·GXX codon pairs in vertebrate messenger RNA is discussed in relation to the extreme rarity of the base doublet CpG in vertebrate DNA.
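Both effects can be expressed as observed/expected ratios for adjacent pairs. The sketch below is an illustration, not the authors' procedure. It assumes that the expected count of a pair equals the number of adjacent positions times the product of the single-item frequencies (independence). For proteins, it groups the second residue using the standard genetic code: Val, Ala, Asp, Glu and Gly are the only amino acids whose codons all begin with G. The set of "C-ending" residues is left as a caller-supplied parameter because it depends on codon usage. The function names `pair_obs_exp` and `cpg_obs_exp` are hypothetical.

```python
from collections import Counter

# Amino acids (one-letter codes) whose codons in the standard genetic code
# all begin with G: Val (GTN), Ala (GCN), Asp (GAY), Glu (GAR), Gly (GGN).
G_START = set("VADEG")


def pair_obs_exp(seqs, first_set, second_set):
    """Observed/expected ratio for adjacent residue pairs X-Y with X in
    first_set and Y in second_set, pooled over all sequences.

    Expected count (independence assumption): total number of adjacent
    pairs within sequences times the product of the pooled single-residue
    frequencies of the two groups. Pairs never span sequence boundaries.
    """
    residues = Counter()
    for s in seqs:
        residues.update(s)
    total = sum(residues.values())
    p_first = sum(residues[a] for a in first_set) / total
    p_second = sum(residues[a] for a in second_set) / total

    observed = 0
    n_pairs = 0
    for s in seqs:
        for x, y in zip(s, s[1:]):
            n_pairs += 1
            if x in first_set and y in second_set:
                observed += 1
    expected = n_pairs * p_first * p_second
    return observed / expected if expected else float("nan")


def cpg_obs_exp(dna):
    """CpG observed/expected ratio: count(CG) * N / (count(C) * count(G))."""
    dna = dna.upper()
    n = len(dna)
    c = dna.count("C")
    g = dna.count("G")
    # "CG" cannot overlap itself, so str.count gives the true dinucleotide count.
    cg = dna.count("CG")
    expected = c * g / n if n else 0
    return cg / expected if expected else float("nan")
```

With these definitions, a ratio well below 1 for a C-ending residue group followed by `G_START`, seen in vertebrate but not bacterial sequences, would correspond to the dipeptide deficiency described above. Similarly, a `cpg_obs_exp` value well below 1 would reflect CpG depletion.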
