The modal distribution of protein isoelectric points reflects amino acid properties rather than sequence evolution

Abstract
Two-dimensional gel electrophoresis, a routine technique in proteomics, separates proteins according to their molecular mass (Mr) and isoelectric point (pI). As the genomic sequences of an increasing number of organisms are determined, the Mr and pI of all their proteins can be estimated computationally. Examination of several of these theoretical proteome plots has revealed a multimodal pI distribution; however, no conclusive explanation for this unusual distribution has yet been presented. We examined the pI distributions of 115 fully sequenced genomes and observed that the modal distribution reflects the chemical properties of amino acids rather than phylogeny or sequence evolution. We provide a statistical explanation of why the observed distributions of pI values are multimodal.
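A theoretical pI is typically estimated from sequence alone by summing Henderson-Hasselbalch charge contributions of the ionizable groups and searching for the pH at which the net charge is zero. The sketch below illustrates that general approach under stated assumptions. It is not the authors' method. The pKa values are illustrative only: published scales differ, and results shift with the scale chosen. The names `net_charge` and `isoelectric_point` are hypothetical helpers introduced here.

```python
# Illustrative pKa values (ASSUMED for this sketch; published pKa scales differ).
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a protein sequence at a given pH (Henderson-Hasselbalch)."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1  # one free amino terminus
    counts["Cterm"] = 1  # one free carboxy terminus
    # Fraction protonated (charged) for basic groups, deprotonated for acidic groups.
    positive = sum(counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection search for the pH where net charge is zero.

    Net charge decreases monotonically with pH, so the zero crossing is unique.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: pI lies at higher pH
        else:
            hi = mid  # negative: pI lies at lower pH
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for peptide in ("DDDD", "KKKK", "MSTHKE"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

Applying `isoelectric_point` to every predicted protein in a genome and binning the results gives a theoretical pI distribution of the kind examined here. Because few of the assumed pKa values fall between roughly pH 6 and 8, the net-charge curve tends to be shallow in that range. This sketch only computes individual pI values and does not reproduce the paper's statistical analysis.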