Abstract
Moving from gene sequences to protein behaviors, and from there to cellular function, organismal biology, and fitness, is an immediate challenge for biology in the postgenomic age. Genome databases are, of course, nothing more than collections of molecular structures for organic “natural products.” Unfortunately, the chemical structure of an organic compound does not transparently reveal its behavior or its function in a biological system. This is true even if the molecular structure is a protein sequence. However, protein sequences do contain information about their historical pasts (1). The similarities between sequences within a family of proteins can be used to construct an evolutionary tree that shows familial relationships. Ancestral sequences can be reconstructed by inference from descendant sequences. Dates can be placed on events in that molecular history. Events in the molecular history can be correlated with events in the geological and paleontological records. From this correlation has emerged a strategy for interpretive proteomics: perhaps if we understand a protein's past, we may be better able to understand its present (2). A compelling illustration of this strategy is provided by Zhang and Rosenberg (3) in this issue of PNAS. These scientists examined a pair of proteins from the granules of eosinophilic leukocytes. These proteins are paralogs; they arose by gene duplication some 30 million years ago (Ma) in an African primate that was ancestral to humans and Old World monkeys. The two proteins are relatives of the digestive ribonuclease of artiodactyls, the mammalian order containing ox, giraffe, deer, and antelope (4). This digestive ribonuclease was evidently created ≈40 million years ago, when ruminant digestion first emerged, to degrade the RNA from bacteria growing in the rumen (5). Zhang and Rosenberg asked: why are these two gene duplicates present in physiology? The names of the two proteins, eosinophil-derived neurotoxin …