Differential Use of Signal Peptides and Membrane Domains Is a Common Occurrence in the Protein Output of Transcriptional Units

Abstract
Membrane organization describes the orientation of a protein with respect to the membrane and can be determined by the presence or absence, and the arrangement within the protein sequence, of two features: endoplasmic reticulum signal peptides and alpha-helical transmembrane domains. These features allow protein sequences to be classified into one of five membrane organization categories: soluble intracellular proteins, soluble secreted proteins, type I membrane proteins, type II membrane proteins, and multi-spanning membrane proteins. Generation of protein isoforms with variable membrane organizations can change a protein's subcellular localization or association with the membrane. Application of MemO, a membrane organization annotation pipeline, to the FANTOM3 Isoform Protein Sequence mouse protein set revealed that within the 8,032 transcriptional units (TUs) with multiple protein isoforms, 573 had variation in their use of signal peptides, 1,527 had variation in their use of transmembrane domains, and 615 generated protein isoforms from distinct membrane organization classes. The mechanisms underlying these transcript variations were analyzed. While TUs were identified encoding all pairwise combinations of membrane organization categories, the most common was conversion of membrane proteins to soluble proteins. Within our high-confidence set, 156 TUs were predicted to generate both extracellular soluble and membrane proteins, and 217 TUs were predicted to generate both intracellular soluble and membrane proteins. The differential use of endoplasmic reticulum signal peptides and transmembrane domains is a common occurrence within the variable protein output of TUs. The generation of protein isoforms that are targeted to multiple subcellular locations represents a major functional consequence of transcript variation within the mouse transcriptome.
Many genes produce only a single protein; however, others are known to produce a number of proteins with different functions in the cell. The function of a protein within the cell is influenced by its location; for example, proteins that are secreted can act as messengers, whereas proteins embedded in the membrane may act as receptors or channels. Features that determine the eventual location of a protein are found in the protein sequence. The authors identified two such features, the signal peptide that targets a protein for secretion and the transmembrane domain that embeds a protein in the membrane, and predicted their occurrence in mouse protein sequences. The authors then searched the entire mouse genome for genes whose protein isoforms differ in their use of these features. They found a large number of such genes; for example, they identified genes that produce both secreted and intracellular proteins, and genes that produce both membrane-bound and soluble proteins. This process is likely to be a major source of functional variation in the output of mammalian genes.
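
The classification and the per-TU comparison described above can be illustrated with a minimal sketch. It is not the MemO pipeline itself. It assumes that signal peptide and transmembrane domain predictions are already available for each isoform, and that the five categories follow the conventional rule: the signal peptide decides secreted versus intracellular for soluble proteins and type I versus type II for proteins with a single transmembrane domain, while two or more transmembrane domains mean multi-spanning. The function names, the input tuple layout, and the choice to treat any difference in transmembrane domain count as "variation" are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def classify(has_signal_peptide: bool, n_tm: int) -> str:
    """Assign one of the five membrane organization categories.

    Assumed rule (not taken from the MemO source code):
    no TM domain -> soluble; one TM domain -> type I or type II;
    two or more TM domains -> multi-spanning.
    """
    if n_tm == 0:
        return "soluble secreted" if has_signal_peptide else "soluble intracellular"
    if n_tm == 1:
        return "type I membrane" if has_signal_peptide else "type II membrane"
    return "multi-spanning membrane"


def variable_tus(isoforms):
    """Summarize feature variation among the isoforms of each TU.

    `isoforms` is an iterable of (tu_id, has_signal_peptide, n_tm) tuples,
    a hypothetical layout. TUs with a single isoform are skipped.
    """
    by_tu = defaultdict(list)
    for tu_id, has_sp, n_tm in isoforms:
        by_tu[tu_id].append((has_sp, n_tm))

    report = {}
    for tu_id, features in by_tu.items():
        if len(features) < 2:
            continue
        report[tu_id] = {
            "signal_peptide_varies": len({sp for sp, _ in features}) > 1,
            "tm_varies": len({n for _, n in features}) > 1,
            "classes": sorted({classify(sp, n) for sp, n in features}),
        }
    return report


if __name__ == "__main__":
    # Made-up example data, not taken from the FANTOM3 set.
    example = [
        ("TU_A", True, 1),   # membrane-anchored isoform with a signal peptide
        ("TU_A", True, 0),   # isoform lacking the TM domain
        ("TU_B", False, 0),  # single-isoform TU, ignored
    ]
    for tu_id, summary in variable_tus(example).items():
        print(tu_id, summary)
```

A TU whose `classes` list holds more than one entry corresponds to the case in which isoforms fall into distinct membrane organization classes; the two boolean flags correspond to variation in signal peptide use and in transmembrane domain use, respectively.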