Predicting subcellular localization of proteins for Gram‐negative bacteria by support vector machines based on n‐peptide compositions

Abstract
Gram‐negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The subcellular location of a protein can provide valuable information about its function. With the rapid growth of sequenced genomic data, an automated and accurate tool for predicting subcellular localization is increasingly needed. We present an approach to predicting the subcellular localization of proteins in Gram‐negative bacteria. The method uses support vector machines trained on multiple feature vectors based on n‐peptide compositions. For a standard data set comprising 1443 proteins, the overall prediction accuracy reaches 89%, which, to the best of our knowledge, is the highest reported to date. This accuracy is 14% higher than that of the recently developed multimodular PSORT‐B. Because of its simplicity, the approach can be easily extended to other organisms and should be a useful tool for high‐throughput, large‐scale analysis of proteomic and genomic data.
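To make the notion of an n‐peptide composition concrete, the following minimal Python sketch computes one plausible form of it: the normalized frequency of every contiguous n‐residue fragment over the 20 standard amino acids. The abstract does not specify the exact feature definitions, so the function name `n_peptide_composition`, the normalization by window count, the skipping of windows that contain non‐standard residues, and the example sequence are all assumptions made for illustration rather than the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_peptide_composition(sequence, n=2):
    """Return the normalized frequencies of all 20**n contiguous n-peptides.

    Illustrative assumption: windows containing a non-standard residue
    (e.g. 'X') are skipped, and frequencies are normalized by the number
    of windows actually counted.
    """
    # Fixed key order so every protein maps to the same vector layout.
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    seq = sequence.upper()
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(keys)
    return [counts[k] / total for k in keys]


# Hypothetical example sequence, not taken from the data set.
example = "MKKLLA"
amino_acid_vector = n_peptide_composition(example, n=1)  # 20 features
dipeptide_vector = n_peptide_composition(example, n=2)   # 400 features
```

Under these assumptions, n = 1 gives the 20‐dimensional amino acid composition and n = 2 gives the 400‐dimensional dipeptide composition. Vectors of this kind, computed for each protein, are the sort of input on which a support vector machine classifier could then be trained.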