Large‐scale plant protein subcellular location prediction
- 18 September 2006
- journal article
- research article
- Published by Wiley in Journal of Cellular Biochemistry
- Vol. 100 (3), 665-678
- https://doi.org/10.1002/jcb.21096
Abstract
Current plant genome sequencing projects call for novel, powerful high-throughput tools that can annotate the subcellular location of uncharacterized plant proteins in a timely manner. To this end, an ensemble classifier, Plant-PLoc, formed by fusing many basic individual classifiers, has been developed for large-scale subcellular location prediction of plant proteins. Each basic classifier is built on the K-Nearest Neighbor (KNN) rule. Plant-PLoc discriminates plant proteins among the following 11 subcellular locations: (1) cell wall, (2) chloroplast, (3) cytoplasm, (4) endoplasmic reticulum, (5) extracell, (6) mitochondrion, (7) nucleus, (8) peroxisome, (9) plasma membrane, (10) plastid, and (11) vacuole. As a demonstration, predictions were performed on a stringent benchmark dataset in which no protein has ≥25% sequence identity to any other protein in the same subcellular location, so as to avoid homology bias. The overall success rate thus obtained was 32–51% higher than the rates obtained by previous methods on the same benchmark dataset. The essence of Plant-PLoc in enhancing prediction quality and its significance for biological applications are discussed. Plant-PLoc is accessible to the public as a free web server at http://202.120.37.186/bioinf/plant. Furthermore, for public convenience, results predicted by Plant-PLoc are provided in a downloadable file at the same website for all plant protein entries in the Swiss-Prot database that either lack subcellular location annotations or are annotated as uncertain. The large-scale results will be updated twice a year to include new plant protein entries and to reflect the continuing development of Plant-PLoc. J. Cell. Biochem. 100: 665–678, 2007.
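The idea of fusing several KNN-based classifiers can be illustrated with a minimal sketch. The abstract does not state which sequence features, distance measure, values of K, or fusion rule Plant-PLoc uses, so everything below is an assumption made for illustration: the features are plain amino acid composition, the distance is Euclidean, each basic classifier differs only in its K, and the outputs are fused by simple majority vote. The function names and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq.

    Assumed feature for illustration; the abstract does not name the
    features Plant-PLoc actually uses.
    """
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(train, query, k):
    """Plain KNN rule: majority label among the k nearest training vectors."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def ensemble_predict(train, query, ks=(1, 3, 5)):
    """Fuse basic KNN classifiers (one per k) by majority vote.

    The vote-based fusion is an assumption, not the paper's stated rule.
    """
    votes = Counter(knn_predict(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]


# Toy, made-up sequences with made-up labels (not real proteins).
TRAIN = [
    (composition("AAAAGGGL"), "chloroplast"),
    (composition("AAAGGGGS"), "chloroplast"),
    (composition("AAGGGGAV"), "chloroplast"),
    (composition("KKKKRRRL"), "nucleus"),
    (composition("KKKRRRRS"), "nucleus"),
    (composition("KRRKKRKV"), "nucleus"),
]

if __name__ == "__main__":
    print(ensemble_predict(TRAIN, composition("AAGGAGGK")))
    print(ensemble_predict(TRAIN, composition("KRKRKRKA")))
```

On this toy data the first query, rich in A and G, lies closest to the "chloroplast" examples, and the second, rich in K and R, lies closest to the "nucleus" examples. A real predictor would need far richer features and a benchmark filtered for sequence identity, as the abstract describes.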