ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST
Open Access
- 1 July 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 32 (Web Server), W414-W419
- https://doi.org/10.1093/nar/gkh350
Abstract
Automated prediction of subcellular localization of proteins is an important step in the functional annotation of genomes. The existing subcellular localization prediction methods are based on either amino acid composition or N-terminal characteristics of the proteins. In this paper, a support vector machine (SVM) has been used to predict the subcellular location of eukaryotic proteins from features such as amino acid composition, dipeptide composition and physico-chemical properties. The SVM module based on dipeptide composition performed better than the SVM modules based on amino acid composition or physico-chemical properties. In addition, PSI-BLAST was used to search the query sequence against a dataset of experimentally annotated proteins to predict its subcellular location. In order to improve the prediction accuracy, we developed a hybrid module using all features of a protein, which consisted of an input vector of 458 dimensions (400 dipeptide compositions, 33 properties, 20 amino acid compositions of the protein and 5 from PSI-BLAST output). Using this hybrid approach, the prediction accuracies of nuclear, cytoplasmic, mitochondrial and extracellular proteins reached 95.3, 85.2, 68.2 and 88.9%, respectively. The overall prediction accuracy of SVM modules based on amino acid composition, physico-chemical properties, dipeptide composition and the hybrid approach was 78.1, 77.8, 82.9 and 88.0%, respectively. The accuracy of all the modules was evaluated using a 5-fold cross-validation technique. When predictions are restricted to a reliability index ≥3, 73.5% of predictions can be made with an accuracy of 96.4%. Based on the above approach, an online web server ESLpred was developed, which is available at http://www.imtech.res.in/raghava/eslpred/.
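The composition features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the treatment of non-standard residues, the use of fractions rather than percentages, and the ordering of the blocks inside the 458-dimensional vector are all assumptions made for illustration. The 33 physico-chemical values and the 5 PSI-BLAST values are passed in as precomputed lists, because the abstract does not say how they are derived.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes) and all 400 ordered pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard amino acid.

    Assumption: residues outside the 20 standard letters are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions, one per ordered pair of adjacent residues.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def hybrid_vector(seq, physchem, blast):
    """Concatenate 400 + 33 + 20 + 5 features into one 458-value list.

    `physchem` and `blast` are hypothetical precomputed inputs; the block
    order follows the abstract's listing and is an assumption.
    """
    if len(physchem) != 33 or len(blast) != 5:
        raise ValueError("expected 33 physico-chemical and 5 PSI-BLAST values")
    return (dipeptide_composition(seq) + list(physchem)
            + amino_acid_composition(seq) + list(blast))
```

For example, `len(hybrid_vector("MKTAYIAKQR", [0.0] * 33, [0.0] * 5))` evaluates to 458, matching the dimensionality stated in the abstract. A vector of this kind would then be passed to an SVM classifier.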