AMS 4.0: consensus prediction of post-translational modifications in protein sequences
Open Access
- 4 May 2012
- journal article
- research article
- Published by Springer Nature in Amino Acids
- Vol. 43 (2), 573-582
- https://doi.org/10.1007/s00726-012-1290-2
Abstract
We present here the 2011 update of the AutoMotif Service (AMS 4.0), which predicts a wide selection of 88 different types of single amino acid post-translational modifications (PTMs) in protein sequences. The experimentally confirmed modifications used for training are taken from the latest UniProt and Phospho.ELM databases. The sequence vicinity of each modified residue is represented by the physico-chemical features of its amino acids, encoded using high quality indices (HQI) obtained by automatic clustering of known indices extracted from the AAindex database. For each type of numerical representation, the method builds an ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising a different objective during training (for example recall, precision or the area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of a machine learning algorithm with the data fusion of different representations of the training objects, in order to boost the overall prediction accuracy for conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machines). Our software improves the average AUC score of the earlier version by close to 7 %, as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most difficult sequence motif types, it improves the prediction performance by almost 32 % compared with the previously used single machine learning methods. In summary, the brainstorming consensus meta-learning methodology boosts the AUC score, averaged over all 88 PTM types, to around 89 %. Detailed results for the single machine learning methods and the consensus methodology are also provided, together with a comparison to previously published methods and state-of-the-art software tools.
The source code and precompiled binaries of the brainstorming tool are available at http://code.google.com/p/automotifserver/ under the Apache 2.0 license.
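The pipeline described in the abstract (encode a window around a candidate residue with a physico-chemical index, combine the scores of several classifiers, evaluate with AUC) can be illustrated with a minimal Python sketch. It is not the AMS 4.0 implementation and trains no MLPs. The Kyte-Doolittle hydropathy scale stands in for one HQI index, the zero padding for out-of-range positions is a hypothetical choice, and a plain mean of classifier scores is swapped in for the brainstorming consensus, whose actual combination rule may differ. The function names `encode_window`, `consensus` and `auc` are illustrative only.

```python
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy scale, used here only as a stand-in for one
# physico-chemical index (AMS 4.0 uses HQI indices derived from AAindex).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_window(seq: str, pos: int, half_width: int,
                  index: Dict[str, float]) -> List[float]:
    """Encode residues seq[pos - half_width .. pos + half_width] with one index.

    Positions outside the sequence and unknown residues are encoded as 0.0
    (a hypothetical padding choice, not taken from the paper).
    """
    features = []
    for i in range(pos - half_width, pos + half_width + 1):
        if 0 <= i < len(seq):
            features.append(index.get(seq[i], 0.0))
        else:
            features.append(0.0)
    return features


def consensus(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-sample scores of several classifiers (simple mean)."""
    n_models = len(score_lists)
    return [sum(scores) / n_models for scores in zip(*score_lists)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy sequence; the residue S at index 2 stands for a candidate site.
    print(encode_window("MKSTAY", 2, 2, KYTE_DOOLITTLE))
    # Scores from two hypothetical classifiers for four candidate sites.
    combined = consensus([[0.9, 0.2, 0.7, 0.4], [0.7, 0.4, 0.5, 0.2]])
    print(auc([1, 0, 1, 0], combined))
```

In a multi-index setting, one such window vector would be produced per index and the vectors concatenated or fed to separate classifiers; how AMS 4.0 fuses the representations is described in the full paper, not here.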