A New Algorithm for the Evaluation of Shotgun Peptide Sequencing in Proteomics: Support Vector Machine Classification of Peptide MS/MS Spectra and SEQUEST Scores
- 11 December 2002
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 2 (2), 137-146
- https://doi.org/10.1021/pr0255654
Abstract
Shotgun tandem mass spectrometry-based peptide sequencing using programs such as SEQUEST allows high-throughput identification of peptides, which in turn allows the identification of corresponding proteins. We have applied a machine learning algorithm, called the support vector machine, to discriminate between correctly and incorrectly identified peptides using SEQUEST output. Each peptide was characterized by SEQUEST-calculated features such as delta Cn and Xcorr, measurements such as precursor ion current and mass, and additional calculated parameters such as the fraction of matched MS/MS peaks. The trained SVM classifier performed significantly better than previous cutoff-based methods at separating positive from negative peptides. Positive and negative peptides were more readily distinguished in training set data acquired on a QTOF, compared to an ion trap mass spectrometer. The use of 13 features, including four new parameters, significantly improved the separation between positive and negative peptides. Use of the support vector machine and these additional parameters resulted in a more accurate interpretation of peptide MS/MS spectra and is an important step toward automated interpretation of peptide tandem mass spectrometry data in proteomics.
Keywords: shotgun peptide sequencing • SEQUEST • support vector machine • machine learning • mass spectrometry • capillary LC/MS/MS • proteomics
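The classification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only three of the feature types named above (Xcorr, delta Cn, fraction of matched MS/MS peaks) rather than the 13 in the paper, a linear SVM trained by stochastic sub-gradient descent on the hinge loss as a simplifying assumption, and synthetic data whose means and spreads are invented for illustration. All function names and the 0.5 m/z matching tolerance are hypothetical.

```python
import random


def fraction_matched_peaks(observed_mz, theoretical_mz, tol=0.5):
    """Fraction of observed peaks lying within `tol` m/z of any theoretical fragment.

    The tolerance is an illustrative assumption, not a value from the paper.
    """
    if not observed_mz:
        return 0.0
    matched = sum(
        1 for mz in observed_mz
        if any(abs(mz - t) <= tol for t in theoretical_mz)
    )
    return matched / len(observed_mz)


def make_synthetic_peptides(n_per_class, seed=0):
    """Synthetic [xcorr, delta_cn, frac_matched] rows; label +1 = correct, -1 = incorrect.

    The class means and standard deviations are made up for this sketch.
    """
    rng = random.Random(seed)
    classes = ((1, (3.5, 0.30, 0.6)), (-1, (1.5, 0.05, 0.2)))
    X, y = [], []
    for label, (xcorr_mu, dcn_mu, frac_mu) in classes:
        for _ in range(n_per_class):
            X.append([
                rng.gauss(xcorr_mu, 0.4),
                rng.gauss(dcn_mu, 0.04),
                rng.gauss(frac_mu, 0.1),
            ])
            y.append(label)
    return X, y


def decision_value(w, b, x):
    """Signed score of a linear classifier: w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM via SGD on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * decision_value(w, b, X[i])
            # Shrink the weights (gradient of the L2 penalty).
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge-loss sub-gradient is non-zero only inside the margin.
            if margin < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def accuracy(w, b, X, y):
    """Fraction of rows whose predicted sign matches the label."""
    hits = sum(
        1 for xi, yi in zip(X, y)
        if (1 if decision_value(w, b, xi) >= 0 else -1) == yi
    )
    return hits / len(y)


X, y = make_synthetic_peptides(100, seed=1)
w, b = train_linear_svm(X, y)
print("training accuracy:", accuracy(w, b, X, y))
```

A cutoff-based filter thresholds each score separately, whereas the learned weights `w` combine all features into a single decision value, which is the contrast the abstract draws. A real analysis would also evaluate on held-out spectra rather than on the training set.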
This publication has 28 references indexed in Scilit:
- Empirical Statistical Model To Estimate the Accuracy of Peptide Identifications Made by MS/MS and Database Search. Analytical Chemistry, 2002
- Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 2002
- Directed Proteomic Analysis of the Human Nucleolus. Current Biology, 2002
- The Yeast Nuclear Pore Complex. The Journal of Cell Biology, 2000
- Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 1999
- The interpretation of collision‐induced dissociation tandem mass spectra of peptides. Mass Spectrometry Reviews, 1995
- Peptides Presented to the Immune System by the Murine Class II Major Histocompatibility Complex Molecule I-A d. Science, 1992
- Contributions of mass spectrometry to peptide and protein structure. Journal of Mass Spectrometry, 1988
- Sequence analysis of oligopeptides by secondary ion/collision activated dissociation mass spectrometry. Analytical Chemistry, 1981
- The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 1958