Improving the reliability and throughput of mass spectrometry‐based proteomics by spectrum quality filtering
- 22 March 2006
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 6 (7), 2086-2094
- https://doi.org/10.1002/pmic.200500309
Abstract
In contemporary peptide‐centric or non‐gel proteome studies, vast amounts of peptide fragmentation data are generated, of which only a small fraction leads to peptide or protein identification. This motivates the development and use of a filtering algorithm that removes spectra contributing little to protein identification. Removing unidentifiable spectra reduces both the computational and human time spent analyzing spectra and the chance of obtaining false identifications. Thorough testing on various proteome datasets from different instruments showed that the best of the proposed machine‐learning classifiers recognizes, on average, half of the unidentified spectra as bad spectra. Further analyses showed that several unidentified spectra classified as good either were derived from peptides carrying unanticipated amino acid modifications or contained sequence tags that allowed peptide identification through homology searches. The implementation of the classifiers is available under the GNU General Public License at http://www.bioinfo.no/software/spectrumquality.
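The abstract describes the filter only at a high level and does not list the features or classifiers used. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes each spectrum is a list of (m/z, intensity) peaks and replaces the trained machine‐learning classifiers with one hypothetical feature (the number of peaks above 5 % of the base peak) and a hypothetical threshold. It shows how such a filter separates spectra and how the abstract's evaluation quantity, the fraction of unidentified spectra recognized as bad, can be computed. It also reports an assumed companion check: the fraction of identified spectra the filter would discard. All names (`Spectrum`, `peak_count_feature`, `is_good`, `evaluate`) and thresholds are illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed representation: a peak is an (m/z, intensity) pair.
Peak = Tuple[float, float]


@dataclass
class Spectrum:
    peaks: List[Peak]
    identified: bool  # True if a database search assigned a peptide to this spectrum


def peak_count_feature(spectrum: Spectrum, rel_threshold: float = 0.05) -> int:
    """Hypothetical quality feature: number of peaks at or above rel_threshold * base peak."""
    if not spectrum.peaks:
        return 0
    base = max(intensity for _, intensity in spectrum.peaks)
    return sum(1 for _, intensity in spectrum.peaks if intensity >= rel_threshold * base)


def is_good(spectrum: Spectrum, min_peaks: int = 10) -> bool:
    """Hypothetical stand-in for a trained classifier: 'good' if enough significant peaks."""
    return peak_count_feature(spectrum) >= min_peaks


def evaluate(spectra: Sequence[Spectrum]) -> Tuple[float, float]:
    """Return (fraction of unidentified spectra flagged bad, fraction of identified spectra lost)."""
    unidentified = [s for s in spectra if not s.identified]
    identified = [s for s in spectra if s.identified]
    bad_unidentified = sum(1 for s in unidentified if not is_good(s))
    lost_identified = sum(1 for s in identified if not is_good(s))
    frac_bad = bad_unidentified / len(unidentified) if unidentified else 0.0
    frac_lost = lost_identified / len(identified) if identified else 0.0
    return frac_bad, frac_lost


if __name__ == "__main__":
    # Toy spectra for illustration only; these are not data from the article.
    toy = [
        Spectrum([(100.0 + 10 * i, 50.0 + i) for i in range(20)], identified=True),
        Spectrum([(200.0, 1000.0), (300.0, 5.0), (400.0, 2.0)], identified=False),
        Spectrum([], identified=False),
        Spectrum([(150.0 + 5 * i, 30.0) for i in range(15)], identified=False),
    ]
    kept = [s for s in toy if is_good(s)]  # spectra passed on to database searching
    print(len(kept), evaluate(toy))
```

In a real pipeline the threshold rule in `is_good` would be replaced by a classifier trained on spectra labelled identified or unidentified; the evaluation function does not depend on how that decision is made.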