Improving the reliability and throughput of mass spectrometry‐based proteomics by spectrum quality filtering

22 March 2006

journal article
research article
Published by Wiley in Proteomics

Vol. 6 (7), 2086-2094
https://doi.org/10.1002/pmic.200500309

Abstract

In contemporary peptide‐centric or non‐gel proteome studies, vast amounts of peptide fragmentation data are generated, of which only a small part leads to peptide or protein identification. This motivates the development and use of a filtering algorithm that removes spectra that contribute little to protein identification. Removing unidentifiable spectra reduces both the computational and human time spent on analyzing spectra and the chance of obtaining false identifications. Thorough testing on various proteome datasets from different instruments showed that the best suggested machine‐learning classifier is, on average, able to recognize half of the unidentified spectra as bad spectra. Further analyses showed that several unidentified spectra classified as good were derived from peptides carrying unanticipated amino acid modifications or contained sequence tags that allowed peptide identification using homology searches. The implementation of the classifiers is available under the GNU General Public License at http://www.bioinfo.no/software/spectrumquality.

