Improving Reproducibility and Sensitivity in Identifying Human Proteins by Shotgun Proteomics

25 May 2004

journal article
research article
Published by American Chemical Society (ACS) in Analytical Chemistry

Vol. 76 (13), 3556-3568
https://doi.org/10.1021/ac035229m

Abstract

Identifying proteins in cell extracts by shotgun proteomics involves digesting the proteins, sequencing the resulting peptides by data-dependent mass spectrometry (MS/MS), and searching protein databases to identify the proteins from which the peptides are derived. Manual analysis and direct spectral comparison reveal that scores from two commonly used search programs (Sequest and Mascot) validate fewer than half of the potentially identifiable MS/MS spectra (class positive) from shotgun analyses of the human erythroleukemia K562 cell line. Here we demonstrate increased sensitivity and accuracy using a focused search strategy along with a peptide sequence validation script that does not rely exclusively on the XCorr or Mowse scores generated by Sequest or Mascot, but instead uses consensus between the search programs, along with chemical properties and scores describing the nature of the fragmentation spectrum (ion score and RSP). The approach yielded 4.2% false positive and 8% false negative frequencies in peptide assignments. The protein profile is then assembled from peptide assignments using a novel peptide-centric protein nomenclature that more accurately reports protein variants that contain identical peptide sequences. An Isoform Resolver algorithm ensures that the protein count is not inflated by variants in the protein database, eliminating ∼25% of redundant proteins. Analysis of soluble proteins from human K562 cells identified 5130 unique proteins, with ∼100 false positive protein assignments.
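The idea of a peptide-centric protein count can be illustrated with a small sketch. This is not the authors' Isoform Resolver algorithm, whose details are not given in the abstract; it is a simplified stand-in that assumes two rules: database entries supported by identical sets of validated peptides are reported as one group, and a group whose peptides are a strict subset of another group's peptides is dropped. The function name, peptide strings, and accession labels below are all hypothetical.

```python
from collections import defaultdict


def collapse_redundant_proteins(peptide_to_proteins):
    """Group database entries by the validated peptides that support them.

    Illustrative simplification only, not the published Isoform Resolver.
    Returns a dict mapping each retained peptide set to the sorted list of
    database entries that share exactly that evidence.
    """
    # Invert the mapping: database entry -> set of validated peptides.
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    # Entries with identical peptide evidence cannot be distinguished,
    # so they are reported together instead of being counted separately.
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)

    # Drop any group whose evidence is a strict subset of another group's.
    return {
        peptides: sorted(members)
        for peptides, members in groups.items()
        if not any(peptides < other for other in groups)
    }


# Hypothetical peptide assignments (made-up sequences and accessions).
assignments = {
    "PEPA": ["P1", "P1_variant", "P1_fragment"],
    "PEPB": ["P1", "P1_variant"],
    "PEPC": ["P2"],
}

protein_groups = collapse_redundant_proteins(assignments)
for peptides, members in protein_groups.items():
    print(sorted(peptides), "->", members)
```

In this toy input, four database entries reduce to two reported groups: P1 and P1_variant share the same two peptides and are merged, while P1_fragment is supported only by a peptide that the merged group already explains.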

Cited by 200 articles