Improving Reproducibility and Sensitivity in Identifying Human Proteins by Shotgun Proteomics
- 25 May 2004
- journal article
- research article
- Published by American Chemical Society (ACS) in Analytical Chemistry
- Vol. 76 (13), 3556-3568
- https://doi.org/10.1021/ac035229m
Abstract
Identifying proteins in cell extracts by shotgun proteomics involves digesting the proteins, sequencing the resulting peptides by data-dependent tandem mass spectrometry (MS/MS), and searching protein databases to identify the proteins from which the peptides are derived. Manual analysis and direct spectral comparison reveal that scores from two commonly used search programs (Sequest and Mascot) validate less than half of the potentially identifiable MS/MS spectra (class positive) from shotgun analyses of the human erythroleukemia K562 cell line. Here we demonstrate increased sensitivity and accuracy using a focused search strategy along with a peptide sequence validation script that does not rely exclusively on the XCorr or Mowse scores generated by Sequest or Mascot. Instead, it uses consensus between the two search programs, together with chemical properties and scores describing the nature of the fragmentation spectrum (ion score and RSP). The approach yielded false positive and false negative rates of 4.2% and 8%, respectively, in peptide assignments. The protein profile is then assembled from the peptide assignments using a novel peptide-centric protein nomenclature that more accurately reports protein variants containing identical peptide sequences. An Isoform Resolver algorithm ensures that the protein count is not inflated by variants in the protein database, eliminating ∼25% of redundant proteins. Analysis of soluble proteins from human K562 cells identified 5130 unique proteins, with ∼100 false positive protein assignments.
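The abstract does not describe how the Isoform Resolver works internally. As a rough illustration of why database variants inflate a protein count, the sketch below applies a generic parsimony rule, which is an assumption and not the authors' published algorithm: proteins supported by exactly the same validated peptides are reported as one group, and any group whose peptides are a strict subset of another group's peptides is dropped. The function name `resolve_isoforms`, the input format, and the peptide and accession labels are all hypothetical.

```python
from collections import defaultdict


def resolve_isoforms(peptide_to_proteins):
    """Hypothetical parsimony sketch, not the published Isoform Resolver.

    peptide_to_proteins maps each validated peptide sequence to the set of
    database entries containing it. Returns a dict mapping a sorted tuple of
    indistinguishable accessions to the peptide set that supports them.
    """
    # Invert the mapping: accession -> peptides observed for that entry.
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    # Step 1: entries explained by exactly the same peptides cannot be
    # distinguished, so they are merged into a single reported group.
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)

    # Step 2: drop groups whose peptides are a strict subset of another
    # group's peptides; they provide no evidence of a distinct protein.
    peptide_sets = list(groups)
    kept = {}
    for pset in peptide_sets:
        if any(pset < other for other in peptide_sets):
            continue
        kept[tuple(sorted(groups[pset]))] = set(pset)
    return kept


if __name__ == "__main__":
    # Toy input with made-up labels: two full-length variants, a fragment,
    # and two isoforms, five database entries in total.
    example = {
        "PEPA": {"VAR1", "VAR2"},
        "PEPB": {"VAR1", "VAR2"},
        "PEPC": {"VAR1", "VAR2", "FRAG"},
        "PEPD": {"ISO1", "ISO2"},
        "PEPE": {"ISO2"},
    }
    for accessions, peptides in sorted(resolve_isoforms(example).items()):
        print(accessions, sorted(peptides))
```

In this toy case the five database entries collapse to two reported groups: VAR1 and VAR2 share identical evidence and are merged, while FRAG and ISO1 are dropped because every peptide assigned to them is also explained by a larger group. A production tool would also need to handle proteins that share only some peptides, which this sketch leaves as separate groups.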