The Standard Protein Mix Database: A Diverse Data Set To Assist in the Production of Improved Peptide and Protein Identification Software Tools

21 August 2007

Journal article (research article)
Published by American Chemical Society (ACS) in Journal of Proteome Research

Vol. 7 (1), 96-103
https://doi.org/10.1021/pr070244j

Abstract

Tandem mass spectrometry (MS/MS) is frequently used to identify peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probability that each spectrum-to-sequence assignment is correct can be determined with statistical software such as PeptideProphet or estimated by searching reverse or decoy databases. However, many of the software applications that assign probabilities to spectrum-to-sequence matches were developed using training data sets from 3D ion-trap mass spectrometers. Given the variety of mass spectrometers that have become commercially available over the last 5 years, we sought to generate a reference data set covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the “ISB standard protein mix”, using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF), and two MALDI-TOF/TOF platforms. The resulting data set, named the Standard Protein Mix Database, consists of over 1.1 million spectra from 150+ replicate runs on the mass spectrometers. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument files, the mzXML files, and the PeptideProphet-validated peptide assignments, are available at http://regis-web.systemsbiology.net/PublicDatasets/.
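The abstract mentions estimating the correctness of spectrum-to-sequence assignments with reverse or decoy databases. The sketch below illustrates one common variant of that idea, the target-decoy false discovery rate (FDR) estimate computed as decoy hits divided by target hits above a score cutoff. It is not PeptideProphet and is not the procedure used in this paper; the `PSM` class, function names, peptides, and scores are all hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """One peptide-spectrum match (PSM) from a database search (hypothetical record)."""
    spectrum_id: str
    peptide: str
    score: float    # higher = better match; a made-up search-engine score
    is_decoy: bool  # True if the peptide came from the reversed/decoy database


def estimate_fdr(psms, threshold):
    """Estimate FDR among PSMs scoring >= threshold as (#decoy hits) / (#target hits).

    Assumes a concatenated target-decoy search, where each decoy hit above the
    cutoff is taken as evidence of roughly one false target hit.
    """
    targets = sum(1 for p in psms if p.score >= threshold and not p.is_decoy)
    decoys = sum(1 for p in psms if p.score >= threshold and p.is_decoy)
    # max(..., 1) avoids division by zero when no target passes the cutoff
    return decoys / max(targets, 1)


def score_threshold_for_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr, or None."""
    best = None
    # Walk candidate cutoffs from strictest to most permissive; keep the last one that passes
    for t in sorted({p.score for p in psms}, reverse=True):
        if estimate_fdr(psms, t) <= max_fdr:
            best = t
    return best


# Illustrative, made-up search results (not data from the Standard Protein Mix Database)
psms = [
    PSM("s1", "PEPTIDEK", 4.1, False),
    PSM("s2", "LVNELTEFAK", 3.8, False),
    PSM("s3", "KEDITPEP", 3.5, True),
    PSM("s4", "AEFVEVTK", 3.2, False),
    PSM("s5", "YLYEIAR", 2.9, False),
    PSM("s6", "RAIEYLY", 2.5, True),
    PSM("s7", "HLVDEPQNLIK", 2.2, False),
]

print(estimate_fdr(psms, 2.9))             # 1 decoy / 4 targets
print(score_threshold_for_fdr(psms, 0.3))  # lowest cutoff with estimated FDR <= 30%
```

Walking the cutoffs from strictest to loosest and keeping the last one that passes gives the most permissive threshold meeting the target FDR. Because this simple estimate is not monotonic in the score, real pipelines usually convert it to q-values first.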


References (10 of the 23 indexed in Scilit):

Protein Identification by Tandem Mass Spectrometry and Sequence Database Searching
Springer Nature, 2007
Optimized Peptide Separation and Identification for Mass Spectrometry Based Proteomics via Free-Flow Electrophoresis
Journal of Proteome Research, 2006
Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations
Nature Methods, 2005
Computational analysis of shotgun proteomics data
Current Opinion in Chemical Biology, 2005
A common open representation of mass spectrometry data and its application to proteomics research
Nature Biotechnology, 2004
Standard Mixtures for Proteome Studies
OMICS: A Journal of Integrative Biology, 2004
A microcapillary trap cartridge‐microcapillary high‐performance liquid chromatography electrospray ionization emitter device capable of peptide tandem mass spectrometry at the attomole level on an ion trap mass spectrometer with automated routine operation
Rapid Communications in Mass Spectrometry, 2003
Proteome Analysis by Mass Spectrometry
Annual Review of Biophysics, 2003
Empirical Statistical Model To Estimate the Accuracy of Peptide Identifications Made by MS/MS and Database Search
Analytical Chemistry, 2002
Direct analysis of protein complexes using mass spectrometry
Nature Biotechnology, 1999

Cited by 150 articles