Data management and preliminary data analysis in the pilot phase of the HUPO Plasma Proteome Project

16 August 2005

journal article
research article
Published by Wiley in Proteomics

Vol. 5 (13), 3246-3261
https://doi.org/10.1002/pmic.200500186

Abstract

The pilot phase of the HUPO Plasma Proteome Project (PPP) is an international collaboration to catalog the protein composition of human blood plasma and serum by analyzing standardized aliquots of reference serum and plasma specimens using a variety of experimental techniques. Data management for this project included collection, integration, analysis, and dissemination of findings from participating organizations worldwide. Accomplishing this task required a communication and coordination infrastructure specific enough to support meaningful integration of results from all participants, yet flexible enough to react to changing requirements and new insights gained during the course of the project, and to allow participants with varying informatics capabilities to contribute. Challenges included integrating heterogeneous data, reducing redundant information to minimal identification sets, and annotating the data. Our data integration workflow assembles a minimal and representative set of protein identifications that accounts for the contributed data. It accommodates incomplete concordance of results from different laboratories, ambiguity and redundancy in contributed identifications, and redundancy in the protein sequence databases. Recommendations of the PPP for future large-scale proteomics endeavors are described.
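
The idea of reducing redundant identifications to a minimal set that still accounts for the contributed data can be illustrated with a small sketch. The abstract does not specify the algorithm used by the PPP, so the code below is only an assumption: it applies a generic greedy set-cover heuristic, and the function name `minimal_protein_set`, the accessions (`PROT_A`, ...), and the peptide labels are hypothetical.

```python
def minimal_protein_set(protein_to_peptides):
    """Greedy set cover: choose proteins until every observed peptide is explained.

    Illustrative assumption only; not the PPP's documented procedure.
    """
    # Copy the input so the caller's sets are not mutated.
    remaining = {prot: set(peps) for prot, peps in protein_to_peptides.items()}
    # Every peptide reported for any candidate protein must be accounted for.
    uncovered = set().union(*remaining.values())
    chosen = []
    while uncovered:
        # Take the protein explaining the most still-unexplained peptides.
        # Iterating in sorted order makes ties resolve to the smallest accession.
        best = max(sorted(remaining), key=lambda prot: len(remaining[prot] & uncovered))
        chosen.append(best)
        uncovered -= remaining.pop(best)
    return chosen


# Hypothetical candidates: PROT_B is redundant because PROT_A explains
# all of its peptides, so it should not appear in the minimal set.
candidates = {
    "PROT_A": {"pep1", "pep2", "pep3"},
    "PROT_B": {"pep1", "pep2"},
    "PROT_C": {"pep4"},
}
result = minimal_protein_set(candidates)
print(result)
```

A greedy heuristic of this kind does not guarantee the smallest possible set in every case, but it is simple and deterministic, which matters when results from many laboratories have to be merged reproducibly.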

Cited by 47 articles