Data management and preliminary data analysis in the pilot phase of the HUPO Plasma Proteome Project
- 16 August 2005
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 5 (13), 3246-3261
- https://doi.org/10.1002/pmic.200500186
Abstract
The pilot phase of the HUPO Plasma Proteome Project (PPP) is an international collaboration to catalog the protein composition of human blood plasma and serum by analyzing standardized aliquots of reference serum and plasma specimens using a variety of experimental techniques. Data management for this project included collection, integration, analysis, and dissemination of findings from participating organizations world-wide. Accomplishing this task required a communication and coordination infrastructure specific enough to support meaningful integration of results from all participants, but flexible enough to react to changing requirements and new insights gained during the course of the project and to allow participants with varying informatics capabilities to contribute. Challenges included integrating heterogeneous data, reducing redundant information to minimal identification sets, and data annotation. Our data integration workflow assembles a minimal and representative set of protein identifications, which account for the contributed data. It accommodates incomplete concordance of results from different laboratories, ambiguity and redundancy in contributed identifications, and redundancy in the protein sequence databases. Recommendations of the PPP for future large-scale proteomics endeavors are described.
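The idea of reducing redundant identifications to a minimal set that still accounts for the contributed data can be illustrated with a small sketch. The abstract does not specify the algorithm used by the PPP workflow, so the greedy set-cover heuristic below is an assumed stand-in, not the authors' method. The function name `minimal_protein_set`, the accession strings, and the peptide labels are all hypothetical.

```python
from typing import Dict, List, Set


def minimal_protein_set(protein_to_peptides: Dict[str, Set[str]]) -> List[str]:
    """Pick a small set of proteins that explains every observed peptide.

    Greedy set cover (an illustrative assumption, not the PPP algorithm):
    repeatedly choose the protein that explains the most peptides not yet
    explained. Ties go to the alphabetically first accession so the
    result is deterministic.
    """
    # Every peptide reported for any candidate protein must be accounted for.
    uncovered: Set[str] = set().union(*protein_to_peptides.values())
    chosen: List[str] = []
    while uncovered:
        # max() returns the first maximal item, so sorting fixes the tie-break.
        best = max(
            sorted(protein_to_peptides),
            key=lambda acc: len(protein_to_peptides[acc] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Hypothetical candidate identifications: accession -> supporting peptides.
candidates = {
    "IPI_A": {"pep1", "pep2", "pep3"},
    "IPI_B": {"pep1", "pep2"},  # redundant: fully explained by IPI_A
    "IPI_C": {"pep3", "pep4"},
    "IPI_D": {"pep4"},          # ambiguous with IPI_C for pep4
}

print(minimal_protein_set(candidates))
```

In this toy input, `IPI_B` is dropped because its peptides are a subset of those of `IPI_A`, and `IPI_C` is chosen over `IPI_D` only by the alphabetical tie-break. A real workflow would need a principled rule for such ambiguous cases, for example preferring entries supported by more laboratories.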