Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences

Open Access

20 May 2013

journal article
review article
Published by Royal Society of Chemistry (RSC) in Chemical Society Reviews

Vol. 42 (16), 6754-6776
https://doi.org/10.1039/c3cs60050e

Abstract

Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced have encouraged the expansion of e-Research, and stimulated the development of methodologies for managing, organising, and analysing “big data”. We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the “grand challenges” for the preservation and sharing of chemical information.

This publication has 95 references indexed in Scilit.

Cited by 40 articles