MetaBar - a tool for consistent contextual data acquisition and standards compliant submission
Open Access
- 30 June 2010
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 11 (1), 358
- https://doi.org/10.1186/1471-2105-11-358
Abstract
Background: Environmental sequence datasets are increasing at an exponential rate; however, the vast majority of them lack appropriate descriptors such as sampling location, time, and depth/altitude, generally referred to as metadata or contextual data. The consistent capture and structured submission of these data are crucial for integrated data analysis and ecosystems modeling. The application MetaBar has been developed to support consistent contextual data acquisition.
Results: MetaBar is a spreadsheet and web-based software tool designed to assist users in the consistent acquisition, electronic storage, and submission of contextual data associated with their samples. A preconfigured Microsoft® Excel® spreadsheet is used to initiate structured contextual data storage in the field or laboratory. Each sample is given a unique identifier, and at any stage the sheets can be uploaded to the MetaBar database server. To label samples, identifiers can be printed as barcodes. An intuitive web interface provides quick access to the contextual data in the MetaBar database as well as user and project management capabilities. Export functions facilitate contextual and sequence data submission to the International Nucleotide Sequence Database Collaboration (INSDC), comprising the DNA DataBase of Japan (DDBJ), the European Molecular Biology Laboratory database (EMBL), and GenBank. MetaBar requests and stores contextual data in compliance with the Genomic Standards Consortium specifications. The MetaBar open source code base for local installation is available under the GNU General Public License version 3 (GNU GPL3).
Conclusion: The MetaBar software supports the typical workflow from data acquisition and field-sampling to contextual-data-enriched sequence submission to an INSDC database.
The integration with the megx.net marine Ecological Genomics database and portal facilitates georeferenced data integration and metadata-based comparisons of sampling sites as well as interactive data visualization. The ample export functionalities and the INSDC submission support enable exchange of data across disciplines and safeguarding contextual data.