Atlas – a data warehouse for integrative bioinformatics
Open Access
- 21 February 2005
- journal article
- database
- Published by Springer Nature in BMC Bioinformatics
- Vol. 6 (1), 34
- https://doi.org/10.1186/1471-2105-6-34
Abstract
Background: We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data as well as a software infrastructure for bioinformatics research and development.

Description: The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs are provided in three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications, which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene, and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases in which Atlas integrates these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations.

Conclusion: The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data, enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/
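The two levels of integration described in the abstract can be illustrated with a small sketch. This is not the Atlas schema or API: the abstract does not give table definitions or method signatures, so every table name, function name, and accession below is hypothetical. Python with an in-memory SQLite database is used only to keep the example self-contained, whereas the Atlas APIs themselves are provided in C++, Java, and Perl. The sketch assumes one common `interaction` table shared by several source databases, loader functions that wrap SQL inserts, and retrieval functions that wrap SQL queries, including a simple cross-species inference through homology groups.

```python
import sqlite3

# Hypothetical, heavily simplified schema; not the published Atlas data model.
SCHEMA = """
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    accession  TEXT NOT NULL UNIQUE,
    taxon_id   INTEGER NOT NULL
);
CREATE TABLE interaction (
    interaction_id INTEGER PRIMARY KEY,
    source_db      TEXT NOT NULL,
    protein_a      INTEGER NOT NULL REFERENCES protein(protein_id),
    protein_b      INTEGER NOT NULL REFERENCES protein(protein_id)
);
CREATE TABLE homolog (
    group_id   INTEGER NOT NULL,
    protein_id INTEGER NOT NULL REFERENCES protein(protein_id)
);
"""

def create_warehouse():
    """Create an empty in-memory warehouse with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def _protein_id(conn, accession):
    row = conn.execute(
        "SELECT protein_id FROM protein WHERE accession = ?", (accession,)
    ).fetchone()
    if row is None:
        raise KeyError(accession)
    return row[0]

# Loader functions: each wraps one SQL insert.
def load_protein(conn, accession, taxon_id):
    conn.execute(
        "INSERT INTO protein (accession, taxon_id) VALUES (?, ?)",
        (accession, taxon_id),
    )

def load_interaction(conn, source_db, acc_a, acc_b):
    conn.execute(
        "INSERT INTO interaction (source_db, protein_a, protein_b) VALUES (?, ?, ?)",
        (source_db, _protein_id(conn, acc_a), _protein_id(conn, acc_b)),
    )

def load_homolog_group(conn, group_id, accessions):
    for acc in accessions:
        conn.execute(
            "INSERT INTO homolog (group_id, protein_id) VALUES (?, ?)",
            (group_id, _protein_id(conn, acc)),
        )

# Retrieval functions: each wraps one SQL query.
def find_partners(conn, accession):
    """Return (partner accession, source database) pairs across all sources."""
    return conn.execute(
        """
        SELECT b.accession, i.source_db
        FROM interaction i
        JOIN protein a ON a.protein_id = i.protein_a
        JOIN protein b ON b.protein_id = i.protein_b
        WHERE a.accession = ?
        UNION
        SELECT a.accession, i.source_db
        FROM interaction i
        JOIN protein a ON a.protein_id = i.protein_a
        JOIN protein b ON b.protein_id = i.protein_b
        WHERE b.accession = ?
        ORDER BY 1, 2
        """,
        (accession, accession),
    ).fetchall()

def homologs_of(conn, accession, taxon_id=None):
    """Return accessions sharing a homology group, optionally for one taxon."""
    rows = conn.execute(
        """
        SELECT other.accession, other.taxon_id
        FROM protein q
        JOIN homolog hq ON hq.protein_id = q.protein_id
        JOIN homolog ho ON ho.group_id = hq.group_id
        JOIN protein other ON other.protein_id = ho.protein_id
        WHERE q.accession = ? AND other.protein_id != q.protein_id
        ORDER BY other.accession
        """,
        (accession,),
    ).fetchall()
    return [acc for acc, tax in rows if taxon_id is None or tax == taxon_id]

def infer_partners(conn, accession, taxon_id):
    """Infer partners in `taxon_id` from interactions recorded for homologs."""
    inferred = set()
    for homolog in homologs_of(conn, accession):
        for partner, _source in find_partners(conn, homolog):
            inferred.update(homologs_of(conn, partner, taxon_id))
    inferred.discard(accession)
    return sorted(inferred)

def demo_warehouse():
    """Load made-up records; the accessions are placeholders, not real entries."""
    conn = create_warehouse()
    for acc, taxon in [("HUMAN_A", 9606), ("HUMAN_B", 9606),
                       ("YEAST_A", 4932), ("YEAST_B", 4932)]:
        load_protein(conn, acc, taxon)
    load_interaction(conn, "DIP", "YEAST_A", "YEAST_B")
    load_interaction(conn, "MINT", "YEAST_B", "YEAST_A")
    load_homolog_group(conn, 1, ["HUMAN_A", "YEAST_A"])
    load_homolog_group(conn, 2, ["HUMAN_B", "YEAST_B"])
    return conn

if __name__ == "__main__":
    conn = demo_warehouse()
    print(find_partners(conn, "YEAST_A"))
    print(infer_partners(conn, "HUMAN_A", 9606))
```

With the placeholder records above, `find_partners(conn, "YEAST_A")` returns the same partner once per source database, which is the first level of integration: records from different sources answer one query because they share one table. `infer_partners(conn, "HUMAN_A", 9606)` returns `["HUMAN_B"]`, an interaction that no source recorded directly and that follows only from combining the interaction and homology tables.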