BIOZON: a system for unification, management and analysis of heterogeneous biological data
Open Access
- 15 February 2006
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 7 (1), 70
- https://doi.org/10.1186/1471-2105-7-70
Abstract
Background: Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increases rapidly. Among the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability.
Description: Here we present a system (Biozon) that addresses these problems and offers biologists a new knowledge resource to navigate through and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts in that it uses a single extensive and tightly connected graph schema wrapped with a hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows propagation of knowledge through inference and fuzzy searches. Sophisticated query methods that span multiple data types were implemented, and first-of-a-kind biological ranking systems were explored and integrated.
Conclusion: The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at http://biozon.org.
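The idea of a graph of typed documents and relations, with similarity edges enabling "fuzzy" searches, can be illustrated with a small sketch. This is not Biozon's actual schema or query engine: the class `DocGraph`, the function `fuzzy_pathway_search`, the relation names `similar_to` and `member_of`, and the identifiers `P1`, `P2` and `PW1` are all hypothetical, chosen only to show how a similarity relation might let knowledge propagate from one protein to another.

```python
from collections import defaultdict


class DocGraph:
    """Toy typed graph: nodes are documents, edges are typed relations.

    Hypothetical illustration only; not the Biozon schema.
    """

    def __init__(self):
        self.doc_type = {}              # doc_id -> document type
        self.edges = defaultdict(set)   # (doc_id, relation) -> related doc ids

    def add_doc(self, doc_id, doc_type):
        self.doc_type[doc_id] = doc_type

    def relate(self, src, relation, dst, symmetric=False):
        self.edges[(src, relation)].add(dst)
        if symmetric:
            self.edges[(dst, relation)].add(src)

    def neighbors(self, doc_id, relation):
        # Return a copy so callers cannot mutate the stored edge set.
        return set(self.edges.get((doc_id, relation), set()))


def fuzzy_pathway_search(graph, protein_id):
    """Pathways linked to a protein directly or via one similarity hop."""
    candidates = {protein_id} | graph.neighbors(protein_id, "similar_to")
    pathways = set()
    for candidate in candidates:
        pathways |= graph.neighbors(candidate, "member_of")
    return pathways


g = DocGraph()
g.add_doc("P1", "protein")
g.add_doc("P2", "protein")
g.add_doc("PW1", "pathway")
g.relate("P1", "similar_to", "P2", symmetric=True)  # derived similarity data
g.relate("P2", "member_of", "PW1")                  # curated annotation

exact = g.neighbors("P1", "member_of")     # exact search: P1 has no pathway
fuzzy = fuzzy_pathway_search(g, "P1")      # fuzzy search: reaches PW1 via P2
print(exact, fuzzy)
```

In this toy setting an exact lookup for `P1` finds no pathway, while the fuzzy search follows the similarity edge to `P2` and returns its pathway. A real system would presumably also weight or rank such inferred results, which this sketch does not attempt.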