MauveDB
- 27 June 2006
- proceedings article
- Published by Association for Computing Machinery (ACM)
Abstract
Real-world data, especially when generated by distributed measurement infrastructures such as sensor networks, tends to be incomplete, imprecise, and erroneous, making it impossible to present to users or to feed directly into applications. The traditional approach to this problem is to first process the data using statistical or probabilistic models that can provide more robust interpretations of it. Current database systems, however, do not provide adequate support for applying models to such data, especially when those models must be updated frequently as new data arrives in the system. Hence, most scientists and engineers who depend on models for managing their data do not use database systems for archival or querying at all; at best, databases serve as a persistent raw data store.

In this paper we define a new abstraction called model-based views and present the architecture of MauveDB, the system we are building to support such views. Just as traditional database views provide logical data independence, model-based views provide independence from the details of the underlying data-generating mechanism and hide the irregularities of the data by using models to present a consistent view to users. MauveDB supports a declarative language for defining model-based views, allows declarative querying over such views using SQL, and supports several materialization strategies and techniques for maintaining them efficiently in the face of frequent updates. We have implemented a prototype system, built on the Apache Derby open-source DBMS, that currently supports views based on regression and interpolation, and we present results showing the utility and performance benefits of supporting several types of model-based views in a database system.
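To make the two view types named in the abstract more concrete, here is a minimal sketch, in plain Python, of what a regression view and an interpolation view might compute over raw per-sensor readings `(t, value)`. It is not MauveDB's declarative language or its Derby-based implementation: `RegressionView`, `interpolate_view`, and their parameters are hypothetical names chosen for illustration. The regression view keeps per-sensor sufficient statistics so that each newly arriving reading updates the least-squares fit in constant time. This is one standard way to maintain such a model under frequent inserts and is assumed here, not taken from the paper.

```python
from bisect import bisect_left
from collections import defaultdict


class RegressionView:
    """Hypothetical per-sensor linear-regression view: value ~ a + b * t.

    Stores sufficient statistics instead of raw rows, so an insert updates
    the model in O(1) rather than refitting over the whole raw table.
    """

    def __init__(self):
        # sensor_id -> [n, sum_t, sum_v, sum_tt, sum_tv]
        self._stats = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0])

    def insert(self, sensor_id, t, value):
        s = self._stats[sensor_id]
        s[0] += 1
        s[1] += t
        s[2] += value
        s[3] += t * t
        s[4] += t * value

    def coefficients(self, sensor_id):
        if sensor_id not in self._stats:
            raise KeyError(sensor_id)
        n, st, sv, stt, stv = self._stats[sensor_id]
        denom = n * stt - st * st
        if denom == 0:  # all readings share one timestamp: use the mean
            return sv / n, 0.0
        b = (n * stv - st * sv) / denom  # least-squares slope
        a = (sv - b * st) / n            # least-squares intercept
        return a, b

    def query(self, sensor_id, t):
        # The view answers for any t, not only at times that were measured.
        a, b = self.coefficients(sensor_id)
        return a + b * t


def interpolate_view(readings, grid_times):
    """Hypothetical interpolation view for one sensor.

    readings: (t, value) pairs, possibly irregularly spaced.
    Returns (t, value) for each grid time inside the observed range,
    linearly interpolated between the two neighbouring readings.
    """
    pts = sorted(readings)
    times = [t for t, _ in pts]
    out = []
    for g in grid_times:
        if not pts or g < times[0] or g > times[-1]:
            continue  # no extrapolation outside the observed data
        i = bisect_left(times, g)
        if times[i] == g:
            out.append((g, pts[i][1]))
            continue
        (t0, v0), (t1, v1) = pts[i - 1], pts[i]
        out.append((g, v0 + (v1 - v0) * (g - t0) / (t1 - t0)))
    return out


if __name__ == "__main__":
    view = RegressionView()
    for t, v in [(0, 20.0), (1, 21.0), (2, 22.0), (3, 23.0)]:
        view.insert("s1", t, v)
    print(view.query("s1", 4))  # 24.0
    print(interpolate_view([(0, 10.0), (4, 18.0)], [0, 1, 2, 5]))
    # [(0, 10.0), (1, 12.0), (2, 14.0)]
```

The point of both sketches is the one the abstract makes: a query sees regular, model-derived values (a fitted value at any time, or a value at every grid point) rather than the gaps and irregular sampling of the raw table.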