MauveDB

27 June 2006

proceedings article
Published by Association for Computing Machinery (ACM)

p. 73-84
https://doi.org/10.1145/1142473.1142483

Abstract

Real-world data, especially when generated by distributed measurement infrastructures such as sensor networks, tends to be incomplete, imprecise, and erroneous, making it impossible to present it to users or feed it directly into applications. The traditional approach to dealing with this problem is to first process the data using statistical or probabilistic models that can provide more robust interpretations of the data. Current database systems, however, do not provide adequate support for applying models to such data, especially when those models need to be frequently updated as new data arrives in the system. Hence, most scientists and engineers who depend on models for managing their data do not use database systems for archival or querying at all; at best, databases serve as a persistent raw data store.

In this paper we define a new abstraction called model-based views and present the architecture of MauveDB, the system we are building to support such views. Just as traditional database views provide logical data independence, model-based views provide independence from the details of the underlying data generating mechanism and hide the irregularities of the data by using models to present a consistent view to the users. MauveDB supports a declarative language for defining model-based views, allows declarative querying over such views using SQL, and supports several different materialization strategies and techniques to efficiently maintain them in the face of frequent updates. We have implemented a prototype system that currently supports views based on regression and interpolation, using the Apache Derby open source DBMS, and we present results that show the utility and performance benefits that can be obtained by supporting several different types of model-based views in a database system.
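
To make the idea of an interpolation-based view concrete, here is a minimal sketch in Python. It is not MauveDB's view-definition language or API, neither of which is shown in this record. The function name `interpolation_view`, the sample readings, and the choice of piecewise-linear interpolation without extrapolation are all assumptions made for illustration. The sketch takes irregular (time, value) readings and exposes them as rows on a regular time grid, which is the kind of consistent view the abstract describes.

```python
from bisect import bisect_left


def interpolation_view(readings, grid):
    """Expose irregular (time, value) readings as rows on a regular time grid.

    Hypothetical helper for illustration only; not part of MauveDB.
    readings: list of (time, value) pairs, possibly unsorted and with gaps.
    grid: times at which the view should expose a value.
    Returns a list of (time, value) rows; value is None outside the observed range.
    """
    pts = sorted(readings)
    times = [t for t, _ in pts]
    rows = []
    for g in grid:
        if not pts or g < times[0] or g > times[-1]:
            rows.append((g, None))  # assumption: this sketch does not extrapolate
            continue
        i = bisect_left(times, g)
        if times[i] == g:
            rows.append((g, pts[i][1]))  # an actual reading exists at this time
            continue
        # Linear interpolation between the two readings that bracket g.
        (t0, v0), (t1, v1) = pts[i - 1], pts[i]
        rows.append((g, v0 + (v1 - v0) * (g - t0) / (t1 - t0)))
    return rows


# Made-up readings with missing samples at t=1, t=2 and nothing after t=4.
raw = [(0.0, 20.0), (3.0, 23.0), (4.0, 22.0)]
view = interpolation_view(raw, grid=[0, 1, 2, 3, 4, 5])
```

A query over `view` sees one row per grid point regardless of when the raw readings arrived. In this sketch the whole view is recomputed on each call, which sidesteps the materialization and incremental-maintenance strategies that the abstract names as part of the system.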

Cited by 123 articles