Approximate computation of multidimensional aggregates of sparse data using wavelets

1 June 1999

conference paper
Published by Association for Computing Machinery (ACM)

Vol. 28 (2), 193-204
https://doi.org/10.1145/304182.304199

Abstract

Computing multidimensional aggregates in high dimensions is a performance bottleneck for many OLAP applications. Obtaining the exact answer to an aggregation query can be prohibitively expensive in terms of time and/or storage space in a data warehouse environment. It is advantageous to have fast, approximate answers to OLAP aggregation queries.In this paper, we present a novel method that provides approximate answers to high-dimensional OLAP aggregation queries in massive sparse data sets in a time-efficient and space-efficient manner. We construct a compact data cube, which is an approximate and space-efficient representation of the underlying multidimensional array, based upon a multiresolution wavelet decomposition. In the on-line phase, each aggregation query can generally be answered using the compact data cube in one I/O or a smalll number of I/Os, depending upon the desired accuracy.We present two I/O-efficient algorithms to construct the compact data cube for the important case of sparse high-dimensional arrays, which often arise in practice. The traditional histogram methods are infeasible for the massive high-dimensional data sets in OLAP applications. Previously developed wavelet techniques are efficient only for dense data. Our on-line query processing algorithm is very fast and capable of refining answers as the user demands more accuracy. Experiments on real data show that our method provides significantly more accurate results for typical OLAP aggregation queries than other efficient approximation techniques such as random sampling.

Keywords

This publication has 11 references indexed in Scilit:

Data cube approximation and histograms via wavelets
Published by Association for Computing Machinery (ACM) ,1998
Wavelet-based histograms for selectivity estimation
Published by Association for Computing Machinery (ACM) ,1998
New sampling-based summary statistics for improving approximate query answers
Published by Association for Computing Machinery (ACM) ,1998
Online aggregation
Published by Association for Computing Machinery (ACM) ,1997
An array-based algorithm for simultaneous multidimensional aggregates
Published by Association for Computing Machinery (ACM) ,1997
Improved histograms for selectivity estimation of range predicates
Published by Association for Computing Machinery (ACM) ,1996
Implementing data cubes efficiently
Published by Association for Computing Machinery (ACM) ,1996
An Overview of Wavelet Based Multiresolution Analyses
SIAM Review, 1994
The input/output complexity of sorting and related problems
Communications of the ACM, 1988
Access path selection in a relational database management system
Published by Association for Computing Machinery (ACM) ,1979

Cited by 183 articles