Approximate computation of multidimensional aggregates of sparse data using wavelets
- 1 June 1999
- conference paper
- Published by Association for Computing Machinery (ACM)
- Vol. 28 (2), 193-204
- https://doi.org/10.1145/304182.304199
Abstract
Computing multidimensional aggregates in high dimensions is a performance bottleneck for many OLAP applications. Obtaining the exact answer to an aggregation query can be prohibitively expensive in terms of time and/or storage space in a data warehouse environment. It is advantageous to have fast, approximate answers to OLAP aggregation queries.In this paper, we present a novel method that provides approximate answers to high-dimensional OLAP aggregation queries in massive sparse data sets in a time-efficient and space-efficient manner. We construct a compact data cube, which is an approximate and space-efficient representation of the underlying multidimensional array, based upon a multiresolution wavelet decomposition. In the on-line phase, each aggregation query can generally be answered using the compact data cube in one I/O or a smalll number of I/Os, depending upon the desired accuracy.We present two I/O-efficient algorithms to construct the compact data cube for the important case of sparse high-dimensional arrays, which often arise in practice. The traditional histogram methods are infeasible for the massive high-dimensional data sets in OLAP applications. Previously developed wavelet techniques are efficient only for dense data. Our on-line query processing algorithm is very fast and capable of refining answers as the user demands more accuracy. Experiments on real data show that our method provides significantly more accurate results for typical OLAP aggregation queries than other efficient approximation techniques such as random sampling.Keywords
This publication has 11 references indexed in Scilit:
- Data cube approximation and histograms via waveletsPublished by Association for Computing Machinery (ACM) ,1998
- Wavelet-based histograms for selectivity estimationPublished by Association for Computing Machinery (ACM) ,1998
- New sampling-based summary statistics for improving approximate query answersPublished by Association for Computing Machinery (ACM) ,1998
- Online aggregationPublished by Association for Computing Machinery (ACM) ,1997
- An array-based algorithm for simultaneous multidimensional aggregatesPublished by Association for Computing Machinery (ACM) ,1997
- Improved histograms for selectivity estimation of range predicatesPublished by Association for Computing Machinery (ACM) ,1996
- Implementing data cubes efficientlyPublished by Association for Computing Machinery (ACM) ,1996
- An Overview of Wavelet Based Multiresolution AnalysesSIAM Review, 1994
- The input/output complexity of sorting and related problemsCommunications of the ACM, 1988
- Access path selection in a relational database management systemPublished by Association for Computing Machinery (ACM) ,1979