SWAT: hierarchical stream summarization in large networks
- 13 May 2004
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
The problem of statistics and aggregate maintenance over data streams has gained popularity in recent years es- pecially in telecommunications network monitoring, trend- related analysis, web-click streams, stock tickers, and other time-variant data. The amount of data generated in such applications can become too large to store, or if stored too large to scan multiple times. We consider queries over data streams that are biased towards the more recent values. We develop a technique that summarizes a dynamic stream in- crementally at multiple resolutions. This approximation can be used to answer point queries, range queries, and inner product queries. Moreover, the precision of answers can be changed adaptively by a client. Later, we extend the above technique to work in a dis- tributed setting, specifically in a large network where a cen- tral site summarizes the stream and clients ask queries. We minimize the message overhead by deciding what and where to replicate by using an adaptive replication scheme. We maintain a hierarchy of approximations that change adap- tively based on the query and update rates. We show ex- perimentally that our technique performs better than existing techniques: up to times better in terms of approximation quality, up to four orders of magnitude times better in re- sponse time, and up to five times better in terms of message complexity.Keywords
This publication has 10 references indexed in Scilit:
- Approximating a data stream for querying and estimation: algorithms and performance evaluationPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2003
- Clustering data streamsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- Processing complex aggregate queries over data streamsPublished by Association for Computing Machinery (ACM) ,2002
- Maintaining Stream Statistics over Sliding WindowsSIAM Journal on Computing, 2002
- Models and issues in data stream systemsPublished by Association for Computing Machinery (ACM) ,2002
- Data-streams and histogramsPublished by Association for Computing Machinery (ACM) ,2001
- Adaptive precision setting for cached approximate valuesPublished by Association for Computing Machinery (ACM) ,2001
- On computing correlated aggregates over continual data streamsPublished by Association for Computing Machinery (ACM) ,2001
- An adaptive data replication algorithmACM Transactions on Database Systems, 1997
- The space complexity of approximating the frequency momentsPublished by Association for Computing Machinery (ACM) ,1996