Merging Multiple Data Streams on Common Keys over High Performance Networks

1 January 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Abstract

The model for data mining on streaming data assumes a buffer of fixed length N and a data stream of infinite length. The challenge is to extract patterns, changes, anomalies, and statistically significant structures by examining the data only once and storing records and derived attributes of length less than N. As data grids, data webs, and semantic webs become more common, mining distributed streaming data will become increasingly important. The first step when presented with two or more distributed streams is to merge them using a common key. In this paper, we present two algorithms for merging streaming data using a common key. We also present experimental studies showing that these algorithms scale in practice to OC-12 networks.
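
The abstract does not describe the two algorithms, so the following is only a minimal sketch of the general problem it states: merging two streams on a common key in a single pass with a fixed-size buffer. The function name `merge_streams`, the `(key, value)` record layout, the oldest-first eviction rule, and the assumption that keys are unique within each stream are all illustrative choices, not taken from the paper.

```python
from collections import OrderedDict
from itertools import zip_longest


def merge_streams(left, right, buffer_size):
    """Yield (key, left_value, right_value) for records that share a key.

    Hypothetical sketch: each stream is read exactly once, and at most
    buffer_size unmatched records are held per stream. When a buffer
    overflows, the oldest unmatched record is dropped, so late matches
    can be missed. Keys are assumed unique within each stream.
    """
    pending_left = OrderedDict()   # key -> value for unmatched left records
    pending_right = OrderedDict()  # key -> value for unmatched right records
    missing = object()             # sentinel for an exhausted stream

    def offer(record, own, other, is_left):
        key, value = record
        if key in other:
            # The partner record is already buffered: emit the merged row.
            match = other.pop(key)
            return (key, value, match) if is_left else (key, match, value)
        own[key] = value
        if len(own) > buffer_size:
            own.popitem(last=False)  # evict the oldest unmatched record
        return None

    # Alternate between the two streams, one record from each per step.
    for l_rec, r_rec in zip_longest(left, right, fillvalue=missing):
        if l_rec is not missing:
            out = offer(l_rec, pending_left, pending_right, True)
            if out is not None:
                yield out
        if r_rec is not missing:
            out = offer(r_rec, pending_right, pending_left, False)
            if out is not None:
                yield out
```

With a buffer that is large enough to hold the out-of-order records, every shared key is matched; with a smaller buffer, records whose partners arrive too late are silently dropped, which is the trade-off a bounded-memory merge has to manage.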

