Data streaming algorithms for efficient and accurate estimation of flow size distribution

1 June 2004

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGMETRICS Performance Evaluation Review

Vol. 32 (1), 177-188
https://doi.org/10.1145/1012888.1005709

Abstract

Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better traffic engineering. Previous work on estimating the flow size distribution has focused on making inferences from sampled network traffic. Its accuracy is limited by the (typically) low sampling rate required to make the sampling operation affordable. In this paper we present a novel data streaming algorithm that provides much more accurate estimates of the flow size distribution, using a "lossy data structure" consisting of an array of counters that fits well into SRAM. For each incoming packet, our algorithm only needs to increment one underlying counter, making it fast enough even for 40 Gbps (OC-768) links. The data structure is lossy in the sense that the sizes of multiple flows may collide into the same counter. Our algorithm uses Bayesian statistical methods such as Expectation Maximization to infer the most likely flow size distribution that results in the observed counter values after collision. Evaluations of this algorithm on large Internet traces obtained from several sources (including a tier-1 ISP) demonstrate that it has very high measurement accuracy (within 2%). Our algorithm not only dramatically improves the accuracy of flow distribution measurement, but also contributes to the field of data streaming by formalizing an existing methodology and applying it to the context of estimating the flow size distribution.
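The online part of the scheme described in the abstract (one counter increment per packet, with flows allowed to collide) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the use of BLAKE2b as the hash, and the string flow identifiers are all assumptions made for illustration. The Expectation Maximization step that infers the flow size distribution from the counter values is not shown.

```python
import hashlib
from collections import Counter


def counter_index(flow_id: str, num_counters: int, seed: int = 0) -> int:
    """Map a flow identifier to one counter (hypothetical helper).

    BLAKE2b is an assumption; any well-mixing hash would serve the same role.
    """
    digest = hashlib.blake2b(
        flow_id.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % num_counters


def update_counters(packets, num_counters: int, seed: int = 0) -> list:
    """Process a packet stream: exactly one counter increment per packet."""
    counters = [0] * num_counters
    for flow_id in packets:
        counters[counter_index(flow_id, num_counters, seed)] += 1
    return counters


def counter_value_histogram(counters) -> Counter:
    """Count how many counters hold each value.

    This histogram is the observation from which an offline estimator
    would infer the flow size distribution.
    """
    return Counter(counters)


# Three flows of sizes 3, 2 and 1, given as one flow identifier per packet.
packets = ["flow-a"] * 3 + ["flow-b"] * 2 + ["flow-c"]

# With a single counter, every flow collides into it and the sizes add up.
print(update_counters(packets, num_counters=1))  # [6]

# With many counters, collisions are rare but still possible.
histogram = counter_value_histogram(update_counters(packets, num_counters=1024))
print(histogram)
```

The single-counter case shows why the structure is lossy: the counter holds the sum of the colliding flow sizes, so individual sizes cannot be read off directly and must be inferred statistically.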

Cited by 71 articles