Probabilistic counting
- 1 November 1983
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 02725428,p. 76-82
- https://doi.org/10.1109/sfcs.1983.46
Abstract
We present here a class of probabilistic algorithms with which one can estimate the number of distinct elements in a collection of data (typically a large file stored on disk) in a single pass, using only 0(1) auxiliary storage and 0(1) operations per element. We precisely quantify the accuracy-storage trade-offs: for instance a typical accuracy of about 5% can be achieved using only 256 binary words, even for very large files. The algorithms are totally insensitive to the replicative structure of the elements in the file. They are particularly adapted to data base systems in the context of query optimization and can be implemented in a decentralized manner (thus making them also useful for distributed data base applications).Keywords
This publication has 4 references indexed in Scilit:
- Approximate counting: A detailed analysisBIT Numerical Mathematics, 1985
- The complexity of approximate countingPublished by Association for Computing Machinery (ACM) ,1983
- Counting large numbers of events in small registersCommunications of the ACM, 1978
- Key-to-address transform techniquesCommunications of the ACM, 1971