Average profile and limiting distribution for a phrase size in the Lempel-Ziv parsing algorithm
- 1 March 1995
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Information Theory
- Vol. 41 (2), 478-488
- https://doi.org/10.1109/18.370149
Abstract
Consider the parsing algorithm developed by Lempel and Ziv (1978) that partitions a sequence of length n into variable phrases (blocks) such that a new block is the shortest substring not seen in the past as a phrase. In practice, the following parameters are of interest: number of phrases, the size of a phrase, the number of phrases of given size, and so forth. In this paper, we focus on the size of a randomly selected phrase, and the average number of phrases of a given size (the so-called average profile of phrase sizes). These parameters can be efficiently analyzed through a digital search tree representation. For a memoryless source with unequal probabilities of symbols generation (the so-called asymmetric Bernoulli model), we prove that the size of a typical phrase is asymptotically normally distributed with mean and variance explicitly computed. In terms of digital search trees, we prove the normal limiting distribution of the typical depth (i.e., the length of a path from the root to a randomly selected node). The latter finding is proved by a technique that belongs to the toolkit of the "analytical analysis of algorithms", and it seems to be novel in the context of data compression.Keywords
This publication has 28 references indexed in Scilit:
- Trie partitioning process: Limiting distributionsPublished by Springer Nature ,2005
- Asymptotic behavior of the Lempel-Ziv parsing scheme and digital search treesTheoretical Computer Science, 1995
- Digital Search Trees Again Revisited: The Internal Path Length PerspectiveSIAM Journal on Computing, 1994
- Analysis of digital tries with Markovian dependencyIEEE Transactions on Information Theory, 1991
- Compression, Tests for Randomness and Estimating the Statistical Model of an Individual SequencePublished by Springer Nature ,1990
- On the variance of the external path length in a symmetric digital trieDiscrete Applied Mathematics, 1989
- Further results on digital search treesTheoretical Computer Science, 1988
- On classification with empirically observed statistics and universal data compressionIEEE Transactions on Information Theory, 1988
- Compression of individual sequences via variable-rate codingIEEE Transactions on Information Theory, 1978
- A universal algorithm for sequential data compressionIEEE Transactions on Information Theory, 1977