The minimum description length principle in coding and modeling
- 1 October 1998
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Information Theory
- Vol. 44 (6), 2743-2760
- https://doi.org/10.1109/18.720554
Abstract
We review the principles of minimum description length and stochastic complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon's basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples.
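The central quantity in the abstract, the stochastic complexity, is defined through the normalized maximized likelihood (NML) distribution. A minimal sketch in standard notation (the symbols below are supplied for illustration, not taken from this page):

```latex
% NML distribution for a parametric class \{ p(\cdot \mid \theta) : \theta \in \Theta \},
% where \hat\theta(x^n) denotes the maximum likelihood estimate for the data string x^n:
\hat{p}(x^n) \;=\;
  \frac{p\bigl(x^n \mid \hat\theta(x^n)\bigr)}
       {\int p\bigl(y^n \mid \hat\theta(y^n)\bigr)\, dy^n}

% The stochastic complexity of x^n relative to the class is the resulting code length:
-\log \hat{p}(x^n) \;=\;
  -\log p\bigl(x^n \mid \hat\theta(x^n)\bigr)
  \;+\; \log \int p\bigl(y^n \mid \hat\theta(y^n)\bigr)\, dy^n
```

As the abstract states, the mixture and predictive codes attain this same code length up to terms that vanish asymptotically.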