The minimum description length principle in coding and modeling
- 1 October 1998
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Information Theory
- Vol. 44 (6), 2743-2760
- https://doi.org/10.1109/18.720554
Abstract
We review the principles of minimum description length and stochastic complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon's basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples.
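The central quantity in the abstract, the stochastic complexity, is defined through the normalized maximized likelihood (NML) distribution. A minimal sketch in standard notation (the symbols below are supplied for illustration, not taken from this page):

```latex
% NML distribution for a parametric class \{ p(\cdot \mid \theta) : \theta \in \Theta \},
% where \hat\theta(x^n) denotes the maximum likelihood estimate for the data string x^n:
\hat{p}(x^n) \;=\;
  \frac{p\bigl(x^n \mid \hat\theta(x^n)\bigr)}
       {\int p\bigl(y^n \mid \hat\theta(y^n)\bigr)\, dy^n}

% The stochastic complexity of x^n relative to the class is the resulting code length:
-\log \hat{p}(x^n) \;=\;
  -\log p\bigl(x^n \mid \hat\theta(x^n)\bigr)
  \;+\; \log \int p\bigl(y^n \mid \hat\theta(y^n)\bigr)\, dy^n
```

As the abstract states, the mixture and predictive codes attain this same code length up to terms that vanish asymptotically.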