Quantitative linguistics and complex system studies*

Abstract
Linguistic discourses, treated as maximum entropy systems of words according to the prescriptions of algorithmic information theory (Kolmogorov, Chaitin, & Zurek), are shown to give a natural explanation of Zipf's law with quantitative rigor. The pattern of word frequencies in discourse naturally leads to a distinction between two classes of words: content words (c‐words) and service words (s‐words). A unified entropy model for the two classes of words yields word frequency distribution functions in accordance with data. The model draws on principles of classical and quantum statistical mechanics and emphasises general principles of classifying and counting sequential symbols and optimising the related coding costs, under certain obvious constraints; hence it is likely to be valid for diverse complex systems in nature. Unlike other models of Zipf's law, which require an exponential distribution of word lengths, entropy models based on words as primary symbols do not restrict the word length distribution. It is shown that language exhibits the characteristics of complex adaptive systems (Gell‐Mann, 1994), in which the complexity measure is maximal for a system of intermediate algorithmic entropy, between totally ordered and totally disordered systems. A complexity function, a higher‐order entropy with the above properties, is defined for linguistic discourse. Natural discourses indeed seem to have the right mix of order and randomness, with a complexity close to the maximum.
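For reference, the rank–frequency form of Zipf's law referred to above may be stated as follows; the symbols f, r and γ are used here only for illustration, and the exponent value γ ≈ 1 is the commonly quoted approximation for natural-language texts rather than a result specific to this paper:

\[ f(r) \;\propto\; \frac{1}{r^{\gamma}}, \qquad \gamma \approx 1, \]

where f(r) is the frequency of the word of rank r when the words of a discourse are ordered from most to least frequent.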