An empirical study of list structure in Lisp

Abstract
Static measurements of the list structure of five large Lisp programs are reported and analyzed in this paper. These measurements reveal substantial regularity, or predictability, among pointers to atoms and especially among pointers to lists. Pointers to atoms are found to obey, roughly, Zipf's law, which governs word frequencies in natural languages; pointers to lists usually point to a location physically nearby in memory. The use of such regularities in the space-efficient representation of list structure is discussed. Linearization of lists, whereby successive cdrs (or cars) are placed in consecutive memory locations whenever possible, greatly strengthens the observed regularity of list structure. It is shown that under some reasonable assumptions, the entropy or information content of a car-cdr pair in the programs measured is about 10 to 15 bits before linearization, and about 7 to 12 bits after.

This publication has 9 references indexed in Scilit: