Overfitting revisited: an information-theoretic approach to simplifying discrimination trees

Abstract
This paper describes a method of simplifying inductively generated discrimination trees using a measure of tree quality based on the principle of information economy, which takes into account both the size of the tree and the size of the outcome data after (notional) encoding by that tree. Results of testing this method on a selection of data sets show that it has some practical advantages over previously used techniques for tree-pruning. Some of the theoretical implications of the present method are also discussed.
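As a rough illustration of the kind of quality measure the abstract describes, the sketch below scores a tree by the sum of two costs: bits charged for the tree itself, and bits needed to encode the outcome data at the leaves. The abstract does not give the paper's actual coding scheme, so everything specific here is an assumption made for illustration: the flat per-node charge `NODE_BITS`, the entropy-based leaf cost in `data_bits`, the `Node` structure, and the bottom-up `prune` rule are hypothetical stand-ins, not the author's method.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Class label -> number of training cases reaching this node.
    counts: dict
    # Child subtrees; an empty list marks a leaf.
    children: list = field(default_factory=list)


# Assumed flat cost, in bits, of describing one node of the tree.
# A real scheme would depend on the attribute tested and the branching factor.
NODE_BITS = 8.0


def data_bits(counts):
    """Bits to encode the outcomes at a leaf, using the leaf's own
    empirical class frequencies (an assumed, idealised code)."""
    n = sum(counts.values())
    return -sum(c * math.log2(c / n) for c in counts.values() if c > 0)


def total_bits(node):
    """Tree cost plus cost of the outcome data encoded by that tree."""
    if not node.children:
        return NODE_BITS + data_bits(node.counts)
    return NODE_BITS + sum(total_bits(child) for child in node.children)


def prune(node):
    """Bottom-up simplification: replace a subtree by a single leaf
    whenever the leaf is no more expensive in total bits."""
    for child in node.children:
        prune(child)
    if node.children:
        cost_as_leaf = NODE_BITS + data_bits(node.counts)
        if cost_as_leaf <= total_bits(node):
            node.children = []
    return node


# A split that separates the classes perfectly pays for itself;
# a split that leaves both children as mixed as the parent does not.
useful = Node({"a": 10, "b": 10},
              [Node({"a": 10, "b": 0}), Node({"a": 0, "b": 10})])
useless = Node({"a": 10, "b": 10},
               [Node({"a": 5, "b": 5}), Node({"a": 5, "b": 5})])
prune(useful)
prune(useless)
```

Under these assumed costs, the first tree keeps its split because the two extra nodes cost fewer bits than they save in encoding the outcomes, while the second tree collapses to a single leaf.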
