Using mutual information for selecting features in supervised neural net learning
- 1 July 1994
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks
- Vol. 5 (4), 537-550
- https://doi.org/10.1109/72.298224
Abstract
This paper investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network classifier. Because the mutual information measures arbitrary dependencies between random variables, it is suitable for assessing the "information content" of features in complex classification tasks, where methods based on linear relations (like the correlation) are prone to mistakes. The fact that the mutual information is independent of the coordinates chosen permits a robust estimation. Nonetheless, the use of the mutual information for tasks characterized by high input dimensionality requires suitable approximations because of the prohibitive demands on computation and samples. An algorithm is proposed that is based on a "greedy" selection of the features and that takes both the mutual information with respect to the output class and with respect to the already-selected features into account. Finally, the results of a series of experiments are discussed.
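The greedy selection described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two mutual information terms are combined, so the scoring rule used here (relevance to the class minus a weight `beta` times the summed mutual information with the already-selected features) is an assumption, as are the function names `mutual_information` and `greedy_mi_selection`, the parameter `beta`, and the restriction to discrete-valued features with a simple plug-in estimator.

```python
import numpy as np


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two discrete 1-D arrays."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    # Joint histogram of the observed (x, y) pairs, normalized to probabilities.
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of X, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of Y, shape (1, ny)
    nz = joint > 0  # skip empty cells, which contribute zero
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))


def greedy_mi_selection(X, y, k, beta=0.5):
    """Greedily pick k feature indices (columns of X).

    Assumed score for a candidate f: I(class; f) - beta * sum of I(f; s)
    over the already-selected features s. This rule is an illustration,
    not taken from the abstract.
    """
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    redundancy = np.zeros(n_features)  # running sum of I(f; s) per candidate
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda j: relevance[j] - beta * redundancy[j])
        selected.append(best)
        remaining.remove(best)
        # Update each remaining candidate's overlap with the new selection.
        for j in remaining:
            redundancy[j] += mutual_information(X[:, j], X[:, best])
    return selected
```

With a four-class target `y = [0, 0, 1, 1, 2, 2, 3, 3]` and three binary features, where the second is a copy of the first and the third is independent of the first, setting `beta=0.5` skips the duplicate in favor of the third feature, while `beta=0.0` ranks by class relevance alone and keeps the duplicate.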
This publication has 13 references indexed in Scilit:
- Using the Karhunen-Loève transformation in the back-propagation training algorithm. IEEE Transactions on Neural Networks, 1991
- The self-organizing map. Proceedings of the IEEE, 1990
- Mutual information functions versus correlation functions. Journal of Statistical Physics, 1990
- A simple procedure for pruning back-propagation trained neural networks. IEEE Transactions on Neural Networks, 1990
- Boolean Feature Discovery in Empirical Learning. Machine Learning, 1990
- How to Generate Ordered Maps by Maximizing the Mutual Information between Input and Output Signals. Neural Computation, 1989
- Reconstructing attractors from scalar time series: A comparison of singular system and redundancy criteria. Physica D: Nonlinear Phenomena, 1989
- Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks, 1988
- Independent coordinates for strange attractors from mutual information. Physical Review A, 1986
- The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 1936