Bayesian Regularization and Pruning Using a Laplace Prior
- 1 January 1995
- journal article
- Published by MIT Press in Neural Computation
- Vol. 7 (1), 117-143
- https://doi.org/10.1162/neco.1995.7.1.117
Abstract
Standard techniques for improved generalization from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy suggests a Laplace rather than a gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error and (2) those failing to achieve this sensitivity and that therefore vanish. Since the critical value is determined adaptively during training, pruning, in the sense of setting weights to exact zeros, becomes an automatic consequence of regularization alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a gaussian regularizer.
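The mechanism the abstract describes can be seen in miniature with a Laplace (L1) penalty on a linear model: at a minimum, every surviving weight's data-error gradient matches the regularization constant, while weights whose gradient falls short are driven to exactly zero. The sketch below is not the paper's procedure (the paper trains neural networks and determines the critical value adaptively during training); it is a minimal illustration using standard proximal gradient descent (ISTA), and the names `soft_threshold`, `fit_l1`, `alpha`, and `lr` are illustrative choices rather than the paper's notation.

```python
import numpy as np

def soft_threshold(w, t):
    """Shrink toward zero by t; weights landing inside [-t, t] become exact zeros."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_l1(X, y, alpha=0.2, lr=0.01, steps=5000):
    """Proximal gradient descent on squared data error plus alpha * sum(|w|).

    Illustrative only: weights whose data-error gradient stays below alpha
    are set to exactly zero by the soft-threshold step, so pruning falls out
    of the regularizer itself, echoing the abstract's two weight classes.
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)          # gradient of the data error
        w = soft_threshold(w - lr * grad, lr * alpha)
    return w

# Synthetic example: only the first two input weights are truly nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.array([2.0, -1.5, 0, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = fit_l1(X, y)
print("weights:", np.round(w, 3))
print("pruned (exact zeros):", int(np.sum(w == 0.0)))
```

Running this prints a weight vector in which the irrelevant coordinates are exact zeros rather than merely small, which is the sense of "pruning as an automatic consequence of regularization alone" that the gaussian (L2) prior does not provide.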
References
- Fast Exact Multiplication by the Hessian. Neural Computation, 1994
- Curvature-driven smoothing: a learning algorithm for feedforward networks. IEEE Transactions on Neural Networks, 1993
- A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks, 1993
- A Practical Bayesian Framework for Backpropagation Networks. Neural Computation, 1992
- Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics, 1968