Abstract
Standard techniques for improved generalization from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy suggests a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error and (2) those failing to achieve this sensitivity and that therefore vanish. Since the critical value is determined adaptively during training, pruning, in the sense of setting weights to exact zeros, becomes an automatic consequence of regularization alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regularizer.
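A minimal illustration of the idea, not the paper's exact procedure: a Laplace prior over weights corresponds to an L1 penalty on the training objective, and with a proximal-gradient (soft-thresholding) update, weights whose sensitivity to the data error stays below the regularization strength are set to exact zeros, so pruning falls out of the regularizer itself. All names and hyperparameter values below are illustrative assumptions.

```python
import numpy as np

# Sketch only: L1 (Laplace-prior) regularized linear regression trained with
# proximal gradient descent.  Weights that do not earn their keep against the
# data error are driven to exact zeros by the soft-threshold step.

rng = np.random.default_rng(0)

# Toy data: y = X @ w_true + noise, with a sparse w_true (illustrative).
n, d = 200, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 0.5]
y = X @ w_true + 0.1 * rng.normal(size=n)

lam = 5.0   # strength of the Laplace (L1) prior -- assumed value
lr = 1e-3   # gradient step size -- assumed value
w = rng.normal(size=d)

for _ in range(5000):
    grad = X.T @ (X @ w - y)                                 # gradient of the data error
    w = w - lr * grad                                        # gradient step on the data term
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)   # soft-threshold from the L1 prior

print("surviving weights:", np.flatnonzero(w))
print("pruned to exact zero:", int(np.sum(w == 0)))
```

Running this, only the weights with sufficient sensitivity to the data error remain nonzero; the rest are exact zeros, reducing the count of free parameters without a separate pruning step.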
