A statistical approach to learning and generalization in layered neural networks

Abstract
A general statistical description of the problem of learning from examples is presented. Learning in layered networks is posed as a search in the network parameter space for a network that minimizes an additive error function over a set of statistically independent examples. By imposing the equivalence of the minimum-error and maximum-likelihood criteria for training the network, the Gibbs distribution on the ensemble of networks with a fixed architecture is derived. The probability of correct prediction of a novel example can be expressed using this ensemble, serving as a measure of the network's generalization ability. The entropy of the prediction distribution is shown to be a consistent measure of the network's performance. The proposed formalism is applied to the problems of selecting an optimal architecture and predicting learning curves.
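A minimal sketch of the central quantities described above, using assumed notation (the weight vector $w$, error function $E$, inverse temperature $\beta$, and per-example error $\epsilon$ are illustrative choices, not fixed by the abstract itself):

```latex
% Gibbs distribution over networks w of a fixed architecture, obtained by
% equating the minimum-error and maximum-likelihood training criteria on
% m statistically independent examples with additive error
% E(w) = \sum_{i=1}^{m} \epsilon(w; x_i, y_i):
P(w) \;=\; \frac{1}{Z}\,\exp\!\bigl(-\beta E(w)\bigr),
\qquad
Z \;=\; \int \! dw\, \exp\!\bigl(-\beta E(w)\bigr).

% Probability of correctly predicting a novel example (x, y),
% written as an average over the Gibbs ensemble of trained networks:
P(y \mid x) \;=\; \int \! dw\, P(w)\, P(y \mid x, w).
```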
