Confidence estimation methods for neural networks: a practical comparison

Abstract
Feedforward neural networks, particularly multilayer perceptrons, are widely used in regression and classification tasks, and a reliable, practical measure of prediction confidence is essential. In this work, three alternative approaches to prediction confidence estimation are presented and compared: maximum likelihood, the approximate Bayesian approach, and the bootstrap. We consider prediction uncertainty owing to both data noise and model parameter misspecification. The methods are tested on a number of controlled artificial problems and on a real industrial regression application, the prediction of paper "curl". Confidence estimation performance is assessed by calculating the mean and standard deviation of the prediction interval coverage probability. We show that treating the data noise variance as a function of the inputs is appropriate for the curl prediction task. Moreover, we show that the mean coverage probability can only gauge confidence estimation performance as an average over the input space, i.e., global performance, and that the standard deviation of the coverage is unreliable as a measure of local performance. The approximate Bayesian approach is found to perform better in terms of global performance.
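
The following is a minimal sketch of how the mean and standard deviation of the prediction interval coverage probability could be computed on a synthetic problem. It is not the experimental setup of the paper. The target function, the noise model, the 95% interval and all names (`true_mean`, `true_noise_sd`, `picp`, `coverage_summary`) are assumptions made for illustration. The model mean is taken to be exact, so only the data noise term is exercised. The sketch contrasts an input-dependent noise estimate with a constant one. It shows only how the two summary statistics are obtained, and the caution above about the standard deviation as a local measure still applies.

```python
import math
import random
import statistics

Z_95 = 1.96  # two-sided 95% quantile of the standard normal


def true_mean(x):
    # Hypothetical noise-free target function.
    return math.sin(x)


def true_noise_sd(x):
    # Hypothetical input-dependent noise standard deviation.
    return 0.1 + 0.2 * x


def picp(targets, lower, upper):
    """Fraction of targets that fall inside [lower, upper]."""
    inside = sum(1 for t in targets if lower <= t <= upper)
    return inside / len(targets)


def coverage_summary(noise_sd_estimate, xs, n_draws=2000, seed=0):
    """Mean and standard deviation, over the inputs xs, of the coverage
    of the interval true_mean(x) +/- Z_95 * noise_sd_estimate(x)."""
    rng = random.Random(seed)
    coverages = []
    for x in xs:
        mu = true_mean(x)
        half_width = Z_95 * noise_sd_estimate(x)
        # Simulate noisy targets at this input from the true noise model.
        targets = [rng.gauss(mu, true_noise_sd(x)) for _ in range(n_draws)]
        coverages.append(picp(targets, mu - half_width, mu + half_width))
    return statistics.mean(coverages), statistics.pstdev(coverages)


xs = [i / 10 for i in range(21)]  # inputs 0.0, 0.1, ..., 2.0
avg_sd = statistics.mean(true_noise_sd(x) for x in xs)

# Noise estimate that varies with the input (matches the true model here).
het_mean, het_sd = coverage_summary(true_noise_sd, xs)
# Noise estimate that is constant over the input space.
const_mean, const_sd = coverage_summary(lambda x: avg_sd, xs)

print("input-dependent:", het_mean, het_sd)
print("constant:       ", const_mean, const_sd)
```

Under these assumptions, the input-dependent estimate gives a coverage close to the nominal 0.95 at every input. The constant estimate over-covers where the true noise is small and under-covers where it is large, so a mean coverage taken over the whole input space hides where the intervals are miscalibrated.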
