Nonparametric Bayes error estimation using unclassified samples

Abstract
A new nonparametric method of estimating the Bayes risk using an unclassified test sample set as well as a classified design sample set is introduced. The classified design set is used to obtain nonparametric estimates of the conditional Bayes risk of classification at each point of the unclassified test set. The average of these risk estimates is the error estimate. For large numbers of design samples the new error estimate has less variance than does an error-count estimate for classified test samples using the optimum Bayes classifier. The first application of the nonparametric method uses k-nearest neighbor (k-NN) estimates of the posterior probabilities to form the risk estimate. A large-sample analysis is made of this estimate. The expected value of this estimate is shown to be a lower bound on the Bayes error. A simple modification provides unbiased estimates of the k-NN classification error, thus providing an upper bound on the Bayes error. The second application of the method uses Parzen approximation of the density functions to obtain estimates of the risk and subsequently the Bayes error. Results of experiments on simulated data illustrate the small-sample behavior.
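
The k-NN version of the procedure can be illustrated with a minimal sketch. This is not the paper's own implementation: the function name `knn_risk_estimate`, the use of Euclidean distance, and the plug-in risk "one minus the largest estimated posterior" are assumptions made here for illustration. The posterior of each class at an unclassified test point is taken to be the fraction of its k nearest design samples carrying that class label, and the error estimate is the average of the resulting conditional risks. The modification that yields the upper bound is not shown.

```python
import numpy as np


def knn_risk_estimate(design_X, design_y, test_X, k):
    """Average plug-in conditional risk over unlabeled test points.

    Hypothetical helper for illustration only. Posteriors are estimated
    from the labels of the k nearest design samples (Euclidean distance),
    and the conditional risk at each test point is 1 - max posterior.
    """
    design_X = np.asarray(design_X, dtype=float)
    design_y = np.asarray(design_y)
    test_X = np.asarray(test_X, dtype=float)
    classes = np.unique(design_y)

    # Squared Euclidean distances, shape (n_test, n_design).
    diffs = test_X[:, None, :] - design_X[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=2)

    # Indices of the k nearest design samples for each test point.
    nn_idx = np.argsort(sq_dist, axis=1, kind="stable")[:, :k]
    nn_labels = design_y[nn_idx]  # shape (n_test, k)

    # k-NN posterior estimate: fraction of neighbors in each class.
    posteriors = np.stack(
        [(nn_labels == c).mean(axis=1) for c in classes], axis=1
    )

    # Plug-in conditional risk at each test point, then the average.
    risks = 1.0 - posteriors.max(axis=1)
    return float(risks.mean())


if __name__ == "__main__":
    # Small hand-made example; the test points carry no labels.
    design_X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
    design_y = [0, 0, 1, 1, 1, 0]
    test_X = [[1.0], [11.0]]
    print(knn_risk_estimate(design_X, design_y, test_X, k=3))
```

Only the design set's labels are read; the test set contributes locations alone, which is the sense in which the test samples are unclassified. With k = 1 this plug-in risk is zero at every point, so the sketch is only meaningful for k greater than 1.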
