Nonparametric Bayes error estimation using unclassified samples

Abstract

A new nonparametric method of estimating the Bayes risk using an unclassified test sample set as well as a classified design sample set is introduced. The classified design set is used to obtain nonparametric estimates of the conditional Bayes risk of classification at each point of the unclassified test set. The average of these risk estimates is the error estimate. For large numbers of design samples, the new error estimate has less variance than an error-count estimate for classified test samples using the optimum Bayes classifier. The first application of the nonparametric method uses k-nearest neighbor (k-NN) estimates of the posterior probabilities to form the risk estimate. A large-sample analysis is made of this estimate. The expected value of this estimate is shown to be a lower bound on the Bayes error. A simple modification provides unbiased estimates of the k-NN classification error, thus providing an upper bound on the Bayes error. The second application of the method uses Parzen approximations of the density functions to obtain estimates of the risk and subsequently the Bayes error. Results of experiments on simulated data illustrate the small-sample behavior.
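The averaging procedure described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the paper's implementation: it assumes 0-1 loss (so the conditional risk at a point is one minus the largest estimated posterior), Euclidean distance, a Gaussian Parzen kernel with a single bandwidth `h`, and priors taken from the class fractions of the design set. The function names `knn_risk_estimate` and `parzen_risk_estimate` are hypothetical, and the "simple modification" mentioned in the abstract is not reproduced here.

```python
import numpy as np


def _sq_dists(test_X, design_X):
    # Pairwise squared Euclidean distances, shape (n_test, n_design).
    diff = test_X[:, None, :] - design_X[None, :, :]
    return np.sum(diff ** 2, axis=2)


def _mean_risk(posteriors):
    # Conditional risk under 0-1 loss (assumed) is 1 - max posterior;
    # the error estimate is the average over the unclassified test points.
    return float(np.mean(1.0 - posteriors.max(axis=1)))


def knn_risk_estimate(design_X, design_y, test_X, k):
    """Average k-NN risk estimate over unlabeled test points (sketch)."""
    classes = np.unique(design_y)
    d2 = _sq_dists(test_X, design_X)
    nearest = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest design samples
    neighbor_labels = design_y[nearest]          # shape (n_test, k)
    # Posterior estimate for each class: fraction of the k neighbors in that class.
    posteriors = np.stack(
        [(neighbor_labels == c).mean(axis=1) for c in classes], axis=1
    )
    return _mean_risk(posteriors)


def parzen_risk_estimate(design_X, design_y, test_X, h):
    """Average Parzen-based risk estimate with a Gaussian kernel (sketch)."""
    classes = np.unique(design_y)
    d2 = _sq_dists(test_X, design_X)
    # Subtracting the row minimum avoids underflow; it cancels in the ratio below.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * h ** 2))
    # With priors taken as class fractions, prior * density is proportional
    # to the summed kernel weight of that class's design samples.
    class_mass = np.stack(
        [w[:, design_y == c].sum(axis=1) for c in classes], axis=1
    )
    posteriors = class_mass / class_mass.sum(axis=1, keepdims=True)
    return _mean_risk(posteriors)


# Usage on simulated two-class Gaussian data (values chosen arbitrarily).
rng = np.random.default_rng(0)
design_X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(2.0, 1.0, (200, 2))])
design_y = np.repeat([0, 1], 200)
test_X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
print("k-NN risk estimate:  ", knn_risk_estimate(design_X, design_y, test_X, k=5))
print("Parzen risk estimate:", parzen_risk_estimate(design_X, design_y, test_X, h=0.5))
```

Note that the test points carry no labels anywhere in this sketch: only the design set is classified, which is the point of the method.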

