Small sample size effects in statistical pattern recognition: recommendations for practitioners and open problems

4 December 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 1, pp. 417-423
https://doi.org/10.1109/icpr.1990.118138

Abstract

The authors discuss the effects of sample size on feature selection and error estimation for several types of classifiers. In addition to surveying prior work in this area, they give practical advice to today's designers and users of statistical pattern recognition systems. They point out that a large number of training samples is needed when a complex classification rule with many features is used. In many pattern recognition problems, the number of potential features is very large and little is known about the characteristics of the pattern classes under consideration, so it is difficult to determine a priori how complex the classification rule needs to be. Therefore, even when the designer believes that a large number of training samples has been collected, these samples may not be enough to design and evaluate a classifier for the problem at hand. The authors further note that a small sample size can cause many problems in the design of a pattern recognition system.
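
The interaction between sample size, number of features, and error estimation can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' experimental setup: it assumes two Gaussian classes with identity covariance, a nearest-mean (Euclidean distance) classifier, and a single informative feature, with the remaining features pure noise. All function names (`sample_class`, `error_rate`, `apparent_vs_test`, and so on) are hypothetical and chosen for illustration. It compares the apparent (resubstitution) error, measured on the same samples used to design the rule, with the error on independent test samples.

```python
import random
import statistics


def sample_class(n, d, shift, rng):
    """Draw n points from N(mu, I_d), where mu = (shift, 0, ..., 0).

    Assumption for illustration: only the first feature carries class
    information; the other d - 1 features are pure noise.
    """
    return [[rng.gauss(shift if j == 0 else 0.0, 1.0) for j in range(d)]
            for _ in range(n)]


def mean_vector(points):
    """Component-wise sample mean of a list of equal-length vectors."""
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def error_rate(m0, m1, points0, points1):
    """Fraction of points misclassified by the nearest-mean rule."""
    errors = sum(1 for x in points0 if sq_dist(x, m1) < sq_dist(x, m0))
    errors += sum(1 for x in points1 if sq_dist(x, m0) < sq_dist(x, m1))
    return errors / (len(points0) + len(points1))


def apparent_vs_test(n, d, shift=1.0, n_test=200, trials=50, seed=0):
    """Average apparent and held-out error of the nearest-mean rule.

    n is the number of training samples per class, d the number of features.
    """
    rng = random.Random(seed)
    apparent, test = [], []
    for _ in range(trials):
        train0 = sample_class(n, d, 0.0, rng)
        train1 = sample_class(n, d, shift, rng)
        m0, m1 = mean_vector(train0), mean_vector(train1)
        # Apparent error: evaluate on the samples used to design the rule.
        apparent.append(error_rate(m0, m1, train0, train1))
        # Held-out error: evaluate on fresh, independent samples.
        test0 = sample_class(n_test, d, 0.0, rng)
        test1 = sample_class(n_test, d, shift, rng)
        test.append(error_rate(m0, m1, test0, test1))
    return statistics.mean(apparent), statistics.mean(test)


if __name__ == "__main__":
    for d in (1, 5, 20):
        for n in (5, 100):
            app, tst = apparent_vs_test(n, d)
            print(f"d={d:>2}, n={n:>3} per class: "
                  f"apparent={app:.3f}  held-out={tst:.3f}")
```

Under these assumptions, one would expect the apparent error at n = 5 per class to sit well below the held-out error, with the gap widening as uninformative features are added and shrinking as n grows; exact values depend on the seed and number of trials.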
