On Dimensionality, Sample Size, Classification Error, and Complexity of Classification Algorithm in Pattern Recognition

1 May 1980

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence

Vol. PAMI-2 (3), 242-252
https://doi.org/10.1109/tpami.1980.4767011

Abstract

This paper compares four classification algorithms (discriminant functions) for classifying individuals into two multivariate populations. The discriminant functions (DF's) compared are derived according to the Bayes rule for normal populations and differ in their assumptions about the structure of the covariance matrices. Analytical formulas for the expected probability of misclassification EP_N are derived. They show that the classification error EP_N depends on the structure of the classification algorithm, the asymptotic probability of misclassification P∞, and the ratio of learning sample size N to dimensionality p: N/p for all linear DF's discussed and N²/p for quadratic DF's. Tables of the learning quantity H = EP_N/P∞ as a function of the parameters P∞, N, and p are presented for the four classification algorithms analyzed. They may be used for estimating the necessary learning sample size, determining the optimal number of features, and choosing the type of classification algorithm when the learning sample size is limited.
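The learning quantity H = EP_N/P∞ can be illustrated numerically. The sketch below is not the paper's analytical derivation: it is a Monte Carlo estimate for a single linear DF with a pooled sample covariance, under assumptions that are not taken from the abstract (two equiprobable normal populations with identity covariance, means separated by a Mahalanobis distance `delta`, and `n_per_class` learning vectors from each population). The function name `learning_ratio` and all parameter names are hypothetical.

```python
import math
import numpy as np


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def learning_ratio(p, n_per_class, delta, trials=200, seed=0):
    """Monte Carlo estimate of H = EP_N / P_inf for a linear DF.

    Assumptions (illustrative, not from the paper): identity true
    covariance, equal priors, means at +/- delta/2 along the first axis.
    Requires 2 * n_per_class - 2 >= p so the pooled covariance is invertible.
    """
    rng = np.random.default_rng(seed)
    mu1 = np.zeros(p)
    mu1[0] = delta / 2.0
    mu2 = -mu1
    # Asymptotic (Bayes) error for two normal populations with common covariance.
    p_inf = norm_cdf(-delta / 2.0)

    errors = []
    for _ in range(trials):
        x1 = rng.standard_normal((n_per_class, p)) + mu1
        x2 = rng.standard_normal((n_per_class, p)) + mu2
        m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
        d1, d2 = x1 - m1, x2 - m2
        pooled = (d1.T @ d1 + d2.T @ d2) / (2 * n_per_class - 2)
        # Linear DF g(x) = w.x + b; assign to population 1 when g(x) > 0.
        w = np.linalg.solve(pooled, m1 - m2)
        b = -0.5 * w @ (m1 + m2)
        # Exact conditional error of this DF, given the true parameters.
        s = np.linalg.norm(w)
        err1 = norm_cdf(-(w @ mu1 + b) / s)
        err2 = norm_cdf((w @ mu2 + b) / s)
        errors.append(0.5 * (err1 + err2))

    ep_n = float(np.mean(errors))
    return ep_n / p_inf, ep_n, p_inf


if __name__ == "__main__":
    for n in (10, 20, 50, 100):
        h, ep_n, p_inf = learning_ratio(p=5, n_per_class=n, delta=2.0)
        print(f"N/p = {n / 5:5.1f}  EP_N = {ep_n:.4f}  H = {h:.3f}")
```

Under these assumptions H is never below 1, since no DF trained on a finite sample can beat the asymptotic error, and it should shrink toward 1 as N/p grows, which is the qualitative dependence the abstract describes for linear DF's.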

This publication has 12 references indexed in Scilit:

Small sample size effects in statistical pattern recognition: recommendations for practitioners and open problems
Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
On the optimal number of features in the classification of multivariate Gaussian data
Pattern Recognition, 1978
On the Effects of Dimension in Discriminant Analysis
Technometrics, 1976
Discriminant Functions When Covariance Matrices are Unequal
Journal of the American Statistical Association, 1974
Estimation of the Errors of Misclassification on the Criterion of Asymptotic Mean Square Error
Technometrics, 1974
Small-sample optimality of design techniques for linear classifiers of Gaussian patterns
IEEE Transactions on Information Theory, 1972
On Expected Probabilities of Misclassification in Discriminant Analysis, Necessary Sample Size, and a Relation with the Multiple Correlation Coefficient
Biometrics, 1968
Estimation of Error Rates in Discriminant Analysis
Technometrics, 1968
An Asymptotic Expansion for the Distribution of the Linear Discriminant Function
The Annals of Mathematical Statistics, 1963
Computing the Distribution of Quadratic Forms in Normal Variables
Biometrika, 1961

Cited by 123 articles