On Dimensionality, Sample Size, Classification Error, and Complexity of Classification Algorithm in Pattern Recognition
- 1 May 1980
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Vol. PAMI-2 (3), 242-252
- https://doi.org/10.1109/tpami.1980.4767011
Abstract
This paper compares four classification algorithms (discriminant functions) for classifying individuals into two multivariate populations. The discriminant functions (DFs) compared are derived according to the Bayes rule for normal populations and differ in their assumptions about the structure of the covariance matrices. Analytical formulas for the expected probability of misclassification EP_N are derived. They show that EP_N depends on the structure of the classification algorithm, the asymptotic probability of misclassification P_∞, and the ratio of learning sample size N to dimensionality p: N/p for all the linear DFs discussed and N²/p for the quadratic DFs. Tables of the learning quantity H = EP_N/P_∞ as a function of the parameters P_∞, N, and p are presented for the four classification algorithms analyzed. They may be used for estimating the necessary learning sample size, determining the optimal number of features, and choosing the type of classification algorithm when the learning sample size is limited.
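The ratio H of the expected error to the asymptotic error can be illustrated with a small Monte Carlo sketch. This is not the paper's analytical formula and does not reproduce its tables. It assumes a deliberately simple setting chosen for illustration: two Gaussian classes with known identity covariance, equal priors, N learning samples per class, and a Euclidean distance (nearest sample mean) classifier. All function names and parameters below are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def asymptotic_error(delta):
    """Asymptotic (Bayes) error for two unit-covariance Gaussians whose
    means are a distance delta apart, with equal priors."""
    return normal_cdf(-delta / 2.0)


def conditional_error(m1, m2, mu1, mu2):
    """Exact error of the rule 'assign x to class 1 if w.(x - c) > 0',
    where w = m1 - m2 and c is the midpoint of the estimated means."""
    w = [a - b for a, b in zip(m1, m2)]
    c = [(a + b) / 2.0 for a, b in zip(m1, m2)]
    norm = math.sqrt(sum(v * v for v in w))
    # Signed distances of the true class means from the decision plane.
    s1 = sum(wi * (u - ci) for wi, u, ci in zip(w, mu1, c)) / norm
    s2 = sum(wi * (u - ci) for wi, u, ci in zip(w, mu2, c)) / norm
    return 0.5 * (normal_cdf(-s1) + normal_cdf(s2))


def learning_quantity(p, n, delta, trials=2000, seed=0):
    """Monte Carlo estimate of H = (expected error) / (asymptotic error).

    Assumption: n is the learning sample size per class, so each sample
    mean is drawn directly from N(mu, I / n) instead of simulating samples.
    """
    rng = random.Random(seed)
    mu1 = [delta / 2.0] + [0.0] * (p - 1)
    mu2 = [-delta / 2.0] + [0.0] * (p - 1)
    scale = 1.0 / math.sqrt(n)
    total = 0.0
    for _ in range(trials):
        m1 = [u + scale * rng.gauss(0.0, 1.0) for u in mu1]
        m2 = [u + scale * rng.gauss(0.0, 1.0) for u in mu2]
        total += conditional_error(m1, m2, mu1, mu2)
    return (total / trials) / asymptotic_error(delta)


if __name__ == "__main__":
    # Print the estimated H for a fixed dimensionality and growing N.
    for n in (5, 20, 100):
        print(n, learning_quantity(p=10, n=n, delta=3.0))
```

In this toy setting the estimate of H falls toward 1 as N grows with p fixed, which is the qualitative behaviour the abstract describes for linear DFs. The covariance-estimating DFs studied in the paper would need their own simulation or the paper's formulas.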
This publication has 12 references indexed in Scilit:
- Small sample size effects in statistical pattern recognition: recommendations for practitioners and open problems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- On the optimal number of features in the classification of multivariate Gaussian data. Pattern Recognition, 1978
- On the Effects of Dimension in Discriminant Analysis. Technometrics, 1976
- Discriminant Functions When Covariance Matrices are Unequal. Journal of the American Statistical Association, 1974
- Estimation of the Errors of Misclassification on the Criterion of Asymptotic Mean Square Error. Technometrics, 1974
- Small-sample optimality of design techniques for linear classifiers of Gaussian patterns. IEEE Transactions on Information Theory, 1972
- On Expected Probabilities of Misclassification in Discriminant Analysis, Necessary Sample Size, and a Relation with the Multiple Correlation Coefficient. Biometrics, 1968
- Estimation of Error Rates in Discriminant Analysis. Technometrics, 1968
- An Asymptotic Expansion for the Distribution of the Linear Discriminant Function. The Annals of Mathematical Statistics, 1963
- Computing the Distribution of Quadratic Forms in Normal Variables. Biometrika, 1961