Fast Nearest Neighbor Condensation for Large Data Sets Classification
- 8 October 2007
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 19 (11), 1450-1464
- https://doi.org/10.1109/tkde.2007.190645
Abstract
This work has two main objectives, namely, to introduce a novel algorithm, called the fast condensed nearest neighbor (FCNN) rule, for computing a training-set-consistent subset for the nearest neighbor decision rule and to show that condensation algorithms for the nearest neighbor rule can be applied to huge collections of data. The FCNN rule has some interesting properties: it is order independent, its worst-case time complexity is quadratic but often with a small constant prefactor, and it is likely to select points very close to the decision boundary. Furthermore, its structure allows for the triangle inequality to be effectively exploited to reduce the computational effort. The FCNN rule outperformed even here-enhanced variants of existing competence preservation methods both in terms of learning speed and learning scaling behavior and, often, in terms of the size of the model while it guaranteed the same prediction accuracy. Furthermore, it was three orders of magnitude faster than hybrid instance-based learning algorithms on the MNIST and Massachusetts Institute of Technology (MIT) Face databases and computed a model of accuracy comparable to that of methods incorporating a noise-filtering pass.Keywords
This publication has 26 references indexed in Scilit:
- Fast minimization of structural risk by nearest neighbor ruleIEEE Transactions on Neural Networks, 2003
- An incremental prototype set building techniquePattern Recognition, 2002
- Evaluation of prototype learning algorithms for nearest-neighbor classifier in application to handwritten character recognitionPattern Recognition, 2001
- Nearest neighbor classification from multiple feature subsetsIntelligent Data Analysis, 1999
- Gradient-based learning applied to document recognitionProceedings of the IEEE, 1998
- Minimal consistent set (MCS) identification for optimal nearest neighbor decision systems designIEEE Transactions on Systems, Man, and Cybernetics, 1994
- Consistent Nonparametric RegressionThe Annals of Statistics, 1977
- An algorithm for a selective nearest neighbor decision rule (Corresp.)IEEE Transactions on Information Theory, 1975
- The reduced nearest neighbor rule (Corresp.)IEEE Transactions on Information Theory, 1972
- The condensed nearest neighbor rule (Corresp.)IEEE Transactions on Information Theory, 1968