Fast Nearest Neighbor Condensation for Large Data Sets Classification

8 October 2007

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering

Vol. 19 (11), 1450-1464
https://doi.org/10.1109/tkde.2007.190645

Abstract

This work has two main objectives, namely, to introduce a novel algorithm, called the fast condensed nearest neighbor (FCNN) rule, for computing a training-set-consistent subset for the nearest neighbor decision rule and to show that condensation algorithms for the nearest neighbor rule can be applied to huge collections of data. The FCNN rule has some interesting properties: it is order independent, its worst-case time complexity is quadratic but often with a small constant prefactor, and it is likely to select points very close to the decision boundary. Furthermore, its structure allows for the triangle inequality to be effectively exploited to reduce the computational effort. The FCNN rule outperformed even here-enhanced variants of existing competence preservation methods both in terms of learning speed and learning scaling behavior and, often, in terms of the size of the model while it guaranteed the same prediction accuracy. Furthermore, it was three orders of magnitude faster than hybrid instance-based learning algorithms on the MNIST and Massachusetts Institute of Technology (MIT) Face databases and computed a model of accuracy comparable to that of methods incorporating a noise-filtering pass.

Keywords

This publication has 26 references indexed in Scilit:

Fast minimization of structural risk by nearest neighbor rule
IEEE Transactions on Neural Networks, 2003
An incremental prototype set building technique
Pattern Recognition, 2002
Evaluation of prototype learning algorithms for nearest-neighbor classifier in application to handwritten character recognition
Pattern Recognition, 2001
Nearest neighbor classification from multiple feature subsets
Intelligent Data Analysis, 1999
Gradient-based learning applied to document recognition
Proceedings of the IEEE, 1998
Minimal consistent set (MCS) identification for optimal nearest neighbor decision systems design
IEEE Transactions on Systems, Man, and Cybernetics, 1994
Consistent Nonparametric Regression
The Annals of Statistics, 1977
An algorithm for a selective nearest neighbor decision rule (Corresp.)
IEEE Transactions on Information Theory, 1975
The reduced nearest neighbor rule (Corresp.)
IEEE Transactions on Information Theory, 1972
The condensed nearest neighbor rule (Corresp.)
IEEE Transactions on Information Theory, 1968

Cited by 133 articles