What is the best multi-stage architecture for object recognition?

1 September 2009

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 15505499, p. 2146-2153
https://doi.org/10.1109/iccv.2009.5459469

Abstract

In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a non-linear transformation, and some sort of feature pooling layer. Most systems use only one stage of feature extraction in which the filters are hard-wired, or two stages where the filters in one or both stages are learned in supervised or unsupervised mode. This paper addresses three questions: 1. How do the non-linearities that follow the filter banks influence the recognition accuracy? 2. Does learning the filter banks in an unsupervised or supervised manner improve the performance over random filters or hard-wired filters? 3. Is there any advantage to using an architecture with two stages of feature extraction, rather than one? We show that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks. We show that two stages of feature extraction yield better accuracy than one. Most surprisingly, we show that a two-stage system with random filters can yield a recognition rate of almost 63% on Caltech-101, provided that the proper non-linearities and pooling layers are used. Finally, we show that with supervised refinement, the system achieves state-of-the-art performance on the NORB dataset (5.6%), and that unsupervised pre-training followed by supervised refinement produces good accuracy on Caltech-101 (> 65%) and the lowest known error rate on the undistorted, unprocessed MNIST dataset (0.53%).
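
The stage structure named in the abstract (filter bank, then non-linearities including rectification and local contrast normalization, then pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 3x3 filter size, the neighbourhood radius, the `floor` constant, and the use of average pooling are all assumptions made for illustration, and the normalization shown is a simplified per-map variant.

```python
import math
import random


def filter_bank(image, filters):
    """Valid 2-D correlation of one image with each square filter."""
    h, w = len(image), len(image[0])
    maps = []
    for f in filters:
        k = len(f)
        maps.append([[sum(f[a][b] * image[i + a][j + b]
                          for a in range(k) for b in range(k))
                      for j in range(w - k + 1)]
                     for i in range(h - k + 1)])
    return maps


def rectify(maps):
    """Absolute-value rectification of every feature map."""
    return [[[abs(v) for v in row] for row in m] for m in maps]


def local_contrast_normalize(maps, radius=1, floor=1e-3):
    """Simplified normalization (assumption): subtract the local mean and
    divide by the local standard deviation, per map, over a square
    neighbourhood clipped at the borders."""
    out = []
    for m in maps:
        h, w = len(m), len(m[0])
        norm = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                patch = [m[a][b]
                         for a in range(max(0, i - radius), min(h, i + radius + 1))
                         for b in range(max(0, j - radius), min(w, j + radius + 1))]
                mean = sum(patch) / len(patch)
                std = math.sqrt(sum((v - mean) ** 2 for v in patch) / len(patch))
                # The floor avoids division by zero in flat regions.
                norm[i][j] = (m[i][j] - mean) / max(std, floor)
        out.append(norm)
    return out


def average_pool(maps, size):
    """Non-overlapping average pooling with a size x size window."""
    pooled = []
    for m in maps:
        h, w = len(m), len(m[0])
        pooled.append([[sum(m[i + a][j + b]
                            for a in range(size) for b in range(size)) / size ** 2
                        for j in range(0, w - size + 1, size)]
                       for i in range(0, h - size + 1, size)])
    return pooled


def feature_stage(image, filters, pool_size=2):
    """One feature extraction stage: filter bank -> non-linearities -> pooling."""
    maps = filter_bank(image, filters)
    maps = rectify(maps)
    maps = local_contrast_normalize(maps)
    return average_pool(maps, pool_size)


# Hypothetical toy input: a 12x12 random image and four random 3x3 filters.
rng = random.Random(0)
image = [[rng.random() for _ in range(12)] for _ in range(12)]
filters = [[[rng.gauss(0, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(4)]
features = feature_stage(image, filters)
print(len(features), len(features[0]), len(features[0][0]))
```

With these toy sizes, each of the four random filters yields a 10x10 map that pooling reduces to 5x5. A second stage would apply the same sequence to the maps produced by the first, which this sketch leaves out.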


Cited by 1221 articles