Ask the locals: Multi-way local pooling for image recognition

1 November 2011

Conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 15505499, pp. 2651-2658
https://doi.org/10.1109/iccv.2011.6126555

Abstract

Invariant representations in object recognition systems are generally obtained by pooling feature vectors over spatially local neighborhoods. But pooling is not local in the feature vector space, so that widely dissimilar features may be pooled together if they are in nearby locations. Recent approaches rely on sophisticated encoding methods and more specialized codebooks (or dictionaries), e.g., learned on subsets of descriptors which are close in feature space, to circumvent this problem. In this work, we argue that a common trait found in much recent work in image recognition or retrieval is that it leverages locality in feature space on top of purely spatial locality. We propose to apply this idea in its simplest form to an object recognition system based on the spatial pyramid framework, to increase the performance of small dictionaries with very little added engineering. State-of-the-art results on several object recognition benchmarks show the promise of this approach.
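The abstract states the idea only in words: pool over pairs of (spatial cell, feature-space neighborhood) instead of over spatial cells alone, so that dissimilar descriptors that happen to sit close together in the image are not merged into one pooled value. The following is a minimal sketch in pure Python under several stated assumptions that do not come from the paper. Descriptors are hard vector-quantized against one shared codebook. Feature-space neighborhoods are given as a fixed list of cluster centers (in practice these might come from k-means on training descriptors). Pooling is max pooling over a single grid level rather than a full spatial pyramid. The names `multiway_max_pool`, `hard_vq_code`, `nearest_index` and `pool_centers` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def nearest_index(centers: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the center closest to x in squared Euclidean distance."""
    best, best_dist = 0, float("inf")
    for i, c in enumerate(centers):
        dist = sum((a - b) ** 2 for a, b in zip(c, x))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def hard_vq_code(codebook: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """One-hot code with 1.0 at the nearest codeword (hard vector quantization, an assumption)."""
    code = [0.0] * len(codebook)
    code[nearest_index(codebook, x)] = 1.0
    return code


def multiway_max_pool(
    descriptors: Sequence[Sequence[float]],
    positions: Sequence[Tuple[float, float]],
    image_size: Tuple[float, float],
    grid: int,
    codebook: Sequence[Sequence[float]],
    pool_centers: Sequence[Sequence[float]],
) -> Vector:
    """Max-pool codes separately for each (spatial cell, feature-space cluster) pair.

    Positions are assumed to lie in [0, width] x [0, height]. With a single
    pool center this reduces to ordinary spatial max pooling on a grid x grid layout.
    """
    width, height = image_size
    k, p = len(codebook), len(pool_centers)
    # One pooled vector per (cell, cluster); codes are nonnegative, so 0.0 is a safe start.
    pooled = [[[0.0] * k for _ in range(p)] for _ in range(grid * grid)]
    for desc, (x, y) in zip(descriptors, positions):
        col = min(int(x * grid / width), grid - 1)   # spatial locality
        row = min(int(y * grid / height), grid - 1)
        cell = row * grid + col
        cluster = nearest_index(pool_centers, desc)  # locality in feature space
        code = hard_vq_code(codebook, desc)
        bin_ = pooled[cell][cluster]
        pooled[cell][cluster] = [max(a, b) for a, b in zip(bin_, code)]
    # Concatenate every (cell, cluster) bin into one vector of length grid*grid*p*k.
    return [v for cell_bins in pooled for bin_ in cell_bins for v in bin_]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]          # K = 2 toy codewords
    descs = [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2]]
    pos = [(1.0, 1.0), (1.0, 1.0), (9.0, 9.0)]   # first two descriptors share cell 0
    spatial_only = multiway_max_pool(descs, pos, (10.0, 10.0), 2, codebook, [[0.5, 0.5]])
    multiway = multiway_max_pool(descs, pos, (10.0, 10.0), 2, codebook, [[0.0, 0.0], [1.0, 1.0]])
    print(spatial_only)  # [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
    print(multiway)      # 16 values; cell 0 keeps [1.0, 0.0] and [0.0, 1.0] in separate bins
```

With one pool center, the two dissimilar descriptors in cell 0 are merged into the single pooled vector `[1.0, 1.0]`. With two centers they fall into different bins. This is the kind of separation the abstract attributes to adding feature-space locality on top of spatial locality. The cost is that the pooled feature grows by a factor of P, the number of feature-space clusters.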
