Ask the locals: Multi-way local pooling for image recognition
- 1 November 2011
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 15505499, p. 2651-2658
- https://doi.org/10.1109/iccv.2011.6126555
Abstract
Invariant representations in object recognition systems are generally obtained by pooling feature vectors over spatially local neighborhoods. But pooling is not local in the feature vector space, so that widely dissimilar features may be pooled together if they are in nearby locations. Recent approaches rely on sophisticated encoding methods and more specialized codebooks (or dictionaries), e.g., learned on subsets of descriptors which are close in feature space, to circumvent this problem. In this work, we argue that a common trait found in much recent work in image recognition or retrieval is that it leverages locality in feature space on top of purely spatial locality. We propose to apply this idea in its simplest form to an object recognition system based on the spatial pyramid framework, to increase the performance of small dictionaries with very little added engineering. State-of-the-art results on several object recognition benchmarks show the promise of this approach.
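The idea in the abstract, pooling only over features that are close both spatially and in feature space, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the pooling operator, the clustering method, or the grid layout. The sketch assumes max pooling, non-negative codes, a single `grid` x `grid` spatial level with positions normalized to [0, 1], and feature-space bins defined by nearest-centroid assignment. All names (`multiway_local_pool`, `nearest`, `centers`, `grid`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(x: Vector, centers: Sequence[Vector]) -> int:
    """Return the index of the center closest to x (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
    return dists.index(min(dists))


def multiway_local_pool(
    descriptors: Sequence[Vector],
    positions: Sequence[Tuple[float, float]],
    codes: Sequence[Vector],
    centers: Sequence[Vector],
    grid: int,
) -> List[float]:
    """Max-pool codes separately per (spatial cell, feature-space cluster) bin.

    Assumptions (not taken from the source): positions lie in [0, 1] x [0, 1],
    codes are non-negative, and feature-space bins are nearest-centroid cells.
    """
    n_clusters = len(centers)
    code_dim = len(codes[0])
    pooled = [[0.0] * code_dim for _ in range(grid * grid * n_clusters)]

    for desc, (x, y), code in zip(descriptors, positions, codes):
        # Spatial bin: which grid cell the descriptor falls in.
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        # Feature-space bin: which centroid the raw descriptor is closest to.
        cluster = nearest(desc, centers)
        b = (row * grid + col) * n_clusters + cluster
        # Element-wise max pooling within the joint bin.
        pooled[b] = [max(p, c) for p, c in zip(pooled[b], code)]

    # Concatenate all bins into one image-level vector.
    return [v for bin_values in pooled for v in bin_values]


# Two descriptors share a spatial cell but are far apart in feature space,
# so they are pooled into different bins instead of being mixed together.
example = multiway_local_pool(
    descriptors=[[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]],
    positions=[(0.1, 0.1), (0.2, 0.2), (0.9, 0.9)],
    codes=[[1.0, 0.0], [0.0, 0.5], [0.3, 0.2]],
    centers=[[0.0, 0.0], [1.0, 1.0]],
    grid=2,
)
```

With plain spatial pooling, the first two descriptors would contribute to the same pooled vector because they fall in the same cell. Here the output length grows by a factor equal to the number of feature-space clusters, which is the trade-off for keeping the dictionary itself small.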
This publication has 30 references indexed in Scilit:
- Locality-constrained Linear Coding for image classification. Published by Institute of Electrical and Electronics Engineers (IEEE), 2010
- Multiple kernels for object detection. Published by Institute of Electrical and Electronics Engineers (IEEE), 2009
- Visual Word Ambiguity. Published by Institute of Electrical and Electronics Engineers (IEEE), 2009
- Fast image search for learned metrics. Published by Institute of Electrical and Electronics Engineers (IEEE), 2008
- Learning subcategory relevances for category recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 2008
- Reducing the Dimensionality of Data with Neural Networks. Science, 2006
- Dimensionality Reduction by Learning an Invariant Mapping. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- The pyramid match kernel: discriminative classification with sets of image features. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Video Google: a text retrieval approach to object matching in videos. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 1962