Discovering objects and their location in images

1 January 2005

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 1 (ISSN 1550-5499), pp. 370-377
https://doi.org/10.1109/iccv.2005.77

Abstract

We seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA). In text analysis, this is used to discover topics in a corpus using the bag-of-words document representation. Here we treat object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics. The model is applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. The topic discovery approach successfully translates to the visual domain: for a small set of objects, we show that both the object categories and their approximate spatial layout are found without supervision. Performance of this unsupervised method is compared to the supervised approach of Fergus et al. (2003) on a set of unseen images containing only one object per image. We also extend the bag-of-words vocabulary to include 'doublets' which encode spatially local co-occurring regions. It is demonstrated that this extended vocabulary gives a cleaner image segmentation. Finally, the classification and segmentation methods are applied to a set of images containing multiple objects per image. These results demonstrate that we can successfully build object class models from an unsupervised analysis of images.
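The topic model named in the abstract can be illustrated with a minimal sketch of pLSA fitted by EM on a small image-by-visual-word count matrix. This is not the authors' implementation: the function name `plsa`, its parameters, the random initialisation, and the toy count matrix are all assumptions made for illustration, and the descriptor extraction and vector quantization steps are taken as already done.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a (documents x words) count matrix.

    Hypothetical helper for illustration. Returns P(z|d) with shape
    (D, Z) and P(w|z) with shape (Z, W).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialisation, normalised so each row is a distribution.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: posterior P(z|d,w) proportional to P(z|d) P(w|z),
        # stored densely with shape (D, Z, W); fine for toy sizes only.
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both distributions from expected counts.
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z

# Toy data (assumed): 4 "images" over a 4-word visual vocabulary,
# where images 0-1 and images 2-3 use disjoint sets of visual words.
counts = np.array([
    [5, 4, 0, 0],
    [6, 3, 0, 0],
    [0, 0, 4, 5],
    [0, 0, 3, 6],
], dtype=float)

p_z_d, p_w_z = plsa(counts, n_topics=2)
# Assign each image to its most probable topic.
labels = p_z_d.argmax(axis=1)
print(labels)
```

Each row of `p_z_d` is one image's mixture over topics, which is how an image containing several categories would be represented. On block-structured counts like these, the images would be expected to split into two groups, although EM results depend on the initialisation and the topic indices themselves are arbitrary.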

This publication has 14 references indexed in Scilit:

Hierarchical Dirichlet Processes
Journal of the American Statistical Association, 2006
Rapid object detection using a boosted cascade of simple features
Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
Robust wide-baseline stereo from maximally stable extremal regions
Image and Vision Computing, 2004
Video Google: a text retrieval approach to object matching in videos
Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
Learning the semantics of words and pictures
Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
International Journal of Computer Vision, 2001
Unsupervised Learning by Probabilistic Latent Semantic Analysis
Machine Learning, 2001
10.1162/153244303322533214
Applied Physics Letters, 2000
Object recognition from local scale-invariant features
Published by Institute of Electrical and Electronics Engineers (IEEE), 1999
Minimum complexity density estimation
IEEE Transactions on Information Theory, 1991

Cited by 584 articles