Discriminative models for multi-class object layout

1 September 2009

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 229-236
https://doi.org/10.1109/iccv.2009.5459256

Abstract

Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. Such reductions allow one to leverage sophisticated classifiers for learning. These models are typically trained independently for each class using positive and negative examples cropped from images. At test time, various post-processing steps such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically. We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image window independently, our model simultaneously predicts a structured labeling of the entire image. Our model learns statistics that capture the spatial arrangements of various object classes in real images, both in terms of which arrangements to suppress through NMS and which arrangements to favor through spatial co-occurrence statistics. We formulate parameter estimation in our model as a max-margin learning problem. Given training images with ground-truth object locations, we show how to formulate learning as a convex optimization problem. We employ a cutting-plane algorithm to efficiently learn a model from thousands of training images. We show state-of-the-art results on the PASCAL VOC benchmark that indicate the benefits of learning a global model encapsulating the spatial layout of multiple object classes.
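
For readers unfamiliar with the post-processing step the abstract refers to, the following is a minimal sketch of conventional greedy non-maxima suppression for a single class. It illustrates the heuristic baseline only, not the structured model proposed in the paper. The box format `(x1, y1, x2, y2)`, the function names `iou` and `greedy_nms`, and the overlap threshold of 0.5 are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box], scores: Sequence[float], threshold: float = 0.5
) -> List[int]:
    """Return indices of the detections kept, in order of decreasing score.

    A detection is kept only if its overlap with every previously kept
    (higher-scoring) detection is at most `threshold`.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

In this sketch the suppression rule is fixed by hand and applied to each class separately. The abstract's point is that such rules, together with co-occurrence preferences between classes, can instead be learned jointly.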

This publication has 14 references indexed in Scilit:

Cutting-plane training of structural SVMs
Machine Learning, 2009
A discriminatively trained, multiscale, deformable part model
Published by Institute of Electrical and Electronics Engineers (IEEE), 2008
Object categorization using co-occurrence, location and appearance
Published by Institute of Electrical and Electronics Engineers (IEEE), 2008
Putting Objects in Perspective
International Journal of Computer Vision, 2008
Training structural SVMs when exact inference is intractable
Published by Association for Computing Machinery (ACM), 2008
TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation
Lecture Notes in Computer Science, 2006
Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data
Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
Histograms of Oriented Gradients for Human Detection
Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
A hierarchical field framework for unified context-based classification
Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
Near-regular texture analysis and manipulation
ACM Transactions on Graphics, 2004

Cited by 176 articles