Learning the semantics of object–action relations by observation

Open Access

1 August 2011

journal article
Published by SAGE Publications in The International Journal of Robotics Research

Vol. 30 (10), 1229-1249
https://doi.org/10.1177/0278364911410459

Abstract

Recognizing manipulations performed by a human, and transferring them to a robot for execution, is a difficult problem. We address it in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. In this way, we encode the essential changes in a visual scene in a condensed form, such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this, we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of those graphs occur whenever a spatial relation between some segments changes in a discontinuous way, and these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing between different manipulations. Using simple sub-string search algorithms, SECs can be compared and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. The performance of the algorithm is demonstrated on a set of real videos showing hands manipulating various objects and performing different actions. In experiments with a robotic arm, we show that the SEC can be learned by observing human manipulations, transferred to a new scenario, and then reproduced by the machine.
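
The idea of storing only the moments at which a spatial relation changes, and then comparing the resulting chains by sub-string search, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The relation labels ("N" for not touching, "T" for touching, "A" for absent), the function names `build_event_chain`, `longest_common_substring` and `chain_similarity`, and the similarity score itself are assumptions made for illustration. Segment tracking and graph construction are assumed to have already produced one relation label per segment pair per frame.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]   # ids of two tracked image segments (hypothetical encoding)
Frame = Dict[Pair, str]  # relation label per segment pair in one video frame


def build_event_chain(frames: List[Frame]) -> Dict[Pair, List[str]]:
    """Keep one column per frame in which at least one relation changed."""
    pairs = sorted({pair for frame in frames for pair in frame})
    chain: Dict[Pair, List[str]] = {pair: [] for pair in pairs}
    previous = None
    for frame in frames:
        # "A" (absent) is an assumed label for pairs missing from a frame.
        column = tuple(frame.get(pair, "A") for pair in pairs)
        if column != previous:  # some relation changed: store a new column
            for pair, label in zip(pairs, column):
                chain[pair].append(label)
            previous = column
    return chain


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def chain_similarity(a: Dict[Pair, List[str]], b: Dict[Pair, List[str]]) -> float:
    """Average, over rows of a, of the best normalized substring match in b.

    This is a simplified stand-in score, not the measure used in the paper.
    """
    rows_a = ["".join(row) for row in a.values()]
    rows_b = ["".join(row) for row in b.values()]
    if not rows_a or not rows_b:
        return 0.0
    total = 0.0
    for row_a in rows_a:
        total += max(
            longest_common_substring(row_a, row_b) / max(len(row_a), len(row_b), 1)
            for row_b in rows_b
        )
    return total / len(rows_a)
```

Rows are matched by content rather than by segment id, so two recordings of the same manipulation can be compared even when the tracker assigns different ids to the segments. Note that this score is not symmetric in its two arguments.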

Cited by 131 articles