Learning the semantics of object–action relations by observation
Open Access
- 1 August 2011
- journal article
- Published by SAGE Publications in The International Journal of Robotics Research
- Vol. 30 (10), 1229-1249
- https://doi.org/10.1177/0278364911410459
Abstract
Recognizing manipulations performed by a human, and then transferring and executing them on a robot, is a difficult problem. We address this in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. In this way, we encode the essential changes in a visual scene in a condensed form such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of those graphs occur whenever a spatial relation between some segments changes in a discontinuous way, and these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing between different manipulations. Employing simple sub-string search algorithms, SECs can be compared and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. The performance of the algorithm is demonstrated on a set of real videos showing hands manipulating various objects and performing different actions. In experiments with a robotic arm, we show that the SEC can be learned by observing human manipulations, transferred to a new scenario, and then reproduced by the machine.
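The sketch below illustrates the core idea described in the abstract: an SEC is a matrix whose rows are segment pairs and whose columns are the decisive time points at which some spatial relation changes, and two SECs can be compared with a simple substring-based similarity. It is a minimal illustration, not the authors' implementation; the relation encoding (0 = absent, 1 = not touching, 2 = touching/overlapping) and all function names are hypothetical assumptions for the example.

```python
# Minimal sketch (not the authors' code) of building a semantic event chain
# (SEC) from per-frame pairwise spatial relations and comparing two SECs
# with a simple substring-based similarity.
# Assumed relation labels (hypothetical): 0 = absent, 1 = not touching,
# 2 = touching/overlapping.

def build_sec(relation_frames):
    """relation_frames: list of dicts mapping a segment pair (i, j) to a
    relation label, one dict per video frame. Returns the SEC as a dict
    mapping each pair to its row of labels, keeping only the decisive
    time points where at least one relation changes."""
    pairs = sorted({p for frame in relation_frames for p in frame})
    sec = {p: [] for p in pairs}
    previous = None
    for frame in relation_frames:
        column = tuple(frame.get(p, 0) for p in pairs)
        if column != previous:              # topological transition detected
            for p, label in zip(pairs, column):
                sec[p].append(label)
            previous = column
    return sec

def row_similarity(row_a, row_b):
    """Normalized longest-common-substring length between two SEC rows."""
    a = ''.join(map(str, row_a))
    b = ''.join(map(str, row_b))
    best = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
    return best / max(len(a), len(b), 1)

def sec_similarity(sec_a, sec_b):
    """Greedy matching: each row of sec_a is scored against its best-fitting
    row of sec_b; the mean score is the overall similarity."""
    scores = [max((row_similarity(row_a, row_b) for row_b in sec_b.values()),
                  default=0.0)
              for row_a in sec_a.values()]
    return sum(scores) / max(len(scores), 1)

# Usage: two toy observations with segments 0 (hand) and 1 (object).
frames_long = [{(0, 1): 1}, {(0, 1): 2}, {(0, 1): 2}, {(0, 1): 1}]
frames_short = [{(0, 1): 1}, {(0, 1): 2}, {(0, 1): 1}]
sec_a, sec_b = build_sec(frames_long), build_sec(frames_short)
print(sec_a)                         # {(0, 1): [1, 2, 1]}
print(sec_similarity(sec_a, sec_b))  # 1.0 -> recognized as type-similar
```

Note that only the transitions survive in the SEC, so the two observations above collapse to the same chain even though they differ in duration; this is the condensed encoding that makes recognition robust to the timing of a manipulation.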