Learning Spatiotemporal Features with 3D Convolutional Networks

1 December 2015

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 4489-4497
https://doi.org/10.1109/iccv.2015.510

Abstract

We propose a simple yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large-scale supervised video dataset. Our findings are three-fold: 1) 3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets; 2) a homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best-performing architectures for 3D ConvNets; and 3) our learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with the current best methods on the other 2 benchmarks. In addition, the features are compact, achieving 52.8% accuracy on the UCF101 dataset with only 10 dimensions, and are very efficient to compute owing to the fast inference of ConvNets. Finally, they are conceptually very simple and easy to train and use.
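
To make the idea of a 3x3x3 spatiotemporal kernel concrete, here is a minimal pure-Python sketch of a single-channel 3D convolution over a toy clip. It is an illustration only, not the authors' implementation: the function names (`conv3d`, `shape`), the hand-set temporal-difference kernel, and the toy clips are all assumptions made for this example, and a real 3D ConvNet would use learned, multi-channel kernels with padding and pooling.

```python
def conv3d(volume, kernel):
    """Naive single-channel 3D cross-correlation, 'valid' padding, stride 1.

    volume is indexed [t][y][x]; kernel is indexed [dt][dy][dx].
    (Hypothetical helper for illustration, not the C3D code.)
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the time, height and width extent of the kernel.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def shape(v):
    """Return (frames, height, width) of a nested-list volume."""
    return (len(v), len(v[0]), len(v[0][0]))


# Hand-set 3x3x3 kernel (an assumption, not a learned filter):
# -1 on the first frame, 0 on the middle frame, +1 on the last frame.
kernel = [[[w] * 3 for _ in range(3)] for w in (-1, 0, 1)]

# Toy clips of 5 frames, 5x5 pixels each.
static_clip = [[[7] * 5 for _ in range(5)] for _ in range(5)]  # nothing changes
ramp_clip = [[[t] * 5 for _ in range(5)] for t in range(5)]    # brightness grows with t

static_out = conv3d(static_clip, kernel)
ramp_out = conv3d(ramp_clip, kernel)

print(shape(ramp_out))    # the output keeps a time axis
print(static_out[0][0][0], ramp_out[0][0][0])
```

Because the kernel spans three frames, its response depends on how the clip changes over time: the static clip yields zero everywhere, while the brightening clip yields a constant non-zero response. A 2D kernel applied frame by frame cannot express this kind of temporal difference within a single filter.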

This publication has 26 references indexed in Scilit.

Cited by 5812 articles