Decomposed "Spatial and Temporal" Convolution for Human Action Recognition in Videos

  • Sediqi, Khwaja Monib (Dept. of Computer Science & Engineering, Chonbuk National University)
  • Lee, Hyo Jong (Dept. of Computer Science & Engineering, Chonbuk National University)
  • Published : 2019.05.10

Abstract

In this paper, we study the effect of decomposed spatiotemporal convolutions on action recognition in videos. Our motivation comes from the empirical observation that spatial convolution applied to individual frames of a video provides good performance in action recognition. We empirically evaluate the accuracy of factorized convolution for action classification. We take 3D ResNet-18 as the baseline model for our experiment and factorize each of its 3D convolutions into a 2D (spatial) convolution and a 1D (temporal) convolution. We train the model from scratch on the Kinetics video dataset, then fine-tune it on the UCF-101 dataset and evaluate its performance. Our results show accuracy comparable to that of state-of-the-art algorithms on the Kinetics and UCF-101 datasets.
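
To make the factorization concrete, the following minimal sketch counts the weights in a full t x d x d 3D convolution and in its decomposition into a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution. The function names, the channel sizes, and the intermediate channel count `c_mid` are illustrative assumptions and are not taken from the paper, which does not state these values in the abstract.

```python
def conv3d_params(c_in, c_out, t, d):
    """Weights in a full t x d x d 3D convolution (bias omitted)."""
    return c_in * c_out * t * d * d


def decomposed_params(c_in, c_out, t, d, c_mid):
    """Weights in a 1 x d x d spatial conv followed by a t x 1 x 1 temporal conv.

    c_mid is the number of channels between the two convolutions; its value
    here is an assumption chosen for illustration only.
    """
    spatial = c_in * c_mid * d * d   # 2D convolution applied to each frame
    temporal = c_mid * c_out * t     # 1D convolution applied across frames
    return spatial + temporal


# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 x 3 kernel.
full = conv3d_params(64, 64, t=3, d=3)
split = decomposed_params(64, 64, t=3, d=3, c_mid=64)
print(full, split)
```

With these assumed sizes the decomposed layer uses fewer weights than the full 3D layer; a larger `c_mid` would narrow or remove that gap, so the comparison depends on how the intermediate width is chosen.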

Keywords