Recognizing the Direction of Action using Generalized 4D Features

  • Kim, Sun-Jung (School of Electrical and Computer Engineering, Seoul National University) ;
  • Kim, Soo-Wan (School of Electrical and Computer Engineering, Seoul National University) ;
  • Choi, Jin-Young (School of Electrical and Computer Engineering, Seoul National University)
  • Received : 2014.04.22
  • Accepted : 2014.07.02
  • Published : 2014.10.25

Abstract

In this paper, we propose a method to recognize the direction of a human action by developing 4D space-time (4D-ST, [x,y,z,t]) features. To this end, we propose 4D space-time interest points (4D-STIPs, [x,y,z,t]), which are extracted from 3D space (3D-S, [x,y,z]) volumes reconstructed from images of a finite number of different views. Since the proposed features are constructed from volumetric information, features for an arbitrary 2D space (2D-S, [x,y]) viewpoint can be generated by projecting the 3D-S volumes and 4D-STIPs onto the corresponding image planes in the training step. Because the training sets, which are projections of the 3D-S volumes and 4D-STIPs onto various image planes, contain direction information, we can recognize the direction of an actor in a test video. The recognition process is divided into two steps: we first recognize the action class and then recognize the action direction using the direction information. For both action and direction recognition, we use the projected 3D-S volumes and 4D-STIPs to construct motion history images (MHIs) and non-motion history images (NMHIs), which encode the moving and non-moving parts of an action, respectively. For action recognition, the features are trained by support vector data description (SVDD) according to action class and recognized by support vector domain density description (SVDDD). For direction recognition after an action is recognized, each action is trained using SVDD according to direction class and then recognized by SVDDD. In experiments, we train the models using 3D-S volumes from the INRIA Xmas Motion Acquisition Sequences (IXMAS) dataset and evaluate action direction recognition on a new SNU dataset constructed for this purpose.
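The arbitrary-viewpoint feature generation described above is, at its core, a standard pinhole projection of the reconstructed 3D-S points (and, frame by frame, the 4D-STIPs) onto a chosen image plane. A minimal sketch, assuming a known 3x4 camera matrix P = K[R|t]; the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def project_points(P, X):
    """Project Nx3 3D-S points X onto a 2D-S image plane via a 3x4 camera matrix P."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # to homogeneous coordinates, Nx4
    x = (P @ Xh.T).T                               # homogeneous image points, Nx3
    return x[:, :2] / x[:, 2:3]                    # dehomogenize to Nx2 pixel coordinates
```

Varying the rotation R sweeps the virtual viewpoint around the actor, which is how direction-labeled training views can be synthesized from a single 3D-S reconstruction.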

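The MHI/NMHI construction follows the temporal-template idea of Bobick and Davis [24]: each pixel stores how recently it belonged to the moving part (MHI) or the non-moving part (NMHI) of the silhouette. A minimal per-frame update sketch, assuming boolean silhouette and motion masks; the paper's exact encoding may differ:

```python
import numpy as np

def update_mhi_nmhi(mhi, nmhi, silhouette, moving, tau=255):
    """One temporal update of motion / non-motion history images.

    silhouette, moving: boolean HxW masks (the actor region, and the pixels
    that changed since the previous frame). Non-moving part = silhouette & ~moving.
    """
    # Moving pixels are stamped with tau; all others decay toward zero.
    mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))
    # Non-moving silhouette pixels are stamped with tau; all others decay.
    nmhi = np.where(silhouette & ~moving, tau, np.maximum(nmhi - 1, 0))
    return mhi, nmhi
```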
References

  1. Daniel Weinland, Remi Ronfard, Edmond Boyer, "Free viewpoint action recognition using motion history volumes," Computer Vision and Image Understanding, vol. 104, no. 2-3, 2006.
  2. Magnus Burenius, Josephine Sullivan, Stefan Carlsson, "3D pictorial structures for multiple view articulated pose estimation," Computer Vision and Pattern Recognition, 2013.
  3. Fan Zhu, Ling Shao, Mingxiu Lin, "Multi-view action recognition using local similarity random forests and sensor fusion," Pattern Recognition Letters, vol. 34, pp. 20-24, 2013. https://doi.org/10.1016/j.patrec.2012.04.016
  4. Lu Xia, J. K. Aggarwal, "Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera," Computer Vision and Pattern Recognition, 2013.
  5. Omar Oreifej, Zicheng Liu, "HON4D: histogram of oriented 4D normals for activity recognition from depth sequences," Computer Vision and Pattern Recognition, 2013.
  6. Jiajia Luo, Wei Wang, Hairong Qi, "Group sparsity and geometry constrained dictionary learning for action recognition from depth maps," International Conference on Computer Vision, 2013.
  7. Cen Rao, Alper Yilmaz, Mubarak Shah, "View-invariant representation and recognition of actions," International Journal of Computer Vision, vol. 50, no. 2, pp. 203-226, 2002. https://doi.org/10.1023/A:1020350100748
  8. Vasu Parameswaran, Rama Chellappa, "View invariants for human action recognition," International Journal of Computer Vision, vol. 66, no. 1, pp. 83-101, 2006. https://doi.org/10.1007/s11263-005-3671-4
  9. Yuping Shen, Hassan Foroosh, "View-invariant action recognition using fundamental ratios," Computer Vision and Pattern Recognition, 2008.
  10. Imran N. Junejo, Emilie Dexter, Ivan Laptev, Patrick Perez, "Cross-view action recognition from temporal self-similarities," European Conference on Computer Vision, 2008.
  11. Jingen Liu, Mubarak Shah, Benjamin Kuipers, Silvio Savarese, "Cross-view action recognition via view knowledge transfer," Computer Vision and Pattern Recognition, 2011.
  12. Jingjing Zheng, Zhuolin Jiang, P. Jonathon Phillips, Rama Chellappa, "Cross-view action recognition via a transferable dictionary pair," British Machine Vision Conference, 2012.
  13. Binlong Li, Octavia I. Camps, Mario Sznaier, "Cross-view activity recognition using Hankelets," Computer Vision and Pattern Recognition, 2012.
  14. Jingjing Zheng, Zhuolin Jiang, "Learning view-invariant sparse representation for cross-view action recognition," International Conference on Computer Vision, 2013.
  15. Jingen Liu, Mubarak Shah, "Learning human actions via information maximization," Computer Vision and Pattern Recognition, 2008.
  16. Daniel Weinland, Edmond Boyer, Remi Ronfard, "Action recognition from arbitrary views using 3D exemplars," International Conference on Computer Vision, 2007.
  17. Pingkun Yan, Saad M. Khan, Mubarak Shah, "Learning 4D action feature models for arbitrary view action recognition," Computer Vision and Pattern Recognition, 2008.
  18. Kishore K Reddy, Jingen Liu, Mubarak Shah, "Incremental action recognition using feature-tree," International Conference on Computer Vision, 2009.
  19. Daniel Weinland, Mustafa Ozuysal, Pascal Fua, "Making action recognition robust to occlusions and viewpoint changes," European Conference on Computer Vision, 2010.
  20. Mohamed Becha Kaaniche, Francois Bremond, "Gesture recognition by learning local motion signature," Computer Vision and Pattern Recognition, 2011.
  21. Xinxiao Wu, Yunde Jia, "View-invariant action recognition using latent kernelized structural SVM," European Conference on Computer Vision, 2012.
  22. Saad M. Khan, Pingkun Yan, Mubarak Shah, "A homographic framework for the fusion of multi-view silhouettes," International Conference on Computer Vision, 2007.
  23. Ivan Laptev, "On space-time interest points," International Journal of Computer Vision, vol. 64, no. 2/3, pp. 107-123, 2005. https://doi.org/10.1007/s11263-005-1838-7
  24. Aaron F. Bobick, James W. Davis, "The recognition of human movement using temporal templates," Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257-267, 2001. https://doi.org/10.1109/34.910878
  25. Myoung Soo Park, Jin Hee Na, Jin Young Choi, "Feature extraction using class-augmented principal component analysis," Lecture Notes in Computer Science, vol. 4132, pp. 606-615, 2006.
  26. David M. J. Tax, Robert P. W. Duin, "Support vector data description," Machine Learning, vol. 54, pp. 45-66, 2004. https://doi.org/10.1023/B:MACH.0000008084.60811.49
  27. Woo Sung Kang, Jin Young Choi, "Domain density description for multiclass pattern classification with reduced computational load," Pattern Recognition, 2007.
  28. Richard Hartley, Andrew Zisserman, "Multiple View Geometry in Computer Vision," Cambridge University Press, 2000.
  29. Chris Harris, Mike Stephens, "A combined corner and edge detector," Alvey Vision Conference, 1988.

Cited by

  1. Depth-Based Recognition System for Continuous Human Action Using Motion History Image and Histogram of Oriented Gradient with Spotter Model vol.26, pp.6, 2016, https://doi.org/10.5391/JKIIS.2016.26.6.471