Effective Pose-based Approach with Pose Estimation for Emotional Action Recognition

Effective Pose-based Emotional Action Recognition Using Pose Estimation

  • Kim Jin Ok (Division of Mobile Contents, College of International Culture and Information, Daegu Haany University)
  • Received : 2012.09.03
  • Accepted : 2012.10.20
  • Published : 2013.03.31

Abstract

Early research in human action recognition focused on tracking and classifying articulated body motions. Such methods required accurate segmentation of body parts, which is a difficult task, particularly under realistic imaging conditions. Recent work has therefore shifted toward low-level appearance features such as spatio-temporal interest points. Given the considerable progress in pose estimation over the past few years, the pose-based approach deserves to be reassessed. This paper asks whether training a classifier only on low-level appearance features is sufficient, and proposes an effective pose-based approach that uses pose estimation for emotional action recognition. To answer this question, we compare the performance of pose-based features, appearance-based features, and their combination on a scenario of various emotional actions. The experimental results show that pose-based features outperform low-level appearance-based features, even when heavily corrupted by noise, suggesting that a pose-based approach with pose estimation is beneficial for emotional action recognition.

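Earlier research on human action recognition focused mainly on tracking and classifying body movements represented as articulated structures. Because these methods require accurate segmentation of body parts, which is difficult under real imaging conditions, recent action recognition research has generally moved to lower-level, more abstract appearance features such as spatio-temporal interest points. However, pose estimation has advanced over the past few years, so the view of pose-based methods needs to be re-established. This study questions whether it is sufficient, in the appearance-based approach, to train a classifier on low-level appearance features alone, and proposes an effective pose-based action recognition method that uses pose estimation. To this end, appearance-based features, pose-based features, and a combination of the two were compared on action scenarios expressing various emotions. In the experiments, the pose-based method using pose estimation classified and recognized emotional actions better than the method using low-level appearance features, and it was also effective for recognizing emotional actions in images heavily degraded by noise.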

