Depth Image Poselets via Body Part-based Pose and Gesture Recognition


  • Jae-Wan Park (Dept. of Electronics and Computer Engineering, Chonnam National University)
  • Chil-Woo Lee (Dept. of Electronics and Computer Engineering, Chonnam National University)
  • Received : 2015.11.25
  • Accepted : 2016.04.03
  • Published : 2016.06.30

Abstract

In this paper, we propose depth poselets based on body-part poses, together with a method for recognizing gestures from them. Since a gesture is composed of a sequence of poses, gesture recognition must focus on obtaining poses as a time series. However, because the human body has a high degree of freedom and its appearance is easily distorted, it is difficult to recognize a full-body pose correctly. We therefore use partial poses, rather than the full-body pose, to obtain pose features reliably. We define 16 gestures, and the depth images used for training were generated based on these defined gestures. A depth poselet, as proposed in this paper, consists of a depth image of a body part and the principal 3D coordinates of that depth image. In the training stage, the defined gestures are captured with a depth camera, 3D joint coordinates are obtained, and the depth poselets are generated; part-gesture HMMs are then constructed from the depth poselets. In the testing stage, a test image is captured with a depth camera, the foreground is extracted, and the body parts of the input image are extracted by matching against the depth poselets. The part gestures obtained by applying the HMMs are then checked to recognize the overall gesture. Gestures can be recognized efficiently with the HMMs, and a recognition rate of about 89% using joint vectors was confirmed.
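The recognition step described above, scoring an observed pose sequence against per-gesture HMMs and picking the most likely model, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes poses have already been quantized into discrete observation symbols (e.g. matched poselet indices), and the function and model names are hypothetical.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm.

    obs: sequence of observation symbol indices (e.g. quantized part poses)
    pi:  (N,) initial state distribution
    A:   (N, N) state transition matrix
    B:   (N, M) emission matrix (state -> observation symbol)
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    log_lik = np.log(c)
    alpha /= c                      # rescale to avoid numerical underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        log_lik += np.log(c)
        alpha /= c
    return log_lik

def recognize(obs_seq, gesture_models):
    """Return the gesture whose HMM assigns the sequence the highest likelihood.

    gesture_models: {gesture_name: (pi, A, B)}, one trained HMM per gesture.
    """
    return max(gesture_models,
               key=lambda g: forward_log_likelihood(obs_seq, *gesture_models[g]))
```

In the paper's setting one such model would exist per part gesture, and the per-part results are combined to decide the full gesture; the sketch shows only the max-likelihood selection common to HMM-based recognizers.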

