
FBX Format Animation Generation System Combined with Joint Estimation Network using RGB Images

  • Lee, Yujin (Dept. of IT Media Engineering, The Graduate School, Seoul National University of Science and Technology) ;
  • Kim, Sangjoon (Dept. of Information Technology and Media Engineering, The Graduate School of Nano IT Design Fusion, Seoul National University of Science and Technology) ;
  • Park, Gooman (Dept. of Electronic IT Media Engineering, Seoul National University of Science and Technology)
  • Received : 2021.04.12
  • Accepted : 2021.05.14
  • Published : 2021.09.30

Abstract

Recently, in fields such as games, movies, and animation, there is a growing amount of content that uses motion capture to build body models and animate characters in 3D space. Marker-based capture, which measures joint positions through markers attached to the body, requires costly filming equipment; studies using RGB-D cameras have sought to reduce this cost, but problems of pose estimation accuracy and equipment cost remain. In this paper, we therefore propose a system that feeds RGB images into a joint estimation network and converts the results into 3D data to produce animations in FBX format, reducing the equipment cost of animation creation while improving joint estimation accuracy. First, 2D joints are estimated from the RGB image, and from these values the 3D joint coordinates are estimated. The result is converted to quaternions, the rotations are applied, and an FBX-format animation is generated. To measure the accuracy of the proposed method, we attached markers to the body and compared the error between an animation generated from the markers' 3D positions and the animation generated by the proposed system, verifying the system's operation.

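The pipeline described in the abstract (2D joint estimation → 3D lifting → quaternion conversion → FBX export) hinges on turning per-joint 3D positions into bone rotations. The paper does not publish its implementation, so the sketch below is only an illustrative NumPy version of that quaternion step, under the assumption that each bone has a known rest-pose direction; the function names are hypothetical, not from the paper:

```python
import numpy as np

def quat_from_vectors(u, v):
    """Quaternion (w, x, y, z) rotating unit vector u onto unit vector v."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    d = np.dot(u, v)
    if d < -0.999999:                      # opposite vectors: any perpendicular axis, 180 deg
        axis = np.cross([1.0, 0.0, 0.0], u)
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross([0.0, 1.0, 0.0], u)
        axis = axis / np.linalg.norm(axis)
        return np.array([0.0, *axis])
    q = np.array([1.0 + d, *np.cross(u, v)])
    return q / np.linalg.norm(q)

def bone_rotation(parent_pos, child_pos, rest_dir=(0.0, 1.0, 0.0)):
    """Rotation taking a bone's rest-pose direction to its estimated direction."""
    bone_dir = np.asarray(child_pos, float) - np.asarray(parent_pos, float)
    return quat_from_vectors(np.asarray(rest_dir, float), bone_dir)

def rotate(q, p):
    """Apply quaternion q = (w, x, y, z) to point p."""
    w, v = q[0], q[1:]
    return p + 2.0 * np.cross(v, np.cross(v, p) + w * p)
```

In an actual exporter these per-bone quaternions would be written as keyframes on each skeleton node, frame by frame, using the FBX SDK; the sketch only covers the geometric conversion.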

Acknowledgement

This work was supported by an Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) in 2021 (No. 2017-0-00217, Development of Immersive Signage Based on Variable Transparency and Multiple Layers).
