Eye Contact System Using Depth Fusion for Immersive Videoconferencing

  • Jang, Woo-Seok (Gwangju Institute of Science and Technology, School of Information and Communications) ;
  • Lee, Mi Suk (Electronics and Telecommunications Research Institute) ;
  • Ho, Yo-Sung (Gwangju Institute of Science and Technology, School of Information and Communications)
  • Received : 2014.11.11
  • Accepted : 2015.03.26
  • Published : 2015.07.25

Abstract


In this paper, we propose a gaze correction system for realistic video teleconferencing. Cameras used in teleconferencing are typically installed at the side of the display monitor rather than at its center. This arrangement creates a gaze mismatch that prevents users from making eye contact and reduces their sense of immersion; eye contact is therefore essential for immersive videoconferencing. In the proposed method, we use a stereo camera together with a depth camera to correct the gaze. As the depth camera we chose the Kinect, which estimates depth information efficiently at relatively low cost. Despite its cost advantage, however, the Kinect has several inherent drawbacks that make it unsuitable on its own, so we combine it with the stereo camera and compensate for the weaknesses of each depth sensor through a fusion and refinement process applied to the two depth maps. Finally, the gaze-corrected image is synthesized by 3D warping according to the refined depth information. Experimental results verify that the proposed system generates natural gaze-corrected images.
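The view-synthesis step described above, warping the side-camera image into a virtual camera placed at the display center using per-pixel depth, can be sketched as follows. This is a minimal illustration under assumed inputs, not the authors' implementation: the intrinsic matrix `K`, the rotation `R` and translation `t` from the real to the virtual camera, and a metric depth map are all hypothetical parameters, and disocclusion holes are simply left empty.

```python
import numpy as np

def warp_to_virtual_view(color, depth, K, R, t):
    """Forward-warp a color image into a virtual view via depth-based 3D warping.

    color : (H, W, 3) uint8 image from the real (side-mounted) camera
    depth : (H, W) float metric depth per pixel (0 marks invalid pixels)
    K     : (3, 3) intrinsic matrix, assumed shared by both views
    R, t  : (3, 3) rotation and (3,) translation from the real camera
            to the virtual camera at the display center
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])  # homogeneous pixels

    # Back-project every pixel to a 3D point in the real camera's frame.
    pts = np.linalg.inv(K) @ pix * depth.ravel()

    # Transform the points into the virtual camera's frame and project them.
    proj = K @ (R @ pts + t[:, None])
    z = proj[2]
    valid = z > 1e-6
    u2 = np.round(proj[0, valid] / z[valid]).astype(int)
    v2 = np.round(proj[1, valid] / z[valid]).astype(int)

    # Scatter colors to the new pixel grid; out-of-frame points are dropped.
    inside = (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H)
    out = np.zeros_like(color)
    src = np.flatnonzero(valid)[inside]
    out[v2[inside], u2[inside]] = color.reshape(-1, 3)[src]
    return out  # disocclusion holes remain zero and would need inpainting
```

A full system would also z-buffer colliding pixels and fill the disoccluded regions; this sketch only shows the geometric core of depth-image-based rendering.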
