
Intra Prediction Method by Quadric Surface Modeling for Depth Video

  • Dong-Seok Lee (AI Grand ICT Research Center, Dong-eui University)
  • Soon-Kak Kwon (Department of Computer Software Engineering, Dong-eui University)
  • Received : 2022.02.14
  • Accepted : 2022.04.19
  • Published : 2022.04.30

Abstract

In this paper, we propose an intra prediction method based on quadric surface modeling for depth video coding. The pixels of a depth video are transformed into 3D coordinates using their distance information. For the reference pixels, which are either the upper pixels or the left pixels of the current block, the quadric surface with the smallest fitting error is found by the least squares method. Since the quadric surface equation is quadratic in the depth value, the intra prediction yields two prediction values for each pixel. For each set of reference pixels, two errors are computed as the sums of squared differences between the prediction values and the reference pixel values. Among the resulting four errors, the combination of reference pixels and prediction value with the lowest error is selected, and the pixels of the block are predicted with it. Simulation results show that, compared with the state-of-the-art video coding method, the distortion and the bit rate are improved by up to 5.16% and 5.12%, respectively.

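To make the steps described in the abstract concrete, here is a minimal Python sketch of the prediction pipeline. It is an illustration, not the authors' implementation: the function names (fit_quadric, two_depth_roots, predict_block) are invented, the reference pixels are assumed to be already back-projected to 3D coordinates (the paper derives these from the camera's distance information), and the fit uses a homogeneous least-squares quadric, one reasonable reading of "the quadric surface with the smallest error".

```python
import numpy as np

def fit_quadric(pts):
    """Fit a general quadric a*x^2 + b*y^2 + c*z^2 + d*x*y + e*y*z + f*z*x
    + g*x + h*y + i*z + j = 0 to 3D points (n x 3) by homogeneous least
    squares: the coefficient vector is the right singular vector of the
    design matrix associated with the smallest singular value."""
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    A = np.column_stack([x*x, y*y, z*z, x*y, y*z, z*x,
                         x, y, z, np.ones_like(x)])
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return vt[-1]  # (a, b, c, d, e, f, g, h, i, j), unit norm

def two_depth_roots(coef, x, y):
    """Solve the fitted quadric for z at (x, y). The equation is quadratic
    in z, which is why each pixel gets two candidate prediction values."""
    a, b, c, d, e, f, g, h, i, j = coef
    qb = e*y + f*x + i
    qc = a*x*x + b*y*y + d*x*y + g*x + h*y + j
    disc = np.sqrt(np.maximum(qb*qb - 4.0*c*qc, 0.0))  # clamp tiny negatives
    return (-qb + disc) / (2.0*c), (-qb - disc) / (2.0*c)  # assumes c != 0

def predict_block(upper_pts, left_pts, block_xy):
    """Evaluate the four candidates {upper, left} x {root 1, root 2},
    keep the one whose predictions best match its own reference depths,
    then predict the block pixels (k x 2 coordinates) with it."""
    best = None
    for ref in (upper_pts, left_pts):
        coef = fit_quadric(ref)
        for k, z_hat in enumerate(two_depth_roots(coef, ref[:, 0], ref[:, 1])):
            err = float(np.sum((z_hat - ref[:, 2]) ** 2))  # squared-difference sum
            if best is None or err < best[0]:
                best = (err, coef, k)
    _, coef, k = best
    return two_depth_roots(coef, block_xy[:, 0], block_xy[:, 1])[k]
```

Note that the selection error is measured on the reference pixels themselves, which a decoder also has, so the same search can presumably be repeated at the decoder without extra side information. A real codec would additionally need a fallback when the z-squared coefficient is near zero or when too few reference samples are available for a stable fit.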

Acknowledgement

This paper was supported by the BB21+ Project in 2021. It was also carried out as a result of the Grand ICT Research Center support program of the Ministry of Science and ICT (MSIT) and the Institute of Information & Communications Technology Planning & Evaluation (IITP-2022-2020-0-01791), and was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2021R1F1A1062131).
