Kernel-Based Video Frame Interpolation Techniques Using Feature Map Differencing


  • Dong-Hyeok Seo (Dept. of AI Big Data Convergence Management, Kookmin University) ;
  • Min-Seong Go (Dept. of AI Big Data Convergence Management, Kookmin University) ;
  • Seung-Hak Lee (Dept. of AI Big Data Convergence Management, Kookmin University) ;
  • Jong-Hyuk Park (Dept. of AI Big Data Convergence Management, Kookmin University)
  • Received : 2023.11.06
  • Accepted : 2023.12.21
  • Published : 2024.01.31

Abstract

Video frame interpolation is an important technique in the video and media field, as it increases the continuity of motion and enables smooth playback. Among deep-learning approaches to video frame interpolation, the kernel-based method captures local changes well but has limitations in handling global changes. In this paper, we propose a new U-Net structure that applies feature map differencing and two directions to focus on capturing major changes, generating intermediate frames more accurately while reducing the number of parameters. Experimental results show that the proposed structure outperforms the existing model by up to 0.3 dB in PSNR with about 61% fewer parameters on common datasets such as Vimeo and Middlebury, as well as on a new YouTube dataset. Code is available at https://github.com/Go-MinSeong/SF-AdaCoF.

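The two ideas named in the abstract, per-pixel kernel-based synthesis (as in AdaCoF) and feature map differencing, can be illustrated with a minimal sketch. This is not the paper's implementation: the real model predicts the kernels (and sample offsets) with a U-Net, whereas here the kernels are supplied directly and samples come from a fixed local grid; all names and shapes below are illustrative assumptions.

```python
def feature_map_difference(f0, f1):
    """Element-wise absolute difference of two 2-D maps (lists of lists).
    Large values mark regions that changed between the two frames."""
    return [[abs(b - a) for a, b in zip(r0, r1)] for r0, r1 in zip(f0, f1)]

def kernel_synthesis(frame0, frame1, k0, k1, K=3):
    """Kernel-based frame synthesis: each output pixel is a weighted sum
    of a K x K patch from each input frame (replicate-padded at borders).
    k0, k1 are per-pixel kernels indexed as k[y][x][dy][dx]."""
    H, W = len(frame0), len(frame0[0])
    r = K // 2

    def sample(frame, y, x):  # clamp coordinates to emulate edge padding
        return frame[min(max(y, 0), H - 1)][min(max(x, 0), W - 1)]

    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    acc += k0[y][x][dy + r][dx + r] * sample(frame0, y + dy, x + dx)
                    acc += k1[y][x][dy + r][dx + r] * sample(frame1, y + dy, x + dx)
            out[y][x] = acc
    return out

# Toy check: uniform kernels whose weights sum to 1 across both frames
# blend a black frame and a white frame into a mid-gray frame.
H = W = 4
K = 3
f0 = [[0.0] * W for _ in range(H)]
f1 = [[1.0] * W for _ in range(H)]
uniform = [[[[0.5 / (K * K)] * K for _ in range(K)] for _ in range(W)] for _ in range(H)]
mid = kernel_synthesis(f0, f1, uniform, uniform, K)
diff = feature_map_difference(f0, f1)
```

In the proposed model the differencing is applied to encoder feature maps rather than raw pixels, so the network can concentrate its capacity on the regions the difference map highlights.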


References

  1. J. Huh, K. Yoon, S. Kim, and J. Joung, "Research trends in deep learning-based video frame interpolation techniques," The Korean Institute of Broadcast and Media Engineers: Broadcast and Media Magazine, Vol.18, No.2, pp.51-61, 2022. 
  2. J. Dong, K. Ota, and M. Dong, "Video frame interpolation: A comprehensive survey," ACM Transactions on Multimedia Computing, Communications, and Applications, Vol.19, No.78, pp.1-31, 2023.  https://doi.org/10.1145/3556544
  3. H. Lee, T. Kim, T.-y. Chung, D. Pak, Y. Ban, and S. Lee, "AdaCoF: Adaptive collaboration of flows for video frame interpolation," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. 
  4. J. Dai et al., "Deformable convolutional networks," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017. 
  5. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," In Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015. 
  6. S. Niklaus, L. Mai, and F. Liu, "Video frame interpolation via adaptive convolution," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017. 
  7. K. M. Briedis, A. Djelouah, R. Ortiz, and M. Gross, "Kernel-based frame interpolation for spatio-temporally adaptive rendering," In SIGGRAPH '23: ACM SIGGRAPH 2023 Conference Proceedings, 2023. 
  8. Z. Teed and J. Deng, "RAFT: Recurrent all-pairs field transforms for optical flow," In European Conference on Computer Vision (ECCV), pp.402-419, 2020. 
  9. Z. Huang et al., "FlowFormer: A transformer architecture for optical flow," In European Conference on Computer Vision (ECCV), pp.668-685, 2022. 
  10. G. Zhang, Y. Zhu, H. Wang, Y. Chen, G. Wu, and L. Wang, "Extracting motion and appearance via inter-frame attention for efficient video frame interpolation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.5682-5692, 2023. 
  11. N. Singla, "Motion detection based on frame difference method," International Journal of Information & Computation Technology, Vol.4, No.15, pp.1559-1565, 2014. 
  12. S. Niklaus, L. Mai, and F. Liu, "Video frame interpolation via adaptive separable convolution," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017. 
  13. O. Kopuklu, M. Babaee, and G. Rigoll, "Convolutional neural networks with layer reuse," In Proceedings of the IEEE International Conference on Image Processing (ICIP), 2019. 
  14. T. Xue, B. Chen, J. Wu, D. Wei, and W. T. Freeman, "Video enhancement with task-oriented flow," International Journal of Computer Vision, 2019. Vimeo-90K [Data set], http://toflow.csail.mit.edu 
  15. D. Scharstein et al., "High-resolution stereo datasets with subpixel-accurate ground truth," In German Conference on Pattern Recognition (GCPR 2014), Munster, Germany, September, 2014. 
  16. F. Perazzi, J. Pont-Tuset, B. McWilliams, L. van Gool, M. Gross and A. Sorkine-Hornung, "A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation," [Data set], https://davischallenge.org 
  17. K. Soomro, A. R. Zamir, and M. Shah, "UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild," [Data set], https://www.crcv.ucf.edu/data/UCF101.php 
  18. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, 2015. 
  19. A. Hore and D. Ziou, "Image quality metrics: PSNR vs. SSIM," In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), 2010. 
  20. J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," In European Conference on Computer Vision (ECCV), 2016.