Key Frame Detection Using Contrastive Learning


  • Kyoungtae Park (Department of Electrical and Electronics Engineering, Konkuk University) ;
  • Wonjun Kim (Department of Electrical and Electronics Engineering, Konkuk University) ;
  • Ryong Lee (Korea Institute of Science and Technology Information) ;
  • Rae-young Lee (Korea Institute of Science and Technology Information) ;
  • Myung-Seok Choi (Korea Institute of Science and Technology Information)
  • Received : 2022.08.22
  • Accepted : 2022.10.24
  • Published : 2022.11.30

Abstract

Research on video key frame detection has been actively conducted in the field of computer vision. Recently, with advances in deep learning techniques, the performance of key frame detection has improved, but the wide variety of video content and complicated backgrounds still hinder efficient learning. In this paper, we propose a novel method for key frame detection which utilizes contrastive learning and a memory bank module. The proposed method trains the feature extraction network based on the difference between neighboring frames within the same video and frames from separate videos. Building on this contrastive learning, the method saves and updates key frames in the memory bank, which efficiently reduces redundancy in the video. Experimental results on a video dataset show the effectiveness of the proposed method for key frame detection.

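The abstract's two core ideas can be illustrated with a minimal sketch: a contrastive (InfoNCE-style) loss that pulls a frame embedding toward its neighboring frame and pushes it away from frames of other videos, and a memory bank that admits a frame only when it is dissimilar to every stored key frame. This is a hypothetical NumPy illustration of the general idea, not the paper's actual network or training code; all names (`info_nce_loss`, `MemoryBank`, `threshold`, `tau`) and the cosine-similarity admission rule are assumptions for exposition.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, tau=0.1):
    """Contrastive loss over frame embeddings: the positive is a
    neighboring frame from the same video, the negatives are frames
    drawn from separate videos."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(cos(anchor, positive) / tau)
    neg = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))

class MemoryBank:
    """Stores embeddings of candidate key frames; a new frame is kept
    only if it is sufficiently dissimilar to every stored one, which
    removes redundant (near-duplicate) frames."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold  # cosine-similarity cutoff (assumed)
        self.slots = []

    def update(self, emb):
        emb = emb / np.linalg.norm(emb)
        for stored in self.slots:
            if float(emb @ stored) > self.threshold:
                return False  # redundant with an existing key frame
        self.slots.append(emb)
        return True  # admitted as a new key frame
```

In this toy setup, feeding the bank two near-identical embeddings keeps only the first, mimicking the redundancy removal the abstract describes; the loss is smaller when the positive really is a close neighbor than when roles are swapped.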


Acknowledgement

This research was supported by the Korea Institute of Science and Technology Information (KISTI) project "Establishment of a Data/AI-based Problem-Solving Framework (K-22-L04-C05-S01)".
