Key Frame Detection Using Contrastive Learning


  • Kyoungtae Park (Department of Electrical and Electronics Engineering, Konkuk University) ;
  • Wonjun Kim (Department of Electrical and Electronics Engineering, Konkuk University) ;
  • Ryong Lee (Korea Institute of Science and Technology Information) ;
  • Rae-young Lee (Korea Institute of Science and Technology Information) ;
  • Myung-Seok Choi (Korea Institute of Science and Technology Information)
  • Received : 2022.08.22
  • Accepted : 2022.10.24
  • Published : 2022.11.30

Abstract

Research on video key frame detection has been actively conducted in the field of computer vision. Recently, with advances in deep learning techniques, the performance of key frame detection has improved, but the wide variety of video content and complicated backgrounds still hinder efficient learning. In this paper, we propose a novel method for key frame detection which utilizes contrastive learning and a memory bank module. The proposed method trains the feature extraction network based on the difference between neighboring frames within the same video and frames from separate videos. Building on this contrastive learning, the method saves and updates key frames in the memory bank, which efficiently reduces redundancy in the video. Experimental results on a video dataset show the effectiveness of the proposed method for key frame detection.

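The abstract's two core ideas can be illustrated with a minimal sketch: a contrastive (InfoNCE-style) loss that pulls a frame embedding toward its neighboring frame and pushes it away from frames of other videos, and a memory bank that admits a frame only when it is dissimilar to every stored key frame. This is a hypothetical NumPy illustration of the general idea, not the paper's actual network or training code; all names (`info_nce_loss`, `MemoryBank`, `threshold`, `tau`) and the cosine-similarity admission rule are assumptions for exposition.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, tau=0.1):
    """Contrastive loss over frame embeddings: the positive is a
    neighboring frame from the same video, the negatives are frames
    drawn from separate videos."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(cos(anchor, positive) / tau)
    neg = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))

class MemoryBank:
    """Stores embeddings of candidate key frames; a new frame is kept
    only if it is sufficiently dissimilar to every stored one, which
    removes redundant (near-duplicate) frames."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold  # cosine-similarity cutoff (assumed)
        self.slots = []

    def update(self, emb):
        emb = emb / np.linalg.norm(emb)
        for stored in self.slots:
            if float(emb @ stored) > self.threshold:
                return False  # redundant with an existing key frame
        self.slots.append(emb)
        return True  # admitted as a new key frame
```

In this toy setup, feeding the bank two near-identical embeddings keeps only the first, mimicking the redundancy removal the abstract describes; the loss is smaller when the positive really is a close neighbor than when roles are swapped.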


Acknowledgement

This research was supported by the Korea Institute of Science and Technology Information (KISTI) project "Establishment of a Data/AI-based Problem-Solving Framework (K-22-L04-C05-S01)".
