Subdivision Ensemble Model for Highlight Detection

하이라이트 검출을 위한 구간 분할 앙상블 모델

  • Lee, Hansol (Dept. of Media IT Engineering, Graduate School, Seoul National University of Science and Technology) ;
  • Lee, Gyemin (Dept. of Media IT Engineering, Graduate School, Seoul National University of Science and Technology)
  • 이한솔 (서울과학기술대학교 일반대학원 미디어IT공학과) ;
  • 이계민 (서울과학기술대학교 일반대학원 미디어IT공학과)
  • Received : 2020.06.01
  • Accepted : 2020.07.23
  • Published : 2020.07.30


Automatically predicting video highlights is an important task for the media industry and streaming platform providers, because it saves the time and cost of manual video editing. We propose a new ensemble model that combines multiple highlight detectors, each focusing on a different part of a highlight event. As a result, our model can capture the more information-rich sections of events. Furthermore, the proposed model can extract improved features for highlight detection, particularly when the training video set is small. We evaluate our model on e-sports and baseball videos.

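Automatic highlight prediction is needed to reduce the time and cost of editing videos by hand. In this paper, we propose an ensemble model that focuses on the specific parts of a highlight interval that influence whether it is judged to be a highlight. Through the ensemble, our model obtains more useful features, capturing important information that a single model alone has difficulty learning. The single models that make up the ensemble combine audio and image information to extract diverse video features. We confirm improved highlight prediction performance on e-sports and baseball game videos that we collected ourselves.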
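The abstract does not specify how the sub-interval detectors are built or combined, so the following is only a minimal sketch of the general idea, under stated assumptions: a candidate segment is split into contiguous parts, one detector scores each part, and the scores are averaged. The names `split_into_parts` and `ensemble_score`, the equal-length split, the averaging rule, and the toy `max`-based detectors are all hypothetical illustrations, not the authors' model.

```python
from statistics import mean
from typing import Callable, List, Sequence

# Assumption: a detector maps the per-frame features of one sub-interval
# to a single highlight score. Real detectors would be trained networks.
Detector = Callable[[Sequence[float]], float]


def split_into_parts(segment: Sequence[float], n_parts: int) -> List[Sequence[float]]:
    """Split a segment into n_parts contiguous, nearly equal sub-intervals."""
    if n_parts <= 0 or len(segment) < n_parts:
        raise ValueError("segment must contain at least one frame per part")
    size, extra = divmod(len(segment), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` parts receive one additional frame.
        end = start + size + (1 if i < extra else 0)
        parts.append(segment[start:end])
        start = end
    return parts


def ensemble_score(segment: Sequence[float], detectors: Sequence[Detector]) -> float:
    """Average the scores of detectors, each applied to its own sub-interval."""
    parts = split_into_parts(segment, len(detectors))
    return mean(det(part) for det, part in zip(detectors, parts))


# Toy usage: three hypothetical detectors that each return the peak value
# of their own part (beginning, middle, end) of the segment.
toy_segment = [0.1, 0.2, 0.9, 0.8, 0.3, 0.1]
toy_detectors = [max, max, max]
score = ensemble_score(toy_segment, toy_detectors)
```

In this toy run the three parts are `[0.1, 0.2]`, `[0.9, 0.8]`, and `[0.3, 0.1]`, so the middle part contributes most to the averaged score. A weighted or learned combination could replace the plain average.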


