http://dx.doi.org/10.5909/JBE.2019.24.4.553

Video Highlight Prediction Using Multiple Time-Interval Information of Chat and Audio  

Kim, Eunyul (Dept. of Broadcasting.Communication Fusion Program, Graduate School of Nano IT Design Fusion, Seoul National University of Science and Technology)
Lee, Gyemin (Dept. of Broadcasting.Communication Fusion Program, Graduate School of Nano IT Design Fusion, Seoul National University of Science and Technology)
Publication Information
Journal of Broadcast Engineering / v.24, no.4, 2019, pp. 553-563
Abstract
As the number of videos uploaded to live streaming platforms rapidly increases, so does the demand for highlight videos that improve the viewer experience. In this paper, we present novel methods for predicting highlights using the chat logs and audio data of videos. The proposed models employ bi-directional LSTMs to capture the contextual flow of a video. We also propose using features computed over various time-intervals to capture mid-to-long-term flows. The proposed methods are demonstrated on e-Sports and baseball videos collected from personal broadcasting platforms such as Twitch and Kakao TV. The results show that information from multiple time-intervals is useful in predicting video highlights.
Keywords
Video highlight; Multiple time-interval models; Bi-directional LSTM; Chat logs; Audio
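
The abstract's idea of using features over multiple time-intervals can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `multi_interval_counts`, the choice of chat-message counts as the feature, and the window lengths are all assumptions made for illustration. For each second of a video, it counts the chat messages that fall in trailing windows of several lengths, producing one feature vector per second that a sequence model such as a bi-directional LSTM could then consume.

```python
from bisect import bisect_left


def multi_interval_counts(chat_times, duration, intervals=(1, 5, 30)):
    """Count chat messages in trailing windows of several lengths.

    chat_times: sorted message timestamps in seconds (hypothetical input).
    duration:   video length in whole seconds.
    intervals:  window lengths in seconds (illustrative values only).
    Returns one feature vector per second, with one count per window length.
    """
    features = []
    for t in range(1, duration + 1):
        row = []
        for w in intervals:
            start = max(0, t - w)
            # Count messages whose timestamp satisfies start <= ts < t.
            lo = bisect_left(chat_times, start)
            hi = bisect_left(chat_times, t)
            row.append(hi - lo)
        features.append(row)
    return features


# Toy chat log: a small burst early on and a larger burst near the end.
chat = [0.5, 1.2, 1.8, 4.1, 4.3, 4.9]
feats = multi_interval_counts(chat, duration=5, intervals=(1, 3))
for second, row in enumerate(feats, start=1):
    print(second, row)
```

In this toy example the short window reacts only to the burst in the final second, while the longer window also retains traces of earlier activity, which is the kind of mid-to-long-term context the abstract refers to.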