http://dx.doi.org/10.3837/tiis.2019.08.011

Spatio-Temporal Residual Networks for Slide Transition Detection in Lecture Videos  

Liu, Zhijin (School of Communication and Information Engineering, Shanghai University)
Li, Kai (School of Communication and Information Engineering, Shanghai University)
Shen, Liquan (School of Communication and Information Engineering, Shanghai University)
Ma, Ran (School of Communication and Information Engineering, Shanghai University)
An, Ping (School of Communication and Information Engineering, Shanghai University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS), vol. 13, no. 8, 2019, pp. 4026-4040
Abstract
In this paper, we present an approach for detecting slide transitions in lecture videos using spatio-temporal residual networks. Given a lecture video in which multiple cameras record the digital slides, the speaker, and the audience, our goal is to find the keyframes where the slide content changes. Because temporal dependency among video frames is important for detecting slide changes, 3D convolutional networks (3D ConvNets) are regarded as an efficient approach to learning spatio-temporal features in videos. However, 3D ConvNets take a long time to train and require a large amount of memory. We therefore adopt ResNet, which is easy to optimize, to ease the training of the network. The result is a novel ConvNet architecture based on 3D ConvNet and ResNet for slide transition detection in lecture videos. Experimental results show that the proposed architecture achieves better accuracy than other slide progression detection approaches.
Keywords
Lecture video; slide transition; 3D ConvNet; ResNet
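
The combination of 3D convolution and residual learning described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes a single-channel clip stored as nested lists indexed `[time][row][col]`, a single 3D convolution as the residual function F, and an identity shortcut, so that the block computes y = ReLU(F(x) + x). The names `conv3d_same` and `residual_block_3d` are hypothetical and chosen only for this example.

```python
from typing import List

# A clip is a nested list indexed as [time][row][col] (single channel).
Volume = List[List[List[float]]]


def conv3d_same(x: Volume, kernel: Volume) -> Volume:
    """Single-channel 3D cross-correlation with zero padding.

    The output has the same shape as the input, so it can be added
    to the input through an identity shortcut.
    """
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    pt, ph, pw = kt // 2, kh // 2, kw // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            tt, ii, jj = t + dt - pt, i + di - ph, j + dj - pw
                            # Positions outside the clip count as zero.
                            if 0 <= tt < T and 0 <= ii < H and 0 <= jj < W:
                                acc += kernel[dt][di][dj] * x[tt][ii][jj]
                out[t][i][j] = acc
    return out


def residual_block_3d(x: Volume, kernel: Volume) -> Volume:
    """Compute y = ReLU(F(x) + x), where F is one 3D convolution."""
    fx = conv3d_same(x, kernel)
    return [
        [
            [max(0.0, fx_val + x_val) for fx_val, x_val in zip(fx_row, x_row)]
            for fx_row, x_row in zip(fx_frame, x_frame)
        ]
        for fx_frame, x_frame in zip(fx, x)
    ]


if __name__ == "__main__":
    # Toy clip: two blank 2x2 frames followed by two bright frames,
    # standing in for a slide change between frame 1 and frame 2.
    clip = [[[0.0, 0.0], [0.0, 0.0]]] * 2 + [[[1.0, 1.0], [1.0, 1.0]]] * 2
    # A purely temporal kernel (3x1x1) that differences neighbouring frames.
    temporal_diff = [[[-1.0]], [[0.0]], [[1.0]]]
    for frame in residual_block_3d(clip, temporal_diff):
        print(frame)
```

With an all-zero kernel the block reduces to the identity on non-negative inputs, which is the property that makes residual networks easier to optimize: each block only has to learn a correction to its input. A practical model would stack many such blocks with multiple channels, learned kernels, and batch normalization, typically in a deep learning framework rather than plain Python.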