http://dx.doi.org/10.9708/jksci.2019.24.08.019

Online Video Synopsis via Multiple Object Detection  

Lee, JaeWon (Dept. of Computer Engineering, Kangwon National University)
Kim, DoHyeon (Dept. of Computer Engineering, Kangwon National University)
Kim, Yoon (Dept. of Computer Engineering, Kangwon National University)
Abstract
In this paper, an online video summarization algorithm based on multiple object detection is proposed. As crime has risen with recent rapid urbanization, public demand for safety has grown, and surveillance cameras such as closed-circuit television (CCTV) are being installed in more and more cities. However, retrieving and analyzing the huge amount of video data from numerous CCTV cameras takes a great deal of time and labor. As a result, there is an increasing demand for intelligent video recognition systems that can automatically detect and summarize the various events captured by CCTV. Video summarization generates a short synopsis video from a long original video so that users can review it in a short time. The proposed video summarization method is divided into two stages. The object extraction stage detects objects in the video and extracts the specific objects requested by the user. The video summary stage creates the final synopsis video from the objects extracted in the object extraction stage. Whereas existing methods do not consider the interactions between objects in the original video when generating the synopsis video, the proposed method uses a new object clustering algorithm that effectively preserves those interactions in the synopsis video. This paper also proposes an online optimization method that can efficiently summarize the large number of objects appearing in long videos. Finally, experimental results show that the proposed method outperforms the existing video synopsis algorithm.
Keywords
Video Synopsis; Video Summarization; Object-based Video Recognition; Object Detection; CCTV (closed-circuit television)
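
The abstract does not specify how objects are clustered or how the online optimization is carried out, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes that each extracted object is a "tube" with a frame span and one fixed bounding box, that two objects interact when they overlap in both time and space, and that each cluster is shifted as a unit to the earliest collision-free position in the synopsis. The names `Tube`, `cluster_tubes`, and `place_online` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tube:
    """One extracted object: frame span plus a single box (a simplification)."""
    name: str
    start: int   # first frame in the original video
    end: int     # last frame in the original video (inclusive)
    box: tuple   # (x1, y1, x2, y2), assumed constant over the tube


def boxes_overlap(a, b):
    # Axis-aligned rectangles intersect when they overlap on both axes.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def spans_overlap(s1, e1, s2, e2):
    # Inclusive frame intervals [s1, e1] and [s2, e2] share at least one frame.
    return s1 <= e2 and s2 <= e1


def cluster_tubes(tubes):
    """Group tubes that overlap in time and space (assumed interaction test)."""
    parent = list(range(len(tubes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, a in enumerate(tubes):
        for j in range(i + 1, len(tubes)):
            b = tubes[j]
            if spans_overlap(a.start, a.end, b.start, b.end) and boxes_overlap(a.box, b.box):
                parent[find(i)] = find(j)

    groups = {}
    for i, tube in enumerate(tubes):
        groups.setdefault(find(i), []).append(tube)
    # Process clusters in order of first appearance, as an online method would.
    return sorted(groups.values(), key=lambda g: min(t.start for t in g))


def place_online(clusters):
    """Greedily give each cluster the earliest collision-free synopsis start."""
    placed = []    # (tube, synopsis start frame) for everything already placed
    schedule = {}  # tube name -> synopsis start frame
    for group in clusters:
        origin = min(t.start for t in group)
        shift = 0
        # Shift the whole cluster together so relative timing inside it is kept.
        while any(
            spans_overlap(shift + t.start - origin, shift + t.end - origin,
                          s, s + p.end - p.start)
            and boxes_overlap(t.box, p.box)
            for t in group
            for p, s in placed
        ):
            shift += 1
        for t in group:
            s = shift + t.start - origin
            placed.append((t, s))
            schedule[t.name] = s
    return schedule


tubes = [
    Tube("walker_a", 0, 40, (10, 10, 50, 50)),
    Tube("walker_b", 20, 60, (40, 40, 80, 80)),       # meets walker_a
    Tube("car", 500, 530, (200, 200, 300, 260)),      # elsewhere in the frame
    Tube("late_walker", 900, 930, (30, 30, 60, 60)),  # same area, much later
]
clusters = cluster_tubes(tubes)
schedule = place_online(clusters)
print(schedule)
```

In this toy input the two walkers form one cluster and keep their original 20-frame offset, the car is moved to frame 0 because it occupies a different region, and the late walker is delayed until the first cluster has left the shared area. A real system would use per-frame boxes from the detector and a cost-based optimization rather than this hard collision test.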