A Method of Generating Table-of-Contents for Educational Video

A Method for Automatic ToC Generation for Educational Video

  • Lee Gwang-Gook (Division of Electrical and Computer Engineering, Hanyang University) ;
  • Kang Jung-Won (Broadcasting Media Research Group, Digital Broadcasting Research Division, ETRI) ;
  • Kim Jae-Gon (Broadcasting Media Research Group, Digital Broadcasting Research Division, ETRI) ;
  • Kim Whoi-Yul (Division of Electrical and Computer Engineering, Hanyang University)
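  • Lee Gwang-Gook (Department of Electronics, Communications and Radio Engineering, Hanyang University) ;
  • Kang Jung-Won (Broadcasting System Research Group, Digital Broadcasting Research Division, ETRI) ;
  • Kim Jae-Gon (Broadcasting System Research Group, Digital Broadcasting Research Division, ETRI) ;
  • Kim Whoi-Yul (Department of Electronics, Communications and Radio Engineering, Hanyang University)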
  • Published : 2006.03.01

Abstract

The rapid spread of multimedia appliances has produced an ever-growing volume of multimedia data, which in turn calls for automatic video analysis techniques. This paper proposes a method for generating a table of contents (ToC) for educational video content. The method consists of two parts: scene segmentation followed by scene annotation. First, the video sequence is divided into scenes by a scene segmentation algorithm that exploits the characteristics of educational video. Then each shot in a scene is annotated with its scene type, the presence of an enclosed caption, and the main speaker of the shot. The resulting ToC represents the structure of the video as a hierarchy of scenes and shots and describes each scene and shot by the extracted features. The ToC therefore helps users grasp the content of a video at a glance and jump to a desired position easily. The automatically generated ToC can also be refined by manual editing, which reduces the time needed to produce a more detailed description of the video content. Experimental results showed that the proposed method can generate ToCs for educational video with high accuracy.
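As a rough illustration of the ToC structure described in the abstract, the following Python sketch models a hierarchy of scenes and shots, with each shot carrying the three annotations named above (scene type, caption presence, main speaker). All class names, field names, label values, and the text layout are hypothetical choices made for this example; the abstract does not specify a data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Shot:
    start_sec: float
    end_sec: float
    scene_type: str                     # hypothetical label, e.g. "lecture" or "slide"
    has_caption: bool                   # whether an enclosed caption was detected
    main_speaker: Optional[str] = None  # None when no speaker was identified


@dataclass
class Scene:
    shots: List[Shot] = field(default_factory=list)

    @property
    def start_sec(self) -> float:
        return self.shots[0].start_sec

    @property
    def end_sec(self) -> float:
        return self.shots[-1].end_sec


def format_time(seconds: float) -> str:
    """Format a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_toc(scenes: List[Scene]) -> str:
    """Render the scene/shot hierarchy as an indented text ToC."""
    lines = []
    for i, scene in enumerate(scenes, start=1):
        span = f"{format_time(scene.start_sec)}-{format_time(scene.end_sec)}"
        lines.append(f"Scene {i} [{span}]")
        for j, shot in enumerate(scene.shots, start=1):
            caption = "caption" if shot.has_caption else "no caption"
            speaker = shot.main_speaker or "unknown speaker"
            lines.append(
                f"  Shot {i}.{j} [{format_time(shot.start_sec)}] "
                f"{shot.scene_type}, {caption}, {speaker}"
            )
    return "\n".join(lines)
```

Each ToC entry keeps its start time, so a player could use an entry to seek directly to the corresponding position, and a human editor could refine the labels afterwards.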

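With the advent of interactive personalized broadcasting, there is a growing need for content-based video analysis techniques that automatically analyze the content of a video in order to describe its structure or generate a summary. This paper proposes a method for automatically generating a ToC for educational video, a type of broadcast content that is in high demand online and is particularly well suited to personalized broadcasting. The proposed ToC generation method consists of two stages: scene segmentation and scene description. In the scene segmentation stage, shot segmentation is performed first, and the input video is then divided into scenes by analyzing the connectivity between shots. In the scene description stage, the content of each segmented scene is described automatically by means of scene classification, caption detection, speaker recognition, and similar analyses. The ToC generated by the proposed method represents the composition of the video through a hierarchy of scenes and shots and describes the content of each scene and shot using the detected features, so that users can grasp the content of the video at a glance and easily reach the part they want. When a more detailed ToC is required, the generated ToC can serve as an initial ToC that already contains useful information, effectively reducing the time needed to create a ToC by hand. Experiments confirmed that the proposed method can effectively generate ToCs for a number of educational videos.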
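The scene segmentation stage groups shots into scenes by analyzing how shots are connected to one another. The abstract does not give the algorithm, so the following Python sketch is only one plausible way to do this, not the authors' method: it assumes each shot is summarized by a normalized colour histogram, links a shot to a similar shot within a short look-ahead window, and closes a scene where no link crosses the boundary. The function names, the histogram-intersection similarity, and the `threshold` and `window` values are all assumptions made for illustration.

```python
from typing import List, Sequence


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def group_shots_into_scenes(
    shot_histograms: List[Sequence[float]],
    threshold: float = 0.7,  # assumed similarity cut-off
    window: int = 3,         # assumed look-ahead, in shots
) -> List[List[int]]:
    """Group consecutive shot indices into scenes using forward links."""
    scenes: List[List[int]] = []
    n = len(shot_histograms)
    start = 0
    while start < n:
        end = start  # last shot index known to belong to the current scene
        i = start
        while i <= end:
            # Search from the farthest shot in the window back towards `end`;
            # a similar shot extends the scene to cover everything up to it.
            for j in range(min(i + window, n - 1), end, -1):
                sim = histogram_intersection(shot_histograms[i], shot_histograms[j])
                if sim >= threshold:
                    end = j
                    break
            i += 1
        scenes.append(list(range(start, end + 1)))
        start = end + 1
    return scenes
```

With this scheme, a lecture that alternates between a speaker shot and a slide shot stays in one scene because the speaker shots link across the slide shots, while a change to visually unrelated material starts a new scene.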
