Clustering-based Hierarchical Scene Structure Construction for Movie Videos

영화 비디오를 위한 클러스터링 기반의 계층적 장면 구조 구축

  • 최익원 (Department of Computer Science, Yonsei University) ;
  • 변혜란 (Department of Computer Science, Yonsei University)
  • Published : 2000.05.15

Abstract

In recent years, the use of multimedia information has increased rapidly, and video has risen faster than any other medium because it integrates all other media into a single data stream. Although digital video is increasingly available, its length and unstructured format make effective access very difficult for users. Recently developed image and video management systems therefore require minimal user interaction and an explicit definition of video structure. This paper defines the relevant terms and a hierarchical video structure, and presents a system that constructs a clustering-based video hierarchy, which lets users browse a summary of the video and access its content at random. Instead of relying on a single feature and domain-specific thresholds, we use multiple features that complement one another, together with clustering-based methods that use normalization, so that user interaction is kept to a minimum. The shot boundary detection stage extracts multiple features, applies adaptive filtering to each feature to improve performance by eliminating factors that cause false detections, and then performs k-means clustering with two classes. The resulting shot list is represented as a video hierarchy by an intelligent unsupervised clustering technique. We experimented on static and dynamic movie videos that represent the characteristics of various video types. Shot boundary detection achieved about 95% accuracy, and the video hierarchy construction also gave good results.
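The two-class k-means step can be illustrated with a minimal sketch. It assumes two hypothetical per-frame-pair features (`hist_diff` and `pixel_diff`, where index i is the difference between frame i and frame i + 1); the abstract does not name the features actually used, and the adaptive filtering stage is omitted here. Each feature is min-max normalized so that no domain-specific threshold is needed, and the cluster with the larger centroid is taken as the shot boundary class.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max scale one feature to [0, 1] so features are comparable."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in values]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points: Sequence[Point], max_iter: int = 100):
    """Plain k-means with k = 2; returns (labels, centroids)."""
    sums = [sum(p) for p in points]
    # Deterministic initialisation: the weakest and the strongest response.
    centroids = [list(points[sums.index(min(sums))]),
                 list(points[sums.index(max(sums))])]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def detect_shot_boundaries(hist_diff: Sequence[float],
                           pixel_diff: Sequence[float]) -> List[int]:
    """Indices i where a cut is detected between frame i and frame i + 1."""
    if not hist_diff:
        return []
    points = list(zip(normalize(hist_diff), normalize(pixel_diff)))
    labels, centroids = two_means(points)
    if sum(centroids[0]) == sum(centroids[1]):
        return []  # the two classes did not separate
    cut = 0 if sum(centroids[0]) > sum(centroids[1]) else 1
    return [i for i, lab in enumerate(labels) if lab == cut]


# Made-up difference values for eight consecutive frame pairs.
hist_diff = [0.02, 0.03, 0.01, 0.90, 0.02, 0.04, 0.85, 0.03]
pixel_diff = [0.05, 0.04, 0.06, 0.80, 0.05, 0.03, 0.95, 0.04]
print(detect_shot_boundaries(hist_diff, pixel_diff))  # [3, 6]
```

Because the decision comes from the cluster assignment rather than a fixed cut-off, the same code applies unchanged to videos whose difference values lie in very different ranges.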

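Recently, as the use of multimedia information has increased sharply, video has attracted the most attention among the various media types, and it binds all other types of media information into a single data stream. The practical availability of digital video is growing greatly, but effective access to video remains difficult because of its enormous length and unstructured format. Recently developed image and video information management systems therefore need the minimal user interaction and the explicit definition of video structure proposed in this paper. This paper presents a system for constructing a clustering-based video hierarchy so that users can easily view video content in summarized form and access it at random. The proposed system consists of two main stages: shot boundary detection and hierarchy construction. In the shot boundary detection stage, multiple features are extracted, and performance is improved by removing distortion components that could cause misjudgment, using a temporally adaptive filtering technique that takes into account the correlation between neighboring frame pairs. The processed features are then used as input to k-means clustering, which requires no threshold, to detect shot boundaries. The resulting sequential shot list is represented as a hierarchy by an intelligent unsupervised clustering technique that effectively models temporal locality and scene structure. Experiments were carried out on static and dynamic movie videos; shot boundary detection showed 95% accuracy on average, and the video hierarchy construction, which performs scene boundary detection, also produced reasonably accurate scene boundary detection results.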
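The idea of grouping a sequential shot list into scenes while respecting temporal locality can be sketched as follows. The abstract does not specify the clustering technique, so this is a simple time-constrained grouping swapped in for illustration only: the per-shot feature vectors, the `window` size, and the `threshold` value are all hypothetical, and the fixed threshold in particular is a simplification that the paper's own method may well avoid.

```python
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two shot feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def group_shots_into_scenes(shot_features: Sequence[Sequence[float]],
                            window: int = 3,
                            threshold: float = 0.5) -> List[List[int]]:
    """Build a two-level hierarchy: a list of scenes, each a list of shot indices.

    A shot joins the current scene if it resembles any of the last `window`
    shots of that scene (temporal locality); otherwise it opens a new scene.
    `window` and `threshold` are illustrative assumptions, not paper values.
    """
    scenes: List[List[int]] = []
    for i, feat in enumerate(shot_features):
        if scenes:
            recent = scenes[-1][-window:]
            nearest = min(l1_distance(feat, shot_features[j]) for j in recent)
            if nearest < threshold:
                scenes[-1].append(i)
                continue
        scenes.append([i])  # scene boundary: start a new group
    return scenes


# Made-up features: an alternating dialogue (shots 0-3), then a new location.
shots = [
    [0.90, 0.10], [0.70, 0.30], [0.88, 0.12], [0.72, 0.28],
    [0.10, 0.90], [0.12, 0.88],
]
print(group_shots_into_scenes(shots))  # [[0, 1, 2, 3], [4, 5]]
```

Comparing against the last few shots of the current scene, rather than only the immediately preceding shot, is what keeps alternating camera angles in a dialogue together in one scene.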
