Major Character Extraction using Character-Net


  • 박승보 (Dept. of Information Engineering, Inha University) ;
  • 김유원 (Dept. of Information Engineering, Inha University) ;
  • 조근식 (Dept. of Computer and Information Engineering, Inha University)
  • Published: 2010.02.28

Abstract

In this paper, we propose Character-Net, a novel method of analyzing video and representing the relationships among characters based on their contexts in the video sequences. As a huge amount of video content is generated every day, technologies for searching and summarizing that content have become an active research issue, and a number of studies have addressed extracting semantic information from videos or scenes. The stories of most videos, such as TV serials or commercial films, generally progress through their characters; accordingly, the relationships between the characters and their contexts must be identified to summarize a video. To address these issues, we propose Character-Net, which supports the extraction of major characters from video. We first identify the characters appearing in a group of video shots and then extract the speaker and listeners in those shots. Finally, the characters are represented as a network of graphs capturing the relationships among them. We present empirical experiments that demonstrate Character-Net and evaluate its performance in extracting major characters.

In this paper, we propose a method for constructing Character-Net, which defines the relationships between characters based on the situations they share in a video, and a method for extracting the major characters from a video using it. With the growth of the Internet, the number of digitized videos has increased exponentially, and various attempts have been made to extract semantic information from videos in order to search or summarize them. In most videos with a narrative structure, such as commercial films and TV dramas, the story unfolds through the characters, so video analysis requires systematically organizing the relationships and situations among the characters, extracting the major characters, and using this information for video search and summarization. Character-Net identifies the characters appearing in each group of shots, classifies them into speakers and listeners, represents them as a character-based graph, and accumulates these graphs into a network describing the relationships among all the characters in the video. The major characters can then be extracted from this network through degree centrality analysis. To this end, this paper constructs Character-Net and conducts experiments on extracting major characters.
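The pipeline the abstracts describe — accumulate speaker-to-listener links per group of shots, then rank characters by degree centrality — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the input format and function names are assumptions, and the hard part the paper actually addresses (detecting faces and identifying speakers and listeners in the shots) is elided; here each shot group is simply given as a `(speaker, listeners)` pair.

```python
from collections import defaultdict

def build_character_net(shot_groups):
    """Accumulate directed speaker->listener links over all shot groups.

    shot_groups: list of (speaker, [listeners]) pairs — an assumed input
    format standing in for the paper's face/speaker detection stage.
    """
    edges = defaultdict(int)
    nodes = set()
    for speaker, listeners in shot_groups:
        nodes.add(speaker)
        for listener in listeners:
            nodes.add(listener)
            edges[(speaker, listener)] += 1  # one more utterance on this link
    return nodes, edges

def degree_centrality(nodes, edges):
    """Standard degree centrality: each character's number of incident
    links (in- plus out-edges), normalized by the n-1 other characters."""
    degree = {c: 0 for c in nodes}
    for (u, v) in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(nodes)
    if n <= 1:
        return {c: 0.0 for c in nodes}
    return {c: d / (n - 1) for c, d in degree.items()}

def major_characters(shot_groups, top_k=2):
    """Rank characters by degree centrality and return the top_k."""
    nodes, edges = build_character_net(shot_groups)
    cent = degree_centrality(nodes, edges)
    return sorted(cent, key=cent.get, reverse=True)[:top_k]

# Toy example: A converses with everyone, so A is the major character.
groups = [("A", ["B"]), ("B", ["A", "C"]), ("A", ["C"]), ("D", ["A"])]
print(major_characters(groups))  # → ['A', 'B']
```

Because the graph is directed, a pair that talks in both directions contributes two links, so a character's centrality can exceed 1; only the relative ranking matters for picking major characters.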
