Audio-Visual Content Analysis Based Clustering for Unsupervised Debate Indexing

Keum, Ji-Soo; Lee, Hyon-Soo

The Journal of the Acoustical Society of Korea (한국음향학회지)

Volume 27 Issue 5 / Pages 244-251 / 2008 / 1225-4428 (pISSN) / 2287-3775 (eISSN)

The Acoustical Society of Korea (한국음향학회)

비교사 토론 인덱싱을 위한 시청각 콘텐츠 분석 기반 클러스터링

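Ji-Soo Keum (Department of Computer Engineering, Kyung Hee University);
Hyon-Soo Lee (Department of Computer Engineering, Kyung Hee University)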

Published: 2008.07.31


Abstract

We propose an unsupervised debate indexing method that uses both audio and visual information. The method combines two clustering results: speech clustered with the Bayesian Information Criterion (BIC) and video clustered with a distance-based function. Combining the two modalities reduces the problems that arise when speech or visual information is clustered on its own, and it enables effective content-based analysis of debate data. To evaluate the method, we ran experiments on five different types of debate data, comparing speech alone, visual information alone, and both together. The results show that combining audio and visual information is more effective for debate indexing than using either modality individually.
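
The BIC-based speech clustering mentioned in the abstract can be illustrated with a minimal sketch of the widely used ΔBIC test between two segments. The abstract does not state the acoustic features, covariance form, or penalty weight, so everything specific here is an assumption for illustration rather than the authors' configuration: diagonal-covariance Gaussians, the `penalty_weight` parameter, and the hypothetical `synthetic_segment` helper that stands in for real per-frame features.

```python
import math
import random


def log_det_diag_cov(frames):
    """Log-determinant of the maximum-likelihood diagonal covariance."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-12))  # floor the variance to avoid log(0)
    return total


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC for modelling two segments with two Gaussians instead of one.

    Positive: two models fit better (likely different speakers).
    Negative: one model suffices (the segments are candidates for merging).
    """
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    dim = len(seg_a[0])
    gain = 0.5 * (n * log_det_diag_cov(seg_a + seg_b)
                  - n_a * log_det_diag_cov(seg_a)
                  - n_b * log_det_diag_cov(seg_b))
    # A diagonal Gaussian has dim means + dim variances = 2 * dim parameters.
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


def synthetic_segment(rng, mean, n_frames=200, dim=4):
    """Hypothetical stand-in for the acoustic features of one speaker turn."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n_frames)]


rng = random.Random(0)
same_a = synthetic_segment(rng, 0.0)
same_b = synthetic_segment(rng, 0.0)
other = synthetic_segment(rng, 5.0)

print(delta_bic(same_a, same_b) < 0)  # same source: merging is preferred
print(delta_bic(same_a, other) > 0)   # different sources: keep apart
```

In an agglomerative setting, the pair of clusters with the lowest ΔBIC is merged repeatedly until no pair has a negative value. How the resulting speech clusters are combined with the visual clusters is not described in the abstract and is not sketched here.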


