A Speaker Detection System based on Stereo Vision and Audio  

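An, Jun-Ho (Department of Mobile Systems Engineering, Sungkyunkwan University)
Hong, Kwang-Seok (School of Information and Communication Engineering, Sungkyunkwan University)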
Publication Information
Journal of Internet Computing and Services / v.11, no.6, 2010, pp. 21-29
Abstract
In this paper, we propose a system that detects the current speaker among a number of users. The proposed speaker detection system, based on stereo vision and audio, consists mainly of the following: position estimation of speaker candidates using a stereo camera and microphones, detection of the current speaker, and acquisition of speaker information on a mobile device. We use Haar-like features and the AdaBoost algorithm to detect the faces of speaker candidates with the stereo camera, and the position of each candidate is estimated by triangulation. Next, the Time Delay Of Arrival (TDOA) is estimated by Cross Power Spectrum Phase (CPSP) analysis to find the direction of the sound source with two microphones. Finally, we acquire information about the speaker, including position, voice, and face, by comparing the information from the stereo camera with that from the two microphones. Furthermore, the proposed system includes a TCP client/server connection method for mobile service.
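The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a rectified, parallel stereo pair (so depth follows from disparity), a two-microphone pair centred on and aligned with the camera baseline, and a speed of sound of 343 m/s. All function names, parameters, and the nearest-bearing matching rule are hypothetical, and a naive DFT stands in for an FFT so that the example needs only the standard library.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def dft(x, inverse=False):
    """Naive O(n^2) DFT; adequate for short illustrative frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def cpsp_delay(sig_a, sig_b):
    """Lag in samples by which sig_b trails sig_a, from the CPSP peak."""
    n = len(sig_a)
    fa, fb = dft(sig_a), dft(sig_b)
    cross = []
    for a, b in zip(fa, fb):
        c = b * a.conjugate()
        mag = abs(c)
        # Normalise to unit magnitude so that only the phase remains.
        cross.append(c / mag if mag > 1e-12 else 0j)
    corr = [v.real for v in dft(cross, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    # Indices above n/2 correspond to negative (circular) lags.
    return peak if peak <= n // 2 else peak - n


def audio_bearing(delay_samples, sample_rate, mic_spacing_m):
    """Bearing in radians (positive = right) from the TDOA.

    delay_samples is how far the right microphone trails the left one.
    """
    ratio = -SPEED_OF_SOUND * (delay_samples / sample_rate) / mic_spacing_m
    return math.asin(max(-1.0, min(1.0, ratio)))


def face_bearing(x_left, x_right, cx, focal_px, baseline_m):
    """Triangulate a face seen at pixel columns x_left and x_right.

    Returns (bearing in radians from the rig centre, depth in metres).
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front")
    depth = focal_px * baseline_m / disparity
    lateral = (x_left - cx) * depth / focal_px - baseline_m / 2.0
    return math.atan2(lateral, depth), depth


def pick_speaker(face_bearings, sound_bearing):
    """Index of the candidate whose bearing is closest to the sound."""
    return min(range(len(face_bearings)),
               key=lambda i: abs(face_bearings[i] - sound_bearing))
```

In use, each detected face would be passed through face_bearing, one audio frame per microphone through cpsp_delay and audio_bearing, and pick_speaker would then select the candidate closest to the sound direction. A practical system would also need to handle reverberation, noise, and frames in which nobody is speaking.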
Keywords
Source Localization; Stereo Vision; Speaker Detection