Robust Speech Endpoint Detection in Noisy Environments for HRI (Human-Robot Interface)

Park, Jin-Soo; Ko, Han-Seok

doi:10.7776/ASK.2013.32.2.147

The Journal of the Acoustical Society of Korea

Vol. 32, No. 2, pp. 147-156 (2013)
pISSN 1225-4428; eISSN 2287-3775

Publisher: The Acoustical Society of Korea

Jin-Soo Park (Interdisciplinary Program in Biomicrosystem Technology, Korea University)
Han-Seok Ko (School of Electrical Engineering, Korea University)

Received: 2012.02.03
Reviewed: 2013.01.10
Published: 2013.03.31

Abstract

In this paper, a new speech endpoint detection method that is robust to ambient noise is proposed for a conversational speech recognizer mounted on a moving robot platform. In the conventional method, the endpoint of speech is found by applying an edge detection filter that locates abrupt changes in the feature domain. However, because the frame-energy feature is unstable in such noisy environments, it is difficult to locate the endpoints of speech accurately. Therefore, a novel feature extraction method based on the twice-iterated fast Fourier transform (TIFFT) and statistical models of speech is proposed, and the resulting feature is applied to the edge detection filter for effective detection of the endpoints of speech. Representative experiments confirm that the proposed feature is more robust than the conventional one and yields a substantial improvement over the conventional method.
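
The abstract describes a two-stage pipeline: compute a frame-level feature, then run an edge detection filter over it so that speech onset shows up as a positive peak and speech offset as a negative dip. The following is a minimal sketch of that structure only, under stated assumptions. The paper's TIFFT feature and its exact edge filter are not reproduced here. The filter is a simple box-difference (step-matched) detector, and the statistical-model feature is a swapped-in Sohn-style Gaussian likelihood-ratio feature that assumes the first few frames contain noise only. All function names and parameter values (`frame_signal`, `log_energy`, `lrt_feature`, `edge_filter`, `detect_endpoints`, 256-sample frames with a 128-sample hop, `half_width=8`) are hypothetical choices for illustration.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def log_energy(frames):
    """Conventional feature: per-frame log energy in dB."""
    return 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)


def lrt_feature(frames, n_noise_frames=10):
    """Hypothetical stand-in statistical-model feature: mean per-bin Gaussian
    log likelihood ratio (Sohn-style). This is NOT the paper's TIFFT feature.
    Assumes the first `n_noise_frames` frames are noise only."""
    win = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * win, axis=1)) ** 2
    noise_psd = power[:n_noise_frames].mean(axis=0) + 1e-12
    gamma = power / noise_psd                  # a posteriori SNR per bin
    xi = np.maximum(gamma - 1.0, 0.0)          # ML estimate of a priori SNR
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)
    return log_lr.mean(axis=1)


def edge_filter(feature, half_width=8):
    """Box-difference edge detector (simplified stand-in, not the paper's filter):
    mean of the next `half_width` frames minus mean of the previous
    `half_width` frames. Peaks at rising edges, dips at falling edges."""
    w = half_width
    padded = np.pad(feature, w, mode="edge")   # repeat end values at borders
    c = np.concatenate(([0.0], np.cumsum(padded)))
    n = np.arange(len(feature))
    past = (c[n + w] - c[n]) / w               # mean of feature[n-w .. n-1]
    future = (c[n + 2 * w] - c[n + w]) / w     # mean of feature[n .. n+w-1]
    return future - past


def detect_endpoints(feature, half_width=8):
    """Return (first_speech_frame, last_speech_frame) for a single utterance."""
    g = edge_filter(feature, half_width)
    start = int(np.argmax(g))                  # strongest rising edge = onset
    end = start + int(np.argmin(g[start:])) - 1  # frame before strongest falling edge
    return start, end


if __name__ == "__main__":
    # Synthetic test signal: weak white noise with a tone burst in samples [5120, 10240).
    rng = np.random.default_rng(0)
    n = np.arange(128 * 120)
    x = 0.01 * rng.standard_normal(n.size)
    burst = (n >= 128 * 40) & (n < 128 * 80)
    x[burst] += 0.5 * np.sin(2 * np.pi * 0.05 * n[burst])
    frames = frame_signal(x)
    print("log-energy endpoints:", detect_endpoints(log_energy(frames)))
    print("LRT endpoints:", detect_endpoints(lrt_feature(frames)))
```

On this clean synthetic signal both features should place the utterance at roughly frames 39 to 79. The advantage the paper targets only appears when nonstationary noise makes the frame-energy curve unstable, which this toy signal does not model; the design point the sketch shows is that the edge filter is feature-agnostic, so a more robust feature can be swapped in without changing the detector.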
