Speech Segmentation using Weighted Cross-correlation in CASA System

Kim, JungHo; Kang, ChulHo

doi:10.5573/ieie.2014.51.5.188

전자공학회논문지 (Journal of the Institute of Electronics and Information Engineers)

Volume 51, Issue 5 / Pages 188-194 / 2014 / 2287-5026 (pISSN) / 2288-159X (eISSN)

대한전자공학회 (The Institute of Electronics and Information Engineers)

Korean title: 계산적 청각 장면 분석 시스템에서 가중치 상호상관계수를 이용한 음성 분리

Kim, JungHo (김정호), Department of Electronics and Communication Engineering, Kwangwoon University
Kang, ChulHo (강철호), Department of Electronics and Communication Engineering, Kwangwoon University

Received: 2014.02.19
Reviewed: 2014.04.30
Published: 2014.05.25

https://doi.org/10.5573/ieie.2014.51.5.188

Abstract

In a computational auditory scene analysis (CASA) system, feature extraction builds a correlogram of auditory elements from temporal continuity and similarity across frequency channels. Segmentation then forms a binary mask using the cross-correlation function; units labeled 1 (speech) are those whose channels share the same periodicity and are synchronized. However, channels that have similar periodicity but are delayed relative to each other are wrongly labeled as speech. This paper proposes using a weighted cross-correlation in segmentation to improve the discrimination of channel similarity. To evaluate the speech segregation performance of the CASA system, experiments were conducted in five background noise environments (siren, machine, white, car, crowd) at SNRs of 5 dB and 0 dB. Compared with the conventional method, the proposed method improved performance by 2.75 dB at an SNR of 5 dB and by 4.84 dB at an SNR of 0 dB.
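The drawback described in the abstract can be illustrated with a minimal sketch. It assumes one common formulation of cross-channel correlation (the inner product of normalized autocorrelations of two channels) and is not taken from the paper; the function names, the synthetic sinusoids standing in for filterbank outputs, the frame length, and the threshold value are all illustrative assumptions. The weighting proposed by the authors is not specified in the abstract and is not reproduced here.

```python
import numpy as np


def normalized_autocorrelation(x, max_lag):
    """Autocorrelation over lags 0..max_lag-1, made zero-mean and unit-norm across lags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")  # zero lag sits at index len(x) - 1
    acf = full[len(x) - 1 : len(x) - 1 + max_lag]
    acf = acf - acf.mean()
    norm = np.linalg.norm(acf)
    return acf / norm if norm > 0 else acf


def cross_channel_correlation(a, b, max_lag):
    """Correlation of the two channels' normalized autocorrelations, in [-1, 1]."""
    return float(np.dot(normalized_autocorrelation(a, max_lag),
                        normalized_autocorrelation(b, max_lag)))


def binary_mask(channels, max_lag, threshold):
    """Label channel c as 1 when it correlates strongly with channel c + 1."""
    return [int(cross_channel_correlation(channels[c], channels[c + 1], max_lag) > threshold)
            for c in range(len(channels) - 1)]


# Synthetic stand-ins for filterbank outputs (assumption: one 20 ms frame at 16 kHz).
fs, f0, n = 16000, 200.0, 320
t = np.arange(n) / fs
rng = np.random.default_rng(0)

reference = np.sin(2 * np.pi * f0 * t)
delayed = np.sin(2 * np.pi * f0 * (t - 0.25 / f0))  # same period, quarter-period delay
noise = rng.standard_normal(n)

max_lag = 160
corr_delayed = cross_channel_correlation(reference, delayed, max_lag)
corr_noise = cross_channel_correlation(reference, noise, max_lag)
# Zero-lag similarity of the raw waveforms, used here only to show the lack of synchrony.
waveform_sync = float(np.corrcoef(reference, delayed)[0, 1])

# The threshold is an assumed illustrative value, not one reported in the paper.
mask = binary_mask([reference, delayed, noise], max_lag, threshold=0.95)
```

In this sketch `corr_delayed` stays close to 1 even though `waveform_sync` is close to 0, because autocorrelation discards the delay between channels. The delayed channel is therefore grouped with the reference, which is the kind of misclassification a delay-sensitive weighting would aim to suppress.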
