http://dx.doi.org/10.5573/ieie.2014.51.5.188

Speech Segmentation using Weighted Cross-correlation in CASA System  

Kim, JungHo (Department of Electronics and Communication Engineering, Kwangwoon University)
Kang, ChulHo (Department of Electronics and Communication Engineering, Kwangwoon University)
Publication Information
Journal of the Institute of Electronics and Information Engineers, vol. 51, no. 5, 2014, pp. 188-194
Abstract
The feature extraction stage of a CASA (Computational Auditory Scene Analysis) system uses temporal continuity and cross-channel similarity to build a correlogram of auditory elements. In segmentation, a binary mask is formed from the cross-correlation function: a mask value of 1 (speech) indicates that two channels share the same periodicity and are synchronized. However, when two autocorrelation signals have the same periodicity but are delayed relative to each other, the conventional method still labels them as speech, which is a drawback. In this paper, we propose an algorithm that improves the discrimination of channel similarity by using a weighted cross-correlation in the segmentation stage. We evaluated the speech segregation performance of the CASA system in background noise (siren, machine, white, car, crowd) environments at input SNRs of 5 dB and 0 dB, comparing the proposed algorithm with the conventional one. Across the background noise environments, the proposed algorithm improves performance by 2.75 dB at 5 dB SNR and by 4.84 dB at 0 dB SNR.
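For illustration only, the sketch below shows one way the mask decision described in the abstract could be realized in Python. It assumes the cross-channel similarity is a correlation between the normalized autocorrelations of adjacent filterbank channels (as in common CASA segmentation frameworks), and it models the "weighted cross-correlation" as an exponential down-weighting of correlations found at non-zero delays, so that delayed-but-periodic channel pairs score lower than synchronized ones. The function names, the decay parameter, and the 0.985 threshold are assumptions for the sketch, not the paper's actual formulation.

```python
import numpy as np

def normalized_autocorrelation(x, max_lag):
    """Normalized autocorrelation of one time-frequency unit (one channel, one frame)."""
    x = x - np.mean(x)
    ac = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])
    return ac / (ac[0] + 1e-12)

def cross_correlation_at_delay(a, b, d):
    """Correlation between two autocorrelation curves, with curve b delayed by d samples."""
    if d > 0:
        a, b = a[d:], b[:-d]
    elif d < 0:
        a, b = a[:d], b[-d:]
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

def conventional_similarity(a, b, max_delay=8):
    """Conventional measure: a channel pair counts as similar if its autocorrelations
    correlate strongly at ANY delay, so delayed-but-periodic pairs also pass."""
    return max(cross_correlation_at_delay(a, b, d)
               for d in range(-max_delay, max_delay + 1))

def weighted_similarity(a, b, max_delay=8, decay=0.5):
    """Weighted cross-correlation (assumed form): down-weight correlations found at
    non-zero delays, so only synchronized channel pairs keep a high score."""
    return max(float(np.exp(-decay * abs(d))) * cross_correlation_at_delay(a, b, d)
               for d in range(-max_delay, max_delay + 1))

def binary_mask_value(similarity, threshold=0.985):
    """Label a T-F unit as speech (1) when cross-channel similarity exceeds a
    threshold; 0.985 is a commonly used CASA value, assumed here."""
    return 1 if similarity > threshold else 0
```

Under these assumptions, two channels whose autocorrelations only align after a time shift would pass `conventional_similarity` but fall below the mask threshold under `weighted_similarity`, which is the discrimination improvement the abstract describes.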
Keywords
CASA; Binary Mask; VAD