Search | Korea Science

Kim, Gibak
- Journal of Broadcast Engineering
- v.20 no.1
- pp.164-170
- 2015
This paper deals with adapting the classification model used in the binary mask approach to suppress noise in noisy environments. Binary mask estimation is known to improve the intelligibility of noisy speech. However, to build the classification model for binary mask estimation, the training data must include the same type of noise as the test data. Eigenvoice adaptation is applied to a noise-independent classification model, and the adapted model is used as a noise-dependent model. The results are reported in terms of hit rates and false alarm rates. The experimental results confirm that classification accuracy improves as the number of adaptation sentences increases.
https://doi.org/10.5909/JBE.2015.20.1.164
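
The abstract reports results as hit rates and false alarm rates. As a hedged illustration, assuming the usual definitions (hit rate: the share of target-dominant units, ideal mask 1, that the estimated mask also labels 1; false alarm rate: the share of noise-dominant units, ideal mask 0, that it wrongly labels 1), both can be computed from an estimated and an ideal binary mask as below. The function name and the list-of-rows mask layout are assumptions of this sketch.

```python
def hit_and_false_alarm_rates(estimated, ideal):
    """Return (hit rate, false alarm rate) of an estimated binary mask.

    Both masks are lists of rows (e.g. frequency channels) of 0/1 values.
    Assumed definitions:
      hit rate         = units with ideal 1 estimated as 1 / units with ideal 1
      false alarm rate = units with ideal 0 estimated as 1 / units with ideal 0
    """
    hits = targets = false_alarms = maskers = 0
    for est_row, ideal_row in zip(estimated, ideal):
        for est, ref in zip(est_row, ideal_row):
            if ref == 1:
                targets += 1
                hits += est == 1          # correctly retained target unit
            else:
                maskers += 1
                false_alarms += est == 1  # noise unit wrongly retained
    hit_rate = hits / targets if targets else 0.0
    fa_rate = false_alarms / maskers if maskers else 0.0
    return hit_rate, fa_rate


ideal = [[1, 0, 1, 0],
         [1, 1, 0, 0]]
estimated = [[1, 1, 1, 0],
             [0, 1, 0, 0]]
hit, fa = hit_and_false_alarm_rates(estimated, ideal)
print(hit, fa, hit - fa)  # 0.75 0.25 0.5
```

The difference between the two rates (hit rate minus false alarm rate) is commonly quoted as a single score in the binary-masking literature, since a mask can trivially reach a 100% hit rate by keeping every unit.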

Kim, Gibak
- Journal of Broadcast Engineering
- v.17 no.6
- pp.1061-1068
- 2012
This paper deals with a noise reduction algorithm that uses the binary masking approach in the time-frequency domain to improve speech intelligibility. In the binary masking approach, the noise-corrupted speech is decomposed into time-frequency units. Noise-dominant time-frequency units are removed by setting the corresponding mask values to "0", while target-dominant units are retained unchanged by assigning them mask "1". We propose estimating the binary mask by comparing the local signal-to-noise ratio (SNR) with a threshold. The local SNR is estimated with a training-based approach, and an optimal threshold is derived from the distribution of the training database. The proposed method is evaluated with normal-hearing listeners, and intelligibility scores are computed by counting the number of words correctly recognized.
https://doi.org/10.5909/JBE.2012.17.6.1061
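
The masking rule described in this abstract (compare each unit's local SNR with a threshold, keep units above it, zero the rest) can be sketched as follows. Note the assumptions: the paper estimates the local SNR with a trained model, whereas this sketch computes an oracle local SNR from known speech and noise powers, and the -5 dB threshold is an arbitrary placeholder, not the optimal threshold proposed in the paper.

```python
import math


def local_snr_db(speech_power, noise_power, eps=1e-12):
    """Oracle local SNR in dB per time-frequency unit (assumes known speech and noise powers)."""
    return [[10.0 * math.log10((s + eps) / (n + eps)) for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_power, noise_power)]


def binary_mask(snr_db, threshold_db):
    """Mask 1 where the local SNR exceeds the threshold (target-dominant), else 0."""
    return [[1 if v > threshold_db else 0 for v in row] for row in snr_db]


def apply_mask(noisy_tf, mask):
    """Zero noise-dominant units and leave target-dominant units untouched."""
    return [[x * m for x, m in zip(x_row, m_row)] for x_row, m_row in zip(noisy_tf, mask)]


speech = [[4.0, 1.0], [0.1, 2.0]]   # toy speech power per T-F unit
noise = [[1.0, 1.0], [1.0, 1.0]]    # toy noise power per T-F unit
mask = binary_mask(local_snr_db(speech, noise), threshold_db=-5.0)  # hypothetical threshold
print(mask)                                        # [[1, 1], [0, 1]]
print(apply_mask([[2.0, 3.0], [4.0, 5.0]], mask))  # [[2.0, 3.0], [0.0, 5.0]]
```

Only the unit at -10 dB falls below the placeholder threshold and is removed; raising or lowering the threshold trades retained speech against residual noise, which is why the paper spends effort choosing it.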

Kim, Gibak
- Journal of Broadcast Engineering
- v.18 no.2
- pp.311-318
- 2013
This paper deals with a noise reduction algorithm that uses binary masking in the time-frequency domain. To improve speech intelligibility in noise, noise-masked speech is decomposed into time-frequency units, and mask "0" is assigned to the masker-dominant region, removing the time-frequency units where noise dominates speech. In previous research, Gaussian mixture models were used to classify the speech-dominant and noise-dominant regions, which correspond to mask "1" and mask "0", respectively. In each frequency band, data were collected and used to train the Gaussian mixture models, and a detection procedure was applied to the test data to decide whether each time-frequency unit belongs to the speech-dominant or the noise-dominant region. In this paper, we consider the correlation of masks across frequency and propose a post-processing method that exploits the Viterbi algorithm.
https://doi.org/10.5909/JBE.2013.18.2.311
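
The post-processing idea, decoding the masks of adjacent frequency bands jointly instead of band by band, can be illustrated with a generic two-state Viterbi decoder over the bands of one frame. This is a sketch under assumptions: the per-band scores stand in for the GMM outputs, and the transition and prior probabilities are made-up values; the abstract does not give the paper's exact model.

```python
import math


def viterbi_mask(loglik, log_trans, log_prior):
    """Most likely 0/1 mask sequence across the frequency bands of one frame.

    loglik[b][s]    : log score of band b under state s (0 = noise-dominant, 1 = speech-dominant)
    log_trans[p][s] : log P(mask s in band b | mask p in band b - 1)
    log_prior[s]    : log P(mask s in the first band)
    """
    states = (0, 1)
    score = [log_prior[s] + loglik[0][s] for s in states]
    backptrs = []
    for b in range(1, len(loglik)):
        new_score, ptr = [], []
        for s in states:
            # best predecessor state for ending band b in state s
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[prev] + log_trans[prev][s] + loglik[b][s])
            ptr.append(prev)
        score = new_score
        backptrs.append(ptr)
    # backtrack from the best final state
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


log = math.log
# per-band class scores (stand-ins for GMM outputs); band 2 weakly favors noise
loglik = [[log(0.2), log(0.8)], [log(0.3), log(0.7)],
          [log(0.55), log(0.45)], [log(0.2), log(0.8)]]
log_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]  # hypothetical: masks tend to persist
log_prior = [log(0.5), log(0.5)]

independent = [max((0, 1), key=lambda s: ll[s]) for ll in loglik]
print(independent)                                 # [1, 1, 0, 1]
print(viterbi_mask(loglik, log_trans, log_prior))  # [1, 1, 1, 1]
```

With "sticky" transitions the isolated, weakly supported "0" in band 2 is overruled by its neighbors; with uniform transitions the decoder reduces to the independent per-band decision.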

Woo, Sung-Min;Jeong, Hong
- Proceedings of the IEEK Conference
- 2008.06a
- pp.1017-1018
- 2008
In this paper, we propose a method that exploits neighborhood relationships in the 2D spectrogram of the separated sources to generalize the binary mask used in the Degenerate Unmixing Estimation Technique (DUET). The new generalized mask can consist of five to ten masks, and the original spectrogram power at each time-frequency point is assigned according to this mask. The results show smooth, soft waveforms, indicating better speech separation performance than the original method.
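
The abstract does not spell out how the neighborhood relationship is used. One simple way to turn a DUET-style binary mask into a graded mask from its 2D neighborhood, offered only as an assumed illustration and not as the authors' construction, is to average each time-frequency unit's 0/1 value over its 3x3 neighborhood and then scale the spectrogram power by the result. Both function names are hypothetical.

```python
def neighborhood_soft_mask(binary_mask):
    """Average each unit's 0/1 mask over its 3x3 time-frequency neighborhood.

    Rows are frequency bins, columns are frames; edge units average only
    the neighbors that exist.
    """
    n_f, n_t = len(binary_mask), len(binary_mask[0])
    soft = []
    for f in range(n_f):
        row = []
        for t in range(n_t):
            window = [binary_mask[ff][tt]
                      for ff in range(max(0, f - 1), min(n_f, f + 2))
                      for tt in range(max(0, t - 1), min(n_t, t + 2))]
            row.append(sum(window) / len(window))
        soft.append(row)
    return soft


def apply_soft_mask(power, mask):
    """Scale the mixture's spectrogram power at each T-F point by the mask value."""
    return [[p * m for p, m in zip(p_row, m_row)] for p_row, m_row in zip(power, mask)]


binary = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]
soft = neighborhood_soft_mask(binary)
print(soft[1][1], soft[0][0], soft[0][1])  # centre 1/9, corner 1/4, edge 1/6
```

An isolated "1" surrounded by "0"s is strongly attenuated, while units inside a solid region of "1"s keep full weight, which smooths the abrupt on/off transitions of a hard binary mask.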

Choi, Gab-Keun;Kim, Soon-Hyob
- The Journal of the Acoustical Society of Korea
- v.29 no.7
- pp.468-474
- 2010
The major factor hindering the practical use of speech recognition is distortion caused by ambient and channel noise. Ambient noise generally degrades recognition performance and restricts where recognizers can be used, and speech recognition based on DSR (Distributed Speech Recognition) suffers from the same problem. Various noise-cancelling algorithms have been applied to address this, but spectral loss and residual noise caused by inaccurate noise estimation in low-SNR environments reduce the recognition rate. This paper proposes speech enhancement methods that use MMSE-STSA for noise cancelling and an ideal binary mask to compensate for the damaged spectrum. In experiments in noisy environments (SNR from 15 dB to 0 dB), the proposed methods showed better spectral results and recognition performance.
https://doi.org/10.7776/ASK.2010.29.7.468
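
For reference, the MMSE-STSA estimator named in this abstract is usually written in the form introduced by Ephraim and Malah: the clean spectral amplitude in frequency bin k is estimated by applying a gain to the noisy amplitude R_k. The notation below follows that standard formulation and is not taken from this paper; how the authors combine the enhanced spectrum with the ideal binary mask is not described in the abstract.

```latex
\hat{A}_k = G(\xi_k, \gamma_k)\, R_k, \qquad
G(\xi_k, \gamma_k) = \frac{\sqrt{\pi}}{2}\,\frac{\sqrt{v_k}}{\gamma_k}\,
  \exp\!\left(-\frac{v_k}{2}\right)
  \left[(1 + v_k)\, I_0\!\left(\frac{v_k}{2}\right) + v_k\, I_1\!\left(\frac{v_k}{2}\right)\right],
\qquad
v_k = \frac{\xi_k}{1 + \xi_k}\,\gamma_k
```

Here \xi_k = \lambda_x(k)/\lambda_d(k) is the a priori SNR, \gamma_k = R_k^2/\lambda_d(k) is the a posteriori SNR, \lambda_x and \lambda_d are the speech and noise spectral variances, and I_0, I_1 are modified Bessel functions of the first kind. Because \lambda_d must be estimated, errors in the noise estimate at low SNR propagate directly into the gain, which is the weakness the abstract targets.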

Lee, Jae-Eun;Kim, Young-Moon;Lim, Chan;Kang, Hyun-Soo
- The Journal of the Korea Contents Association
- v.7 no.8
- pp.1-12
- 2007
This paper presents a new demixing method that separates each source from a stereo sound mixture under the W-disjoint orthogonality assumption of the DUET (Degenerate Unmixing Estimation Technique) algorithm. The proposed method operates mainly in the time-frequency domain, using the windowed Fourier transform. The paper makes two main contributions: a weighted mask based on panning-index distances and a binary mask obtained by comparing the values of the two channels. The former has a softer demixing characteristic, and the latter a stronger one. The experimental results show that both masks produce more robust demixing than existing demixing methods.
https://doi.org/10.5392/JKCA.2007.7.8.001
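
The two mask types can be illustrated on the magnitudes of one stereo STFT frame. The abstract does not define its panning index, so the pan measure below (the right channel's share of the total magnitude: 0 = hard left, 1 = hard right), the linear weighting, and the `width` parameter are all assumptions; the binary mask simply assigns each unit to the louder channel, which is one reading of "comparing each channel value".

```python
def pan_position(lv, rv, eps=1e-12):
    """Hypothetical pan measure in [0, 1]: 0 = hard left, 0.5 = centre, 1 = hard right."""
    return rv / (lv + rv + eps)


def binary_channel_mask(left_mags, right_mags):
    """1 where the left channel is louder, i.e. the unit goes to the left-panned source."""
    return [1 if lv > rv else 0 for lv, rv in zip(left_mags, right_mags)]


def weighted_pan_mask(left_mags, right_mags, target_pan, width=0.25):
    """Weight falls linearly from 1 to 0 as the unit's pan moves `width` away from the target pan."""
    return [max(0.0, 1.0 - abs(pan_position(lv, rv) - target_pan) / width)
            for lv, rv in zip(left_mags, right_mags)]


# magnitudes of one STFT frame, four frequency bins (toy values)
left = [1.0, 0.9, 0.5, 0.0]
right = [0.0, 0.1, 0.5, 1.0]
print(binary_channel_mask(left, right))                                       # [1, 1, 0, 0]
print([round(w, 3) for w in weighted_pan_mask(left, right, target_pan=0.0)])  # [1.0, 0.6, 0.0, 0.0]
```

The binary version makes a hard decision for every unit, while the weighted version lets units near a source's pan position contribute partially, consistent with the abstract's description of the weighted mask as the softer of the two.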