Search | Korea Science

Choi, Bo Kyeong;Ban, Sung Min;Kim, Hyung Soon
- The Journal of the Acoustical Society of Korea, v.34 no.4, pp.316-320, 2015
In this paper, the pole filtering concept is applied to Mel-frequency cepstral coefficient (MFCC) feature vectors within the conventional cepstral mean normalization (CMN) and cepstral mean and variance normalization (CMVN) frameworks. Additionally, the performance of cepstral mean and scale normalization (CMSN), which uses scale normalization instead of variance normalization, is evaluated in speech recognition experiments in noisy environments. Because CMN and CMVN are usually performed on a per-utterance basis, reliable estimation of the mean and variance is not guaranteed for short utterances. Applying pole filtering and scale normalization to the feature normalization process alleviates this problem. Experimental results on the Aurora 2 database (DB) show that the feature normalization method combining pole filtering and scale normalization yields the largest improvement.
https://doi.org/10.7776/ASK.2015.34.4.316
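
For readers unfamiliar with the baseline techniques named in the abstract above, the following is a minimal sketch of conventional per-utterance CMN and CMVN, written in plain Python. It does not implement pole filtering or scale normalization, and it is not the authors' code: the function names `cmn` and `cmvn`, the list-of-frames input layout, and the `variance_floor` parameter are assumptions made for illustration.

```python
import math


def cmn(frames):
    """Cepstral mean normalization over one utterance.

    frames: list of feature vectors, one list of floats per frame.
    Subtracts the per-coefficient mean computed over the whole utterance.
    """
    n = len(frames)
    if n == 0:
        return []
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def cmvn(frames, variance_floor=1e-8):
    """Cepstral mean and variance normalization over one utterance.

    After mean subtraction, each coefficient is divided by its standard
    deviation over the utterance. variance_floor (an illustrative
    assumption) avoids division by zero for constant coefficients.
    """
    centered = cmn(frames)
    n = len(centered)
    if n == 0:
        return []
    dim = len(centered[0])
    variances = [sum(f[d] ** 2 for f in centered) / n for d in range(dim)]
    stds = [math.sqrt(max(v, variance_floor)) for v in variances]
    return [[f[d] / stds[d] for d in range(dim)] for f in centered]
```

Both statistics are estimated from the frames of a single utterance, so an utterance with only a few frames gives noisy estimates. That is the short-utterance problem the abstract describes.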

Choi, Bo Kyeong;Ban, Sung Min;Kim, Hyung Soon
- Phonetics and Speech Sciences, v.9 no.2, pp.103-110, 2017
The pole filtering concept has been successfully applied to cepstral feature normalization techniques for noise-robust speech recognition. This paper proposes applying pole filtering selectively, only to the speech intervals, to further improve recognition performance for short utterances in noisy environments. Experimental results on the AURORA 2 task with clean-condition training show that the proposed selectively pole-filtered cepstral mean normalization (SPFCMN) and selectively pole-filtered cepstral mean and variance normalization (SPFCMVN) yield error rate reductions of 38.6% and 45.8%, respectively, compared to the baseline system.
https://doi.org/10.13064/KSSS.2017.9.2.103

Yang, IL-Ho;Heo, Hee-Soo;Yoon, Sung-Hyun;Yu, Ha-Jin
- The Journal of the Acoustical Society of Korea, v.35 no.6, pp.501-509, 2016
We propose a method to improve the robustness of speaker verification on short test utterances. The accuracy of state-of-the-art i-vector/probabilistic linear discriminant analysis systems can degrade when test utterances are short. The proposed method compensates for the utterance variations of short test feature vectors using deep neural networks (DNNs). We design three types of DNN structures, each trained with different target output vectors. Each DNN is trained to minimize the discrepancy between its feed-forward output for a given short-utterance feature and the corresponding original long-utterance feature. We use the short 2-10 s condition of the NIST (National Institute of Standards and Technology, U.S.) 2008 SRE (Speaker Recognition Evaluation) corpus to evaluate the method. The experimental results show that the proposed method reduces the minimum detection cost relative to the baseline system.
https://doi.org/10.7776/ASK.2016.35.6.501

Han, Jaemin;Kim, Min Sik;Kim, Hyung Soon
- The Journal of the Acoustical Society of Korea, v.39 no.1, pp.64-68, 2020
In conventional speech recognition systems based on the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), cepstral feature normalization based on pole filtering was effective in improving recognition performance for short utterances in noisy environments. This paper examines the usefulness of the method for a state-of-the-art speech recognition system based on a Deep Neural Network (DNN). Experimental results on the AURORA 2 DB show that cepstral mean and variance normalization based on pole filtering improves the recognition performance for very short utterances compared to normalization without pole filtering, especially when there is a large mismatch between the training and test conditions.
https://doi.org/10.7776/ASK.2020.39.1.064