Search | Korea Science

Choi, Sook-Nam;Shen, Guang-Hu;Chung, Hyun-Yeol
- Journal of Korea Multimedia Society / v.14 no.10 / pp.1221-1228 / 2011
Speech recognition systems work well in typical indoor environments. However, recognition performance drops sharply when a system is used in real environments because of various kinds of noise. In this paper we propose CSFN-CMVN to improve the recognition performance of the existing CSFN (cepstral distance based SFN). CSFN-CMVN combines cepstral normalization (CMVN) with CSFN, which normalizes silence features and uses the cepstral Euclidean distance to classify speech and silence. In tests on the Aurora 2.0 DB, the proposed CSFN-CMVN improves average word accuracy over all test sets by about 7% compared with the typical silence feature normalization SFN-I. It also improves accuracy by 6% and 5% compared with the conventional SFN-II and CSFN, respectively, which shows the effectiveness of the proposed method.
https://doi.org/10.9717/kmms.2011.14.10.1221
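The abstract above describes classifying speech and silence with a cepstral Euclidean distance. A minimal sketch of that idea follows. The function names, the `reference` silence vector (for example, an average over leading noise-only frames), and the fixed `threshold` are all assumptions made for illustration; the abstract does not specify how the authors choose them, and the silence feature normalization step itself is not shown.

```python
import math


def cepstral_distance(a, b):
    """Euclidean distance between two cepstral vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_frames(cepstra, reference, threshold):
    """Label each frame as 'silence' or 'speech'.

    Assumption: a frame counts as silence when its distance to the
    hypothetical silence reference vector is below the threshold.
    """
    return [
        "silence" if cepstral_distance(frame, reference) < threshold else "speech"
        for frame in cepstra
    ]


labels = classify_frames(
    cepstra=[[0.0, 0.0], [3.0, 4.0], [0.1, 0.0]],
    reference=[0.0, 0.0],
    threshold=1.0,
)
```

In this toy input the second frame lies at distance 5 from the reference and is labeled speech, while the other two frames fall under the threshold.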

Choi, Bo Kyeong;Ban, Sung Min;Kim, Hyung Soon
- The Journal of the Acoustical Society of Korea / v.34 no.4 / pp.316-320 / 2015
In this paper, the pole filtering concept is applied to the Mel-frequency cepstral coefficient (MFCC) feature vectors in the conventional cepstral mean normalization (CMN) and cepstral mean and variance normalization (CMVN) frameworks. Additionally, the performance of cepstral mean and scale normalization (CMSN), which uses scale normalization instead of variance normalization, is evaluated in speech recognition experiments in noisy environments. Because CMN and CMVN are usually performed on a per-utterance basis, reliable estimation of the mean and variance is not guaranteed for short utterances. Applying the pole filtering and scale normalization techniques to the feature normalization process mitigates this problem. Experimental results on the Aurora 2 database (DB) show that the feature normalization method combining pole filtering and scale normalization yields the best improvements.
https://doi.org/10.7776/ASK.2015.34.4.316
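The per-utterance CMN and CMVN that this abstract builds on can be sketched as follows. This is a plain textbook formulation, not the paper's pole-filtered or scale-normalized variant; the function name `cmvn`, the `apply_variance` flag, and the `eps` guard are assumptions introduced here.

```python
import math


def cmvn(frames, apply_variance=True, eps=1e-10):
    """Normalize one utterance of cepstral frames.

    frames: list of equal-length lists, one cepstral vector per frame.
    With apply_variance=False only the mean is removed (CMN);
    otherwise each dimension is also divided by its standard deviation (CMVN).
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    if apply_variance:
        stds = [
            math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dim)
        ]
    else:
        stds = [1.0] * dim
    # eps avoids division by zero when a dimension is constant.
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(dim)]
        for f in frames
    ]


utterance = [[1.0, 10.0], [3.0, 14.0]]
cmn_only = cmvn(utterance, apply_variance=False)
full_cmvn = cmvn(utterance)
```

With only two frames the statistics are estimated from very little data, which is the short-utterance weakness the abstract points out.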

Shen, Guanghu;Choi, Sook-Nam;Chung, Hyun-Yeol
- The Journal of Korean Institute of Communications and Information Sciences / v.35 no.7C / pp.604-610 / 2010
In this paper we propose the FSFN (filter bank sub-band energy subtraction based CLSFN) method to improve the recognition performance of the existing CLSFN (cepstral distance and log-energy based silence feature normalization). The proposed FSFN reduces the energy of noise components in the filter bank sub-band domain when extracting features from speech data. This yields enhanced cepstral features, which in turn improve the accuracy of speech/silence classification. FSFN can therefore be expected to outperform the existing CLSFN. Experiments on the Aurora 2.0 DB show that the proposed FSFN improves average word accuracy by 2% compared with the conventional CLSFN method, and that FSFN combined with CMVN (cepstral mean and variance normalization) achieves the best recognition performance among the methods compared.
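The idea of reducing noise energy in the filter bank sub-band domain can be illustrated with a generic subtraction sketch. Everything specific here is an assumption and not taken from the paper: the function name, estimating the noise from the first `noise_frames` frames, and flooring the result at a fraction of the original energy.

```python
def subtract_subband_noise(fbank_energies, noise_frames=10, floor=1e-3):
    """Subtract a per-band noise estimate from filter bank energies.

    fbank_energies: list of frames, each a list of non-negative
    sub-band energies (before the log and DCT steps).
    Assumption: the leading frames contain noise only.
    """
    n_noise = min(noise_frames, len(fbank_energies))
    n_bands = len(fbank_energies[0])
    noise = [
        sum(f[b] for f in fbank_energies[:n_noise]) / n_noise
        for b in range(n_bands)
    ]
    # The floor keeps energies positive so a later log stays defined.
    return [
        [max(f[b] - noise[b], floor * f[b]) for b in range(n_bands)]
        for f in fbank_energies
    ]


cleaned = subtract_subband_noise(
    [[1.0, 2.0], [1.0, 2.0], [5.0, 10.0]], noise_frames=2
)
```

Cepstral features computed from the cleaned energies would then feed the speech/silence classification step described in the abstract.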

Yang, IL-Ho;Kim, Min-Seok;So, Byung-Min;Kim, Myung-Jae;Yu, Ha-Jin
- Phonetics and Speech Sciences / v.3 no.2 / pp.71-78 / 2011
In this paper, we propose an approach that constructs classifier ensembles from various channel compensation and feature enhancement methods. CMN and CMVN are used as channel compensation methods. PCA, kernel PCA, greedy kernel PCA, and kernel multimodal discriminant analysis are used as feature enhancement methods. The proposed ensemble system combines 15 classifiers, formed from three channel compensation options (including 'without compensation') and five feature enhancement options (including 'without enhancement'). Experimental results show that the proposed ensemble system gives the highest average speaker identification rate in various environments (channels, noises, and sessions).
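The count of 15 classifiers in the abstract above follows from pairing each of the three channel compensation options with each of the five feature enhancement options. The short sketch below enumerates those pairs. The function name and the label strings are illustrative, and how the paper combines the classifiers' outputs is not stated in the abstract, so no combination rule is shown.

```python
from itertools import product

CHANNEL_COMPENSATION = ["none", "CMN", "CMVN"]
FEATURE_ENHANCEMENT = [
    "none",
    "PCA",
    "kernel PCA",
    "greedy kernel PCA",
    "kernel multimodal discriminant analysis",
]


def ensemble_configurations():
    """Return every (compensation, enhancement) pair, one per classifier."""
    return list(product(CHANNEL_COMPENSATION, FEATURE_ENHANCEMENT))


configs = ensemble_configurations()
```

The list has 3 x 5 = 15 entries, and the pair `("none", "none")` corresponds to a classifier trained on the unprocessed features.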