• Title/Summary/Keyword: Speech feature


Speech enhancement method based on feature compensation gain for effective speech recognition in noisy environments (잡음 환경에 효과적인 음성인식을 위한 특징 보상 이득 기반의 음성 향상 기법)

  • Bae, Ara;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.1
    • /
    • pp.51-55
    • /
    • 2019
  • This paper proposes a speech enhancement method that utilizes the feature compensation gain obtained from the PCGMM (Parallel Combined Gaussian Mixture Model)-based feature compensation method employing variational model composition, for robust speech recognition in noisy environments. The experimental results show that the proposed method significantly outperforms conventional front-end algorithms and our previous work over various background noise types and SNR (Signal to Noise Ratio) conditions in a mismatched ASR (Automatic Speech Recognition) system condition. Computational complexity is significantly reduced by employing the noise model selection technique while maintaining speech recognition performance at a similar level.
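
The PCGMM-based estimation itself is not described in the abstract, but the general idea of a feature compensation gain can be sketched as follows: derive a per-channel gain from the difference between compensated and noisy log mel features, then apply it to the noisy mel power spectrum. This is a minimal illustration under that assumption, not the paper's actual algorithm; both function names and inputs are hypothetical.

```python
import math

def compensation_gain(noisy_logmel, compensated_logmel):
    """Per-channel gain from log-mel features: exp(compensated - noisy).

    The PCGMM-based estimation of the compensated features is not shown;
    both inputs are assumed to be log mel-filterbank energies of one frame.
    """
    return [math.exp(c - n) for n, c in zip(noisy_logmel, compensated_logmel)]

def enhance(noisy_mel_power, gain):
    """Apply the gain to the noisy mel power spectrum."""
    return [p * g for p, g in zip(noisy_mel_power, gain)]

noisy_log = [2.0, 1.5, 0.5]
comp_log = [1.0, 1.5, 0.0]   # hypothetical compensated estimates
g = compensation_gain(noisy_log, comp_log)
enhanced = enhance([math.exp(v) for v in noisy_log], g)
```

By construction, a channel whose compensated estimate equals the noisy observation gets a gain of 1 and passes through unchanged.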

A Study on Speech Recognition using Vocal Tract Area Function (성도 면적 함수를 이용한 음성 인식에 관한 연구)

  • 송제혁;김동준
    • Journal of Biomedical Engineering Research
    • /
    • v.16 no.3
    • /
    • pp.345-352
    • /
    • 1995
  • The LPC cepstrum coefficients, which are acoustic features of the speech signal, have been widely used as feature parameters in various speech recognition systems and have shown good performance. The vocal tract area function is a kind of articulatory feature, related to the physiological mechanism of speech production. This paper proposes the vocal tract area function as an alternative feature parameter for speech recognition. Linear predictive analysis using the Burg algorithm and vector quantization are performed. Then, recognition experiments on 5 Korean vowels and 10 digits are executed using the conventional LPC cepstrum coefficients and the vocal tract area function. The recognition using the area function showed slightly better results than that using the conventional LPC cepstrum coefficients.

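
In linear predictive analysis, the vocal tract area function is commonly derived from the PARCOR (reflection) coefficients of the lossless-tube model via the area-ratio recursion. A minimal sketch of that standard relation (the coefficients below are hypothetical, and the sign convention of k varies between texts):

```python
def vocal_tract_areas(reflection_coeffs, lips_area=1.0):
    """Relative vocal tract section areas from PARCOR/reflection
    coefficients k_m, using the lossless-tube relation
    A_{m+1} = A_m * (1 - k_m) / (1 + k_m).
    Only area ratios are recoverable, so the first section is normalized."""
    areas = [lips_area]
    for k in reflection_coeffs:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas

ks = [0.5, -0.2, 0.1]  # hypothetical reflection coefficients from LPC analysis
areas = vocal_tract_areas(ks)
```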

Method of Speech Feature Parameter Extraction Using Modified-MFCC (Modified-MFCC를 이용한 음성 특징 파라미터 추출 방법)

  • 이상복;이철희;정성환;김종교
    • Proceedings of the IEEK Conference
    • /
    • 2001.06d
    • /
    • pp.269-272
    • /
    • 2001
  • In speech recognition technology, the utterance of every talker has a special resonant frequency determined by the shape of the talker's lips and the motion of the tongue, so utterances differ from talker to talker. Accordingly, we need a superior method of speech feature parameter extraction that reflects the talker's characteristics well. This paper suggests the Modified-MFCC, which combines the existing MFCC with a gammatone filter. We experimented with speech data from telephone channels and obtained a speech recognition rate higher than that of the other methods.

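
The gammatone filter mentioned above is an auditory filter with a well-known impulse response, g(t) = t^(n-1) · exp(-2πbt) · cos(2πf_c t), whose bandwidth b is usually tied to the ERB scale. A minimal sketch of that response (the sampling rate and center frequency here are arbitrary, and this does not reproduce the paper's full Modified-MFCC pipeline):

```python
import math

def gammatone_ir(fc, fs, n=4, duration=0.025):
    """Impulse response of an n-th order gammatone filter:
    g(t) = t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t),
    with bandwidth b tied to the ERB scale (Glasberg & Moore)."""
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # equivalent rectangular bandwidth
    b = 1.019 * erb
    out = []
    for i in range(int(duration * fs)):
        t = i / fs
        out.append(t ** (n - 1) * math.exp(-2 * math.pi * b * t)
                   * math.cos(2 * math.pi * fc * t))
    return out

ir = gammatone_ir(fc=1000.0, fs=8000.0)  # 25 ms response at 8 kHz
```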

A Study on Speech Recognition using Vocal Tract Area function and Vector Quantization (성도 면적 함수와 벡터 양자화를 이용한 음성 인식에 관한 연구)

  • Song, Jei-Hyuck;Kim, Dong-Jun;Park, Sang-Hui
    • Proceedings of the KOSOMBE Conference
    • /
    • v.1993 no.11
    • /
    • pp.171-174
    • /
    • 1993
  • We propose the vocal tract area function as the feature vector for speech recognition. The vocal tract area function is directly related to speech production. In this study, the vocal tract area function not only shows the mechanism of speech production but can also be used as an effective feature vector in speech recognition.


Analysis of Feature Extraction Methods for Distinguishing the Speech of Cleft Palate Patients (구개열 환자 발음 판별을 위한 특징 추출 방법 분석)

  • Kim, Sung Min;Kim, Wooil;Kwon, Tack-Kyun;Sung, Myung-Whun;Sung, Mee Young
    • Journal of KIISE
    • /
    • v.42 no.11
    • /
    • pp.1372-1379
    • /
    • 2015
  • This paper presents an analysis of feature extraction methods used for distinguishing the speech of patients with cleft palates from that of people with normal palates. This research is a basic study on the development of a software system for automatic recognition and restoration of speech disorders, in pursuit of improving the welfare of speech-disabled persons. Monosyllable voice data for the experiments were collected for three groups: normal speech, cleft palate speech, and simulated cleft palate speech. The data consist of 14 basic Korean consonants, 5 complex consonants, and 7 vowels. Feature extraction is performed using three well-known methods: LPC, MFCC, and PLP. The pattern recognition process is executed using the GMM acoustic model. From our experiments, we concluded that the MFCC method is generally the most effective way to identify speech distortions. These results may contribute to the automatic detection and correction of the distorted speech of cleft palate patients, along with the development of an identification tool for levels of speech distortion.
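
GMM-based classification of the kind described here amounts to scoring a feature vector against one Gaussian mixture per class and picking the class with the highest likelihood. A minimal sketch with diagonal-covariance components (the models and features below are hypothetical toy values, not the paper's trained GMMs):

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    return math.log(sum(w * math.exp(log_gauss(x, m, v))
                        for w, m, v in gmm))

def classify(x, models):
    """Pick the class whose GMM assigns the highest likelihood."""
    return max(models, key=lambda name: gmm_loglik(x, models[name]))

# Hypothetical one-component "GMMs" for two classes over a 2-D feature
models = {
    "normal": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "cleft":  [(1.0, [3.0, 3.0], [1.0, 1.0])],
}
label = classify([0.2, -0.1], models)
```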

Intra-and Inter-frame Features for Automatic Speech Recognition

  • Lee, Sung Joo;Kang, Byung Ok;Chung, Hoon;Lee, Yunkeun
    • ETRI Journal
    • /
    • v.36 no.3
    • /
    • pp.514-517
    • /
    • 2014
  • In this paper, alternative dynamic features for speech recognition are proposed. The goal of this work is to improve speech recognition accuracy by deriving a representation of distinctive dynamic characteristics from the speech spectrum. This work was inspired by two temporal dynamics of a speech signal: the highly non-stationary nature of speech, and the inter-frame change of the speech spectrum. We adopt a sub-frame spectrum analyzer to capture very rapid spectral changes within a speech analysis frame. In addition, we attempt to measure spectral fluctuations in a more complex manner than traditional dynamic features such as delta or double-delta. To evaluate the proposed features, speech recognition tests in smartphone environments were conducted. The experimental results show that feature streams simply combined with the proposed features are effective for improving the recognition accuracy of a hidden Markov model-based speech recognizer.
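
The traditional delta features that this work compares against are a linear-regression slope over neighboring frames. A minimal sketch of that baseline computation (the toy cepstral track below is illustrative; the paper's proposed sub-frame features are not reproduced here):

```python
def delta(frames, N=2):
    """Standard delta (first-order dynamic) features:
    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with edge frames clamped to the sequence boundaries."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(frames[0])
        for n in range(1, N + 1):
            plus = frames[min(t + n, T - 1)]
            minus = frames[max(t - n, 0)]
            for i in range(len(d)):
                d[i] += n * (plus[i] - minus[i]) / denom
        out.append(d)
    return out

frames = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # toy 1-D cepstral track
d = delta(frames)
```

On this linearly increasing track, interior frames get a delta equal to the slope (1.0 per frame).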

A Study on the Removal of Unusual Feature Vectors in Speech Recognition (음성인식에서 특이 특징벡터의 제거에 대한 연구)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.4
    • /
    • pp.561-567
    • /
    • 2013
  • Some of the feature vectors for speech recognition are rare and unusual. These patterns lead to overfitting of the parameters of the speech recognition system and, as a result, cause structural risks in the system that hinder good recognition performance. In this paper, as a method of removing these unusual patterns, we exclude vectors whose norms are larger than a specified cutoff value and then train the speech recognition system. The objective of this study is to exclude as many unusual feature vectors as possible without significant degradation of the speech recognition error rate. For this purpose, we introduce a cutoff parameter and investigate its effect on speaker-independent speech recognition of isolated words using FVQ (Fuzzy Vector Quantization)/HMM (Hidden Markov Model). Experimental results showed that roughly 3 %~6 % of the feature vectors might be considered unusual and can therefore be excluded without deteriorating the speech recognition accuracy.
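
The norm-cutoff step described above is a one-line filter over the training set. A minimal sketch (the feature values and cutoff below are illustrative only; in the paper the cutoff is a tuned parameter):

```python
import math

def remove_unusual(features, cutoff):
    """Keep only feature vectors whose Euclidean norm is below the
    cutoff; larger-norm vectors are treated as unusual outliers."""
    return [v for v in features
            if math.sqrt(sum(x * x for x in v)) < cutoff]

feats = [[0.1, 0.2], [3.0, 4.0], [0.5, 0.5]]  # toy 2-D feature vectors
kept = remove_unusual(feats, cutoff=2.0)       # drops the norm-5 outlier
```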

Speech Recognition Optimization Learning Model using HMM Feature Extraction In the Bhattacharyya Algorithm (바타차랴 알고리즘에서 HMM 특징 추출을 이용한 음성 인식 최적 학습 모델)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence
    • /
    • v.11 no.6
    • /
    • pp.199-204
    • /
    • 2013
  • A speech recognition system must build its learning models from inaccurate input speech, and similar phoneme models lead to a decrease in the recognition rate. Therefore, in this paper, we propose a method of configuring an optimal learning model for speech recognition using the Bhattacharyya algorithm. Based on the features of the phonemes, an HMM feature extraction method was used for the phonemes in the training data, and similar learning models were resolved into exact learning models using the Bhattacharyya algorithm. The recognition performance of the resulting optimal learning model configuration was evaluated. Applying the proposed system yielded a speech recognition rate of 98.7%.
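
The Bhattacharyya distance used to separate similar models has a closed form for Gaussians. A minimal sketch for the univariate case (the parameters below are illustrative; the paper applies the measure to HMM phoneme models):

```python
import math

def bhattacharyya(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians:
    D_B = (mu1-mu2)^2 / (4*(var1+var2))
        + 0.5 * ln((var1+var2) / (2*sqrt(var1*var2)))."""
    return (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
            + 0.5 * math.log((var1 + var2)
                             / (2.0 * math.sqrt(var1 * var2))))

d_same = bhattacharyya(0.0, 1.0, 0.0, 1.0)    # identical models
d_apart = bhattacharyya(0.0, 1.0, 2.0, 1.0)   # separated means
```

Identical distributions give a distance of zero; the larger the distance, the easier two phoneme models are to tell apart.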

Voice Activity Detection Using Global Speech Absence Probability Based on Teager Energy in Noisy Environments (잡음환경에서 Teager Energy 기반의 전역 음성부재확률을 이용하는 음성검출)

  • Park, Yun-Sik;Lee, Sang-Min
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.1
    • /
    • pp.97-103
    • /
    • 2012
  • In this paper, we propose a novel voice activity detection (VAD) algorithm to effectively distinguish speech from nonspeech in various noisy environments. Global speech absence probability (GSAP), derived from the likelihood ratio (LR) based on a statistical model, is widely used as the feature parameter for VAD. However, the feature parameter based on conventional GSAP is not sufficient to distinguish speech from noise at low SNRs (signal-to-noise ratios). The presented VAD algorithm utilizes GSAP based on Teager energy (TE) as the feature parameter to improve decision performance for speech segments in noisy environments. The performance of the proposed VAD algorithm is evaluated by objective tests under various environments, and better results are obtained compared with conventional methods.
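
The Teager energy (TE) that underlies the proposed GSAP is computed by the discrete Teager energy operator. A minimal sketch of that operator (only the TE front end; the GSAP/likelihood-ratio stage of the paper is not reproduced):

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator:
    Psi[n] = x[n]^2 - x[n-1] * x[n+1], defined for interior samples."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure tone A*sin(w*n), Psi is the constant A^2 * sin(w)^2,
# so TE tracks both amplitude and frequency of the oscillation.
sig = [math.sin(0.3 * n) for n in range(100)]
te = teager_energy(sig)
```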

Implementation of Speaker Independent Speech Recognition System Using Independent Component Analysis based on DSP (독립성분분석을 이용한 DSP 기반의 화자 독립 음성 인식 시스템의 구현)

  • 김창근;박진영;박정원;이광석;허강인
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.2
    • /
    • pp.359-364
    • /
    • 2004
  • In this paper, we implemented a real-time speaker-independent speech recognizer that is robust in noisy environments using a DSP (Digital Signal Processor). The implemented system is composed of the TMS320C32, a floating-point DSP from Texas Instruments Inc., and a CODEC for real-time speech input. As its feature parameter, the recognizer uses a noise-robust parameter obtained by transforming the feature space of MFCC (mel frequency cepstral coefficients) using ICA (Independent Component Analysis), instead of MFCC itself. Recognition results in noisy environments show that the performance of the ICA feature parameter is superior to that of MFCC.
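
At run time, an ICA-transformed feature of this kind is just the MFCC vector multiplied by a pre-trained unmixing matrix. A minimal sketch of that application step (learning the matrix, e.g. with FastICA, is not shown, and the 2×2 matrix and feature values below are hypothetical):

```python
def apply_unmixing(W, x):
    """Transform an MFCC vector with a pre-trained ICA unmixing
    matrix W, i.e. y = W @ x, computed row by row."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

W = [[0.0, 1.0],
     [1.0, 0.0]]               # hypothetical 2x2 unmixing matrix
y = apply_unmixing(W, [0.5, -0.25])  # toy 2-D "MFCC" vector
```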