• Title/Summary/Keyword: Noise robust speech recognition

Search Result 134, Processing Time 0.025 seconds

A Study on Performance Improvement Method for the Multi-Model Speech Recognition System in the DSR Environment (DSR 환경에서의 다 모델 음성 인식시스템의 성능 향상 방법에 관한 연구)

  • Jang, Hyun-Baek;Chung, Yong-Joo
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.11 no.2
    • /
    • pp.137-142
    • /
    • 2010
  • Although multi-model speech recognizer has been shown to be quite successful in noisy speech recognition, the results were based on general speech front-ends which do not take into account noise adaptation techniques. In this paper, for the accurate evaluation of the multi-model based speech recognizer, we adopted a quite noise-robust speech front-end, AFE, which was proposed by the ETSI for the noisy DSR environment. For the performance comparison, the MTR which is known to give good results in the DSR environment has been used. Also, we modified the structure of the multi-model based speech recognizer to improve the recognition performance. N reference HMMs which are most similar to the input noisy speech are used as the acoustic models for recognition to cope with the errors in the selection of the reference HMMs and the noise signal variability. In addition, multiple SNR levels are used to train each of the reference HMMs to improve the robustness of the acoustic models. From the experimental results on the Aurora 2 databases, we could see better recognition rates using the modified multi-model based speech recognizer compared with the previous method.

Speech Enhancement Based on Feature Compensation for Independently Applying to Different Types of Speech Recognition Systems (이기종 음성 인식 시스템에 독립적으로 적용 가능한 특징 보상 기반의 음성 향상 기법)

  • Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.10
    • /
    • pp.2367-2374
    • /
    • 2014
  • This paper proposes a speech enhancement method which can be independently applied to different types of speech recognition systems. Feature compensation methods are well known to be effective as a front-end algorithm for robust speech recognition in noisy environments. The feature types and speech model employed by the feature compensation methods should be matched with ones of the speech recognition system for their effectiveness. However, they cannot be successfully employed by the speech recognition with "unknown" specification, such as a commercialized speech recognition engine. In this paper, a speech enhancement method is proposed, which is based on the PCGMM-based feature compensation method. The experimental results show that the proposed method significantly outperforms the conventional front-end algorithms for unknown speech recognition over various background noise conditions.

PCMM-Based Feature Compensation Method Using Multiple Model to Cope with Time-Varying Noise (시변 잡음에 대처하기 위한 다중 모델을 이용한 PCMM 기반 특징 보상 기법)

  • 김우일;고한석
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.6
    • /
    • pp.473-480
    • /
    • 2004
  • In this paper we propose an effective feature compensation scheme based on the speech model in order to achieve robust speech recognition. The proposed feature compensation method is based on parallel combined mixture model (PCMM). The previous PCMM works require a highly sophisticated procedure for estimation of the combined mixture model in order to reflect the time-varying noisy conditions at every utterance. The proposed schemes can cope with the time-varying background noise by employing the interpolation method of the multiple mixture models. We apply the‘data-driven’method to PCMM tot move reliable model combination and introduce a frame-synched version for estimation of environments posteriori. In order to reduce the computational complexity due to multiple models, we propose a technique for mixture sharing. The statistically similar Gaussian components are selected and the smoothed versions are generated for sharing. The performance is examined over Aurora 2.0 and speech corpus recorded while car-driving. The experimental results indicate that the proposed schemes are effective in realizing robust speech recognition and reducing the computational complexities under both simulated environments and real-life conditions.

The Magnitude Distribution method of U/V decision (음성신호의 전폭분포를 이용한 유/무성음 검출에 대한 연구)

  • 배성근
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1993.06a
    • /
    • pp.249-252
    • /
    • 1993
  • In speech signal processing, The accurate detection of the voiced/unvoiced is important for robust word recognition and analysis. This algorithm is based on the MD in the frame of speech signals that does not require statistical information about either signal or background-noise to decide a voiced/unvoiced. This paper presents a method of estimation the Characteristic of Magnitude Distribution from noisy speech and also of estimation the optimal threshold based on the MD of the voiced/unvoiced decision. The performances of this detectors is evaluated and compared to that obtained from classifying other paper.

  • PDF

Features for Figure Speech Recognition in Noise Environment (잡음환경에서의 숫자음 인식을 위한 특징파라메타)

  • Lee, Jae-Ki;Koh, Si-Young;Lee, Kwang-Suk;Hur, Kang-In
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • v.9 no.2
    • /
    • pp.473-476
    • /
    • 2005
  • This paper is proposed a robust various feature parameters in noise. Feature parameter MFCC(Mel Frequency Cepstral Coefficient) used in conventional speech recognition shows good performance. But, parameter transformed feature space that uses PCA(Principal Component Analysis)and ICA(Independent Component Analysis) that is algorithm transformed parameter MFCC's feature space that use in old for more robust performance in noise is compared with the conventional parameter MFCC's performance. The result shows more superior performance than parameter and MFCC that feature parameter transformed by the result ICA is transformed by PCA.

  • PDF

Noise-Robust Speech Recognition Using Histogram-Based Over-estimation Technique (히스토그램 기반의 과추정 방식을 이용한 잡음에 강인한 음성인식)

  • 권영욱;김형순
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.6
    • /
    • pp.53-61
    • /
    • 2000
  • In the speech recognition under the noisy environments, reducing the mismatch introduced between training and testing environments is an important issue. Spectral subtraction is widely used technique because of its simplicity and relatively good performance in noisy environments. In this paper, we introduce histogram method as a reliable noise estimation approach for spectral subtraction. This method has advantages over the conventional noise estimation methods in that it does not need to detect non-speech intervals and it can estimate the noise spectra even in time-varying noise environments. Even though spectral subtraction is performed using a reliable average noise spectrum by the histogram method, considerable amount of residual noise remains due to the variations of instantaneous noise spectrum about mean. To overcome this limitation, we propose a new over-estimation technique based on distribution characteristics of histogram used for noise estimation. Since the proposed technique decides the degree of over-estimation adaptively according to the measured noise distribution, it has advantages to be few the influence of the SNR variation on the noise levels. According to speaker-independent isolated word recognition experiments in car noise environment under various SNR conditions, the proposed histogram-based over-estimation technique outperforms the conventional over-estimation technique.

  • PDF

The Human-Machine Interface System with the Embedded Speech recognition for the telematics of the automobiles (자동차 텔레매틱스용 내장형 음성 HMI시스템)

  • 권오일
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.41 no.2
    • /
    • pp.1-8
    • /
    • 2004
  • In this paper, we implement the Digital Signal Processing System based on Human Machine Interface technology for the telematics with embedded noise-robust speech recognition engine and develop the communication system which can be applied to the automobile information center through the human-machine interface technology. Through the embedded speech recognition engine, we can develop the total DSP system based on Human Machine Interface for the telematics in order to test the total system and also the total telematics services.

Robust Speech Recognition Using Real-Time High Order Statistics Normalization and Smoothing Filter (실시간 고차통계 정규화와 Smoothing 필터를 이용한 강인한 음성인식)

  • Jeong, Ju-Hyun;Song, Hwa-Jeon;Kim, Hyung-Soon
    • Proceedings of the KSPS conference
    • /
    • 2005.04a
    • /
    • pp.91-94
    • /
    • 2005
  • The performance of speech recognition is degraded by the mismatch between training and test environments. Many methods have been presented to compensate for additive noise and channel effect in the cepstral domain, and Cepstral Mean Subtraction (CMS) is the representative method among them. Recently, high order cepstral moment normalization method has introduced to improve recognition accuracy. In this paper, we apply high order moment normalization method and smoothing filter for real-time processing. In experiments using Aurora2 DB, we obtained error rate reduction of 49.7% with the proposed algorithm in comparison with baseline system.

  • PDF

Mask Estimation Based on Band-Independent Bayesian Classifler for Missing-Feature Reconstruction (Missing-Feature 복구를 위한 대역 독립 방식의 베이시안 분류기 기반 마스크 예측 기법)

  • Kim Wooil;Stern Richard M.;Ko Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.2
    • /
    • pp.78-87
    • /
    • 2006
  • In this paper. we propose an effective mask estimation scheme for missing-feature reconstruction in order to achieve robust speech recognition under unknown noise environments. In the previous work. colored noise is used for training the mask classifer, which is generated from the entire frequency Partitioned signals. However it gives a limited performance under the restricted number of training database. To reflect the spectral events of more various background noise and improve the performance simultaneously. a new Bayesian classifier for mask estimation is proposed, which works independent of other frequency bands. In the proposed method, we employ the colored noise which is obtained by combining colored noises generated from each frequency band in order to reflect more various noise environments and mitigate the 'sparse' database problem. Combined with the cluster-based missing-feature reconstruction. the performance of the proposed method is evaluated on a task of noisy speech recognition. The results show that the proposed method has improved performance compared to the Previous method under white noise. car noise and background music conditions.

A Study on the Features for Building Korean Digit Recognition System Based on Multilayer Perceptron (다층 퍼셉트론에 기반한 한국어 숫자음 인식시스템 구현을 위한 특징 연구)

  • 김인철;김대영
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.6 no.4
    • /
    • pp.81-88
    • /
    • 2001
  • In this paper, a Korean digit recognition system based on a multilayer Perceptron is implemented. We also investigate the performance of widely used speech features, such as the Mel-scale filterbank, MFCC, LPCC, and PLP coefficients, by applying them as input of the proposed recognition system. In order to build a robust speech system, the experiments for demonstrating its recognition performance for the clean data as well as corrupt data are carried out. In experiments of recognizing 20 Korean digit, we found that the Mel-scale filterbank coefficients performs best in terms of recognition accuracy for the speech dependent and speech independent database even though noise is considerably added.

  • PDF