• Title/Summary/Keyword: Noisy Speech Recognition

Noise Reduction Using MMSE Estimator-based Adaptive Comb Filtering (MMSE Estimator 기반의 적응 콤 필터링을 이용한 잡음 제거)

  • Park, Jeong-Sik;Oh, Yung-Hwan
    • MALSORI
    • /
    • no.60
    • /
    • pp.181-190
    • /
    • 2006
  • This paper describes a speech enhancement scheme that leads to significant improvements in recognition performance when used in the ASR front-end. The proposed approach is based on adaptive comb filtering and an MMSE-related parameter estimator. While adaptive comb filtering reduces noise components remarkably, it is rarely effective in reducing non-stationary noise. Furthermore, due to the uniformly distributed frequency response of the comb filter, it can cause serious distortion to clean speech signals. This paper proposes an improved comb filter that adjusts its spectral magnitude to the original speech, based on the speech absence probability and the gain modification function. In addition, we introduce a speech enhancement scheme for ASR in mobile environments based on the modified comb filtering. Evaluation experiments carried out using the Aurora 2 database demonstrate that the proposed method outperforms conventional adaptive comb filtering techniques in both clean and noisy environments.

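As a rough illustration of the comb filtering that the abstract above builds on, the following minimal sketch averages each sample with the samples one pitch period before and after it. It assumes a fixed, known pitch period and fixed coefficients; the names `comb_filter`, `period`, and `coeffs` are hypothetical, and the sketch does not include the MMSE estimator, the speech absence probability, or the gain modification function proposed in the paper.

```python
def comb_filter(x, period, coeffs=(0.25, 0.5, 0.25)):
    """Plain time-domain comb filter (illustrative only).

    x      : list of samples
    period : assumed pitch period in samples (fixed here, not adapted)
    coeffs : weights for the samples at n - period, n, n + period
    """
    half = len(coeffs) // 2
    y = []
    for n in range(len(x)):
        acc = 0.0
        weight_sum = 0.0
        for k, a in enumerate(coeffs):
            idx = n + (k - half) * period
            # Skip taps that fall outside the signal and renormalize.
            if 0 <= idx < len(x):
                acc += a * x[idx]
                weight_sum += a
        y.append(acc / weight_sum)
    return y


# A signal with period 4 passes through unchanged, while an isolated
# disturbance on one sample is attenuated.
clean = [0.0, 1.0, 0.0, -1.0] * 6
noisy = list(clean)
noisy[10] += 1.0
filtered = comb_filter(noisy, period=4)
```

Components that repeat at the pitch period are preserved, while components that do not repeat are averaged down, which is why a fixed comb filter of this kind struggles when the noise or the pitch changes over time.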

A Study on Speech Recognition in a Running Automobile (주행중인 자동차 환경에서의 음성인식 연구)

  • 양진우;김순협
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.5
    • /
    • pp.3-8
    • /
    • 2000
  • In this paper, we study the design and implementation of a robust speech recognition system for noisy car environments. The reference pattern used in the system is DMS (Dynamic Multi-Section). We propose two separate acoustic models, one for speech in a car moving below 80 km/h and one for speech in a car moving above 80 km/h, which are selected automatically depending on the noisy car environment. PLP (Perceptual Linear Predictive) features of order 13 are used as the feature vector, and OSDP (One-Stage Dynamic Programming) is used for decoding. The system also provides a phone-book editing function for voice dialing. The system yields a recognition rate of 89.75% for male speakers in SI (speaker-independent) mode in a car running on a cemented expressway at over 80 km/h with a vocabulary of 33 words. It also yields a recognition rate of 92.29% for male speakers in SI mode in a car running on a paved expressway at over 80 km/h.


Comparison of Integration Methods of Speech and Lip Information in the Bi-modal Speech Recognition (바이모달 음성인식의 음성정보와 입술정보 결합방법 비교)

  • 박병구;김진영;최승호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.4
    • /
    • pp.31-37
    • /
    • 1999
  • Bimodal speech recognition using visual and audio information has been proposed and studied to improve the performance of ASR (Automatic Speech Recognition) systems in noisy environments. Methods for integrating the two modalities are usually classified into early integration and late integration. The early integration methods include a method using a fixed weight for the lip parameters and a method using a variable weight based on speech SNR information. The four late integration methods are a method using audio and visual information independently, a method using the speech optimal path, a method using the lip optimal path, and a method using speech SNR information. Among these six methods, the method using the fixed weight for the lip parameters showed a better recognition rate than the others.


Class-Based Histogram Equalization for Robust Speech Recognition

  • Suh, Young-Joo;Kim, Hoi-Rin
    • ETRI Journal
    • /
    • v.28 no.4
    • /
    • pp.502-505
    • /
    • 2006
  • A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims not only at compensating for the acoustic mismatch between training and test environments, but also at reducing the discrepancy between the phonetic distributions of training and test speech data. The algorithm utilizes multiple class-specific reference and test cumulative distribution functions, classifies the noisy test features into their corresponding classes, and equalizes the features by using their corresponding class-specific reference and test distributions. Experiments on the Aurora 2 database demonstrated the effectiveness of the proposed method, with relative error reductions of 18.74%, 17.52%, and 23.45% over the conventional histogram equalization method and of 59.43%, 66.00%, and 50.50% over mel-cepstral-based features for test sets A, B, and C, respectively.

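The equalization step described in the abstract above can be sketched as follows. This is a minimal illustration on one-dimensional features, assuming empirical CDFs built from sorted samples and assuming that class labels for the test features are already available; the abstract does not say how the classification is done. The names `equalize`, `equalize_by_class`, and `reference_by_class` are hypothetical.

```python
import bisect


def equalize(test_values, reference_values):
    """Conventional histogram equalization: map each test value through the
    empirical test CDF, then through the inverse empirical reference CDF."""
    ref_sorted = sorted(reference_values)
    test_sorted = sorted(test_values)
    n_test, n_ref = len(test_sorted), len(ref_sorted)
    out = []
    for v in test_values:
        rank = bisect.bisect_right(test_sorted, v)
        cdf = (rank - 0.5) / n_test           # keeps the CDF value inside (0, 1)
        idx = min(n_ref - 1, int(cdf * n_ref))
        out.append(ref_sorted[idx])           # reference value at that quantile
    return out


def equalize_by_class(test_values, test_labels, reference_by_class):
    """Class-based variant: equalize each class with its own test and
    reference distributions. Labels are assumed to be given."""
    out = list(test_values)
    for label in set(test_labels):
        idxs = [i for i, l in enumerate(test_labels) if l == label]
        eq = equalize([test_values[i] for i in idxs], reference_by_class[label])
        for i, v in zip(idxs, eq):
            out[i] = v
    return out


print(equalize([10, 30, 20, 40], [0, 1, 2, 3]))  # prints [0, 2, 1, 3]
```

Splitting by class means a test frame is mapped onto the reference distribution of its own class rather than onto a single global distribution, which is the discrepancy in phonetic distributions that the abstract refers to.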

Performance Improvement of Speech Recognition Based on SPLICE in Noisy Environments (SPLICE 방법에 기반한 잡음 환경에서의 음성 인식 성능 향상)

  • Kim, Jong-Hyeon;Song, Hwa-Jeon;Lee, Jong-Seok;Kim, Hyung-Soon
    • MALSORI
    • /
    • no.53
    • /
    • pp.103-118
    • /
    • 2005
  • The performance of a speech recognition system is degraded by mismatch between training and test environments. Recently, Stereo-based Piecewise LInear Compensation for Environments (SPLICE) was introduced to overcome environmental mismatch using stereo data. In this paper, we propose several methods to improve the conventional SPLICE and evaluate them on the Aurora2 task. We generalize SPLICE to compensate for the covariance matrix as well as the mean vector in the feature space, yielding an error rate reduction of 48.93%. We also employ a weighted sum of correction vectors using the posterior probabilities of all Gaussians, achieving an error rate reduction of 48.62%. With the combination of the above two methods, the error rate is reduced by 49.61% from the Aurora2 baseline system.

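The posterior-weighted sum of correction vectors mentioned in the abstract above can be sketched for a scalar feature as follows. It is a minimal illustration, assuming a Gaussian mixture over the noisy feature and one additive correction per Gaussian that has already been learned from stereo data; the names `splice_compensate` and `corrections` are hypothetical, and the covariance compensation proposed in the paper is not shown.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def splice_compensate(y, weights, means, variances, corrections):
    """Estimate the clean feature as y + sum_k p(k | y) * r_k.

    weights, means, variances : mixture over the noisy feature (assumed given)
    corrections               : per-Gaussian correction r_k (assumed given)
    """
    likes = [w * gaussian_pdf(y, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(likes)  # a real implementation would guard against underflow
    posteriors = [l / total for l in likes]
    return y + sum(p * r for p, r in zip(posteriors, corrections))


# A feature halfway between the two Gaussians receives the average correction.
x_hat = splice_compensate(5.0, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0], [1.0, -2.0])
```

Using all posteriors rather than only the most likely Gaussian makes the correction vary smoothly as the noisy feature moves between mixture components.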

Robust Speech Recognition Using Independent Component Analysis (독립성분분석을 이용한 강인한 음성인식)

  • 임형규;이창기
    • Journal of the Korea Computer Industry Society
    • /
    • v.5 no.2
    • /
    • pp.269-274
    • /
    • 2004
  • Noisy speech recognition is one of the most important problems in speech recognition. In this paper, we propose a method that efficiently removes noise mixed with speech. The proposed method uses ICA to separate the mixed noise. ICA (independent component analysis) is a signal processing technique whose goal is to express a set of random variables as linear combinations of components that are statistically as independent of each other as possible.


Preprocessing Technique for Improvement of Speech Recognition in a Car (차량에서의 음성인식율 향상을 위한 전처리 기법)

  • Kim, Hyun-Tae;Park, Jang-Sik
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.1
    • /
    • pp.139-146
    • /
    • 2009
  • This paper presents a modified spectral subtraction scheme suited to speech recognition in low signal-to-noise ratio (SNR) environments, such as an automatic speech recognition (ASR) system in a car. Conventional spectral subtraction schemes rely on the SNR: attenuation is imposed on the parts of the spectrum that appear to have low SNR, and the parts with high SNR are accentuated. While this assumption is adequate for high-SNR environments, it is grossly inadequate for low-SNR scenarios such as the car environment. The proposed method specifically targets low-SNR noisy environments by using a weighting function to enhance the speech-dominant regions of the speech spectrum. Experimental results using voice commands for cars show the superior performance of the proposed method over conventional methods.
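For context, the conventional scheme that this abstract contrasts itself with can be sketched per frequency bin as below. This is a minimal illustration of textbook power spectral subtraction with an over-subtraction factor and a spectral floor, not the weighting function proposed in the paper; the names `spectral_subtract`, `alpha`, and `floor` are assumptions, and the power spectra are assumed to have been computed elsewhere.

```python
def spectral_subtract(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Conventional power spectral subtraction for one frame.

    noisy_power : power spectrum of the noisy frame, one value per bin
    noise_power : estimated noise power spectrum, one value per bin
    alpha       : over-subtraction factor (assumed parameter)
    floor       : fraction of the noisy power kept as a lower bound, so that
                  bins dominated by noise do not go negative
    """
    return [max(p - alpha * n, floor * p)
            for p, n in zip(noisy_power, noise_power)]


# The first bin is speech dominated and keeps most of its power; the second
# bin is noise dominated and is clamped to the floor.
print(spectral_subtract([4.0, 1.0], [1.0, 2.0]))  # prints [3.0, 0.01]
```

At low SNR most bins resemble the second one, so the output is governed largely by the floor, which is one way to see why a plain subtraction rule performs poorly in a car.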

Feature Compensation Method Based on Parallel Combined Mixture Model (병렬 결합된 혼합 모델 기반의 특징 보상 기술)

  • 김우일;이흥규;권오일;고한석
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.7
    • /
    • pp.603-611
    • /
    • 2003
  • This paper proposes an effective feature compensation scheme based on a speech model for robust speech recognition. Conventional model-based methods require off-line training with a noisy speech database and are not suitable for online adaptation. In the proposed scheme, we relax the need for off-line training with a noisy speech database by employing the parallel model combination technique to estimate correction factors. Applying the model combination process to the mixture model alone, rather than to the entire HMM, makes online model combination possible. Exploiting the availability of a noise model from off-line sources, we accomplish online adaptation via MAP (Maximum A Posteriori) estimation. In addition, an online channel estimation procedure is derived within the proposed framework. For more efficient implementation, we propose a selective model combination that reduces the computational complexity. Representative experimental results indicate that the proposed algorithm is effective in achieving robust speech recognition under the combined adverse conditions of additive background noise and channel distortion.

On-line model compensation using noise masking effect for robust speech recognition (잡음 차폐를 이용한 온라인 모델 보상)

  • Jung Gue-Jun;Cho Hoon-Young;Oh Yung-Hwan
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.215-218
    • /
    • 2003
  • In this paper, we apply PMC (parallel model combination) to a speech recognition system online. As a representative model-based noise compensation technique, PMC compensates for environmental mismatch by combining pretrained clean speech models with noise information estimated in real time. This is a very effective approach for compensating for extreme environmental mismatch, but its heavy computational cost makes it unsuitable for on-line systems. To reduce the computational cost and apply PMC online, we use the noise masking effect (the energy in a frequency band is dominated either by clean speech energy or by noise energy) in the model compensation process. Experiments on artificially produced noisy speech data confirm that the proposed technique is fast and effective for on-line model compensation.

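One way to read the noise masking effect described in the abstract above is as a per-band maximum in the log-spectral domain, which replaces the exponentials and logarithms of the usual log-add combination. The sketch below compares the two on model means only; it is an assumption about how the masking could be applied, not the paper's exact procedure, and the names `combine_log_add` and `combine_masked` are hypothetical.

```python
import math


def combine_log_add(speech_log, noise_log):
    """Combine log-spectral means by adding energies in the linear domain."""
    return [math.log(math.exp(s) + math.exp(n))
            for s, n in zip(speech_log, noise_log)]


def combine_masked(speech_log, noise_log):
    """Noise-masking approximation: each band takes whichever of the speech
    mean and the noise mean is larger."""
    return [max(s, n) for s, n in zip(speech_log, noise_log)]


speech = [5.0, 1.0]   # illustrative log-energies per band
noise = [2.0, 3.0]
exact = combine_log_add(speech, noise)
masked = combine_masked(speech, noise)   # [5.0, 3.0]
```

The masked value is never above the log-add value and is at most log 2 below it, with the largest gap occurring when the speech and noise energies in a band are equal.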

Speaker Identification Using Score-based Confidence in Noisy Environments (스코어 기반 관측신뢰도를 이용한 잡음환경하 화자식별)

  • Min, So-Hee;Song, Min-Gyu;Na, Seung-You;Choi, Seung-Ho;Kim, Jin-Young
    • Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.145-156
    • /
    • 2007
  • The performance of speaker identification is severely degraded in noisy environments. Recently, a probability weighting method based on observation membership was proposed to overcome the noise problem [1]. In [1], the observation confidence was calculated from the SNR with a sigmoid function. However, estimating the SNR requires additional computation, and the estimated SNR is corrupted in dynamic noisy environments. In this paper, we propose methods for estimating the observation confidence based on score-based reliabilities (SBRs) derived from entropy and dispersion measures. Generally, SBRs are obtained from the speaker models' probabilities. The proposed methods are evaluated on the ETRI speaker recognition DB. We compare the performance of the proposed methods with those in [1][8]. The experimental results show that the proposed methods can be successfully applied when the SNR is not available.

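The two kinds of observation confidence contrasted in the abstract above can be sketched as follows. This is a minimal illustration: the sigmoid parameters `slope` and `midpoint`, and the particular entropy-based formula (one minus the normalized entropy of the speaker posteriors), are assumptions chosen for the example and are not taken from the paper.

```python
import math


def snr_confidence(snr_db, slope=0.5, midpoint=10.0):
    """Confidence from an estimated SNR through a sigmoid (assumed parameters)."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint)))


def entropy_confidence(log_likelihoods):
    """Confidence from speaker model scores, with no SNR estimate needed.

    log_likelihoods : per-speaker log-likelihoods of one observation
                      (at least two speakers)
    Returns a value near 1 when one speaker clearly dominates and 0 when all
    speakers are equally likely.
    """
    m = max(log_likelihoods)
    exps = [math.exp(l - m) for l in log_likelihoods]  # stable softmax
    total = sum(exps)
    posteriors = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in posteriors if p > 0.0)
    return 1.0 - entropy / math.log(len(posteriors))


ambiguous = entropy_confidence([-5.0, -5.0, -5.0])   # close to 0
decisive = entropy_confidence([0.0, -50.0, -50.0])   # close to 1
```

A frame whose scores do not discriminate between speakers then contributes little to the final decision, regardless of whether an SNR estimate is available.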