• Title/Summary/Keyword: Noisy Speech Recognition

Search Result 227, Processing Time 0.035 seconds

The Development of a Speech Recognition Method Robust to Channel Distortions and Noisy Environments for an Audio Response System(ARS) (잡음환경및 채널왜곡에 강인한 ARS용 전화음성인식 방식 연구)

  • Ahn, Jung-Mo;Yim, Kye-Jong;Kay, Young-Chul;Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.2
    • /
    • pp.41-48
    • /
    • 1997
  • This paper proposes the methods for improving the recognition rate of theARS, especially equipped with the speech recognition capability. Telephone speech, which is the input to the ARS, is usually affected by the announcements from the system, channel noise, and channel distortion, thus directly applying the recognition algorithm developed for clean speech to the noisy telephone speech will bring the significant performance degradation. To cope with this problem, this paper proposes three methods: 1)the accurate detection of the inputting instant of the speech in order to immediately turn off the announcements from the system at that instant, 2)the effective end-point detection of the noisy telephone speech on the basis of Teager energy, and 3)the SDCN-based compensation of the channel distortion. Experiments on speaker-independent, noisy telephone speech reveal that the combination of the above three proposed methods provides great improvements on the recognition rate over the conventional method, showing about 77% in contrast to only 23%.

  • PDF

Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition (이중채널 잡음음성인식을 위한 공간정보를 이용한 통계모델 기반 음성구간 검출)

  • Shin, Min-Hwa;Park, Ji-Hun;Kim, Hong-Kook;Lee, Yeon-Woo;Lee, Seong-Ro
    • Phonetics and Speech Sciences
    • /
    • v.2 no.3
    • /
    • pp.141-148
    • /
    • 2010
  • In this paper, voice activity detection (VAD) for dual-channel noisy speech recognition is proposed in which spatial cues are employed. In the proposed method, a probability model for speech presence/absence is constructed using spatial cues obtained from dual-channel input signal, and a speech activity interval is detected through this probability model. In particular, spatial cues are composed of interaural time differences and interaural level differences of dual-channel speech signals, and the probability model for speech presence/absence is based on a Gaussian kernel density. In order to evaluate the performance of the proposed VAD method, speech recognition is performed for speech segments that only include speech intervals detected by the proposed VAD method. The performance of the proposed method is compared with those of several methods such as an SNR-based method, a direction of arrival (DOA) based method, and a phase vector based method. It is shown from the speech recognition experiments that the proposed method outperforms conventional methods by providing relative word error rates reductions of 11.68%, 41.92%, and 10.15% compared with SNR-based, DOA-based, and phase vector based method, respectively.

  • PDF

A Data-Driven Jacobian Adaptation Method for the Noisy Speech Recognition (잡음음성인식을 위한 데이터 기반의 Jacobian 적응방식)

  • Chung Young-Joo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.4
    • /
    • pp.159-163
    • /
    • 2006
  • In this paper a data-driven method to improve the performance of the Jacobian adaptation (JA) for the noisy speech recognition is proposed. In stead of constructing the reference HMM by using the model composition method like the parallel model combination (PMC), we propose to train the reference HMM directly with the noisy speech. This was motivated from the idea that the directly trained reference HMM will model the acoustical variations due to the noise better than the composite HMM. For the estimation of the Jacobian matrices, the Baum-Welch algorithm is employed during the training. The recognition experiments have been done to show the improved performance of the proposed method over the Jacobian adaptation as well as other model compensation methods.

Robust Distributed Speech Recognition under noise environment using MESS and EH-VAD (멀티밴드 스펙트럼 차감법과 엔트로피 하모닉을 이용한 잡음환경에 강인한 분산음성인식)

  • Choi, Gab-Keun;Kim, Soon-Hyob
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.1
    • /
    • pp.101-107
    • /
    • 2011
  • The background noises and distortions by channel are major factors that disturb the practical use of speech recognition. Usually, noise reduce the performance of speech recognition system DSR(Distributed Speech Recognition) based speech recognition also bas difficulty of improving performance for this reason. Therefore, to improve DSR-based speech recognition under noisy environment, this paper proposes a method which detects accurate speech region to extract accurate features. The proposed method distinguish speech and noise by using entropy and detection of spectral energy of speech. The speech detection by the spectral energy of speech shows good performance under relatively high SNR(SNR 15dB). But when the noise environment varies, the threshold between speech and noise also varies, and speech detection performance reduces under low SNR(SNR 0dB) environment. The proposed method uses the spectral entropy and harmonics of speech for better speech detection. Also, the performance of AFE is increased by precise speech detections. According to the result of experiment, the proposed method shows better recognition performance under noise environment.

Syllable-Type-Based Phoneme Weighting Techniques for Listening Intelligibility in Noisy Environments (소음 환경에서의 명료한 청취를 위한 음절형태 기반 음소 가중 기술)

  • Lee, Young Ho;Joo, Jong Han;Choi, Seung Ho
    • Phonetics and Speech Sciences
    • /
    • v.6 no.3
    • /
    • pp.165-169
    • /
    • 2014
  • Intelligibility of speech transmitted to listeners can significantly be degraded in noisy environments such as in auditorium and in train station due to ambient noises. Noise-masked speech signal is hard to be recognized by listeners. Among the conventional methods to improve speech intelligibility, consonant-vowel intensity ratio (CVR) approach reinforces the powers of overall consonants. However, excessively reinforced consonant is not helpful in recognition. Furthermore, only some of consonants are improved by the CVR approach. In this paper, we propose the corrective weighting (CW) approach that reinforces the powers of consonants according to syllable-type such as consonant-vowel-consonant (CVC), consonant-vowel (CV) and vowel-consonant (VC) in Korean differently, considering the level of listeners' recognition. The proposed CW approach was evaluated by the subjective test, Comparison Category Rating (CCR) test of ITU-T P.800, showed better performance, that is, 0.18 and 0.24 higher than the unprocessed CVR approach, respectively.

Speech Active Interval Detection Method in Noisy Speech (잡음음성에서의 음성 활성화 구간 검출 방법)

  • Lee, Kwang-Seok;Choo, Yeon-Gyu;Kim, Hyun-Deok
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2008.10a
    • /
    • pp.779-782
    • /
    • 2008
  • It is important to detect speech active interval from Noisy Speech in speech communication and speech recognition. In this research, we propose characteristic parameter with combining spectral Entropy for detect speech active interval in Noisy Speech, and compare performance of speech active interval based on energy. The results shows that analysis using proposed characteristic parameter is higher performance the others in noisy environment.

  • PDF

Speech Recognition in Noise Environment by Independent Component Analysis and Spectral Enhancement (독립 성분 분석과 스펙트럼 향상에 의한 잡음 환경에서의 음성인식)

  • Choi Seung-Ho
    • MALSORI
    • /
    • no.48
    • /
    • pp.81-91
    • /
    • 2003
  • In this paper, we propose a speech recognition method based on independent component analysis (ICA) and spectral enhancement techniques. While ICA tris to separate speech signal from noisy speech using multiple channels, some noise remains by its algorithmic limitations. Spectral enhancement techniques can compensate for lack of ICA's signal separation ability. From the speech recognition experiments with instantaneous and convolved mixing environments, we show that the proposed approach gives much improved recognition accuracies than conventional methods.

  • PDF

Feature Vector Processing for Speech Emotion Recognition in Noisy Environments (잡음 환경에서의 음성 감정 인식을 위한 특징 벡터 처리)

  • Park, Jeong-Sik;Oh, Yung-Hwan
    • Phonetics and Speech Sciences
    • /
    • v.2 no.1
    • /
    • pp.77-85
    • /
    • 2010
  • This paper proposes an efficient feature vector processing technique to guard the Speech Emotion Recognition (SER) system against a variety of noises. In the proposed approach, emotional feature vectors are extracted from speech processed by comb filtering. Then, these extracts are used in a robust model construction based on feature vector classification. We modify conventional comb filtering by using speech presence probability to minimize drawbacks due to incorrect pitch estimation under background noise conditions. The modified comb filtering can correctly enhance the harmonics, which is an important factor used in SER. Feature vector classification technique categorizes feature vectors into either discriminative vectors or non-discriminative vectors based on a log-likelihood criterion. This method can successfully select the discriminative vectors while preserving correct emotional characteristics. Thus, robust emotion models can be constructed by only using such discriminative vectors. On SER experiment using an emotional speech corpus contaminated by various noises, our approach exhibited superior performance to the baseline system.

  • PDF

KORAN DIGIT RECOGNITION IN NOISE ENVIRONMENT USING SPECTRAL MAPPING TRAINING

  • Ki Young Lee
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06a
    • /
    • pp.1015-1020
    • /
    • 1994
  • This paper presents the Korean digit recognition method under noise environment using the spectral mapping training based on static supervised adaptation algorithm. In the presented recognition method, as a result of spectral mapping from one space of noisy speech spectrum to another space of speech spectrum without noise, spectral distortion of noisy speech is improved, and the recognition rate is higher than that of the conventional method using VQ and DTW without noise processing, and even when SNR level is 0 dB, the recognition rate is 10 times of that using the conventional method. It has been confirmed that the spectral mapping training has an ability to improve the recognition performance for speech in noise environment.

  • PDF

Noisy Speech Recognition using Probabilistic Spectral Subtraction (확률적 스펙트럼 차감법을 이용한 잡은 환경에서의 음성인식)

  • Chi, Sang-Mun;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.6
    • /
    • pp.94-99
    • /
    • 1997
  • This paper describes a technique of probabilistic spectral subtraction which uses the knowledge of both noise and speech so as to reduce automatic speech recognition errors in noisy environments. Spectral subtraction method estimates a noise prototype in non-speech intervals and the spectrum of clean speech is obtained from the spectrum of noisy speech by subtracting this noise prototype. Thus noise can not be suppressed effectively using a single noise prototype in case the characteristics of the noise prototype are different from those of the noise contained in input noisy speech. To modify such a drawback, multiple noise prototypes are used in probabilistic subtraction method. In this paper, the probabilistic characteristics of noise and the knowledge of speech which is embedded in hidden Markov models trained in clean environments are used to suppress noise. Futhermore, dynamic feature parameters are considered as well as static feature parameters for effective noise suppression. The proposed method reduced error rates in the recognition of 50 Korean words. The recognition rate was 86.25% with the probabilistic subtraction, 72.75% without any noise suppression method and 80.25% with spectral subtraction at SNR(Signal-to-Noise Ratio) 10 dB.

  • PDF