• 제목/요약/키워드: Noisy Speech Recognition

검색결과 228건 처리시간 0.027초

자동차 잡음환경 고립단어 음성인식에서의 VTS와 PMC의 성능비교 (Performance Comparison between the PMC and VTS Method for the Isolated Speech Recognition in Car Noise Environments)

  • 정용주;이승욱
    • 음성과학
    • /
    • 제10권3호
    • /
    • pp.251-261
    • /
    • 2003
  • There has been many research efforts to overcome the problems of speech recognition in noisy conditions. Among the noise-robust speech recognition methods, model-based adaptation approaches have been shown quite effective. Particularly, the PMC (parallel model combination) method is very popular and has been shown to give considerably improved recognition results compared with the conventional methods. In this paper, we experimented with the VTS (vector Taylor series) algorithm which is also based on the model parameter transformation but has not attracted much interests of the researchers in this area. To verify the effectiveness of it, we employed the algorithm in the continuous density HMM (Hidden Markov Model). We compared the performance of the VTS algorithm with the PMC method and could see that the it gave better results than the PMC method.

  • PDF

문맥 및 사용 패턴 정보를 이용한 음성인식의 성능 개선 (Performance Improvement of Speech Recognition Using Context and Usage Pattern Information)

  • 송원문;김명원
    • 정보처리학회논문지B
    • /
    • 제13B권5호
    • /
    • pp.553-560
    • /
    • 2006
  • 최근 음성인식에서는 잡음환경에서 좀 더 신뢰성 있는 결과를 얻기 위해 인식 결과 도출 단계에서 여러 가지 정보의 내용들을 융합하거나 이전 인식 결과의 후처리를 통하여 성능을 향상시키는 방법들이 연구되고 있다. 본 논문에서는 잡음 환경에서의 인식률 하락을 보완하기 위해 개인 모바일 기기를 위한 음성 명령어 인식에서 사용자의 사용패턴과 문맥 정보를 사용하는 방법을 제안한다. 기본 인식 결과를 보정하기 위해서 현재 명령어를 발화하기 이전에 사용자가 사용한 순차적 명령어 패턴을 사용하였다. 또한 문맥 정보를 위해서는 사용중인 기기의 현재 기능과 발화된 명령어간의 연관성을 사용하였다. 실험을 통해 제안한 방법이 기본 인식 시스템에서 발생한 오인식의 약 50%를 수정하였음을 보였으며 이로써 제안한 방법의 타당성을 검증하였다.

가중 ARMA 필터를 이용한 강인한 음성인식 (Robust Speech Recognition Using Weighted Auto-Regressive Moving Average Filter)

  • 반성민;김형순
    • 말소리와 음성과학
    • /
    • 제2권4호
    • /
    • pp.145-151
    • /
    • 2010
  • In this paper, a robust feature compensation method is proposed for improving the performance of speech recognition. The proposed method is incorporated into the auto-regressive moving average (ARMA) based feature compensation. We employ variable weights for the ARMA filter according to the degree of speech activity, and pass the normalized cepstral sequence through the weighted ARMA filter. Additionally when normalizing the cepstral sequences in training, the cepstral means and variances are estimated from total training utterances. Experimental results show the proposed method significantly improves the speech recognition performance in the noisy and reverberant environments.

  • PDF

히스토그램 기반의 Over-estimation을 이용한 잡음환경에서의 음성인식 (Speech Recognition in Noisy Environrrents using Histogram-based Over-estimation)

  • 권영욱
    • 한국음향학회:학술대회논문집
    • /
    • 한국음향학회 1998년도 제15회 음성통신 및 신호처리 워크샵(KSCSP 98 15권1호)
    • /
    • pp.262-266
    • /
    • 1998
  • In the speech recognition under the noisy environments, reducing the mismatch introduced between training and testing environments is an important issue, and spectral subtraction is widely used technique because of its simplicity and relatively good performance in noisy environments. In this paper, we introduced histogram method as a reliable noise estimationi approach for spectral subtraction. To deal with the problem of residual noise after spectral subtraction, we proposed a new ove-estimation technique based on distribution characteristics of histogram used for noise estimation. Since the proposed technique decides the degree of over-estimation adaptively according to the measured noise distribution, it can cope with the SNR variations effectively in compared with the conventional over-estimation technique.

  • PDF

A Novel Integration Scheme for Audio Visual Speech Recognition

  • Pham, Than Trung;Kim, Jin-Young;Na, Seung-You
    • 한국음향학회지
    • /
    • 제28권8호
    • /
    • pp.832-842
    • /
    • 2009
  • Automatic speech recognition (ASR) has been successfully applied to many real human computer interaction (HCI) applications; however, its performance tends to be significantly decreased under noisy environments. The invention of audio visual speech recognition (AVSR) using an acoustic signal and lip motion has recently attracted more attention due to its noise-robustness characteristic. In this paper, we describe our novel integration scheme for AVSR based on a late integration approach. Firstly, we introduce the robust reliability measurement for audio and visual modalities using model based information and signal based information. The model based sources measure the confusability of vocabulary while the signal is used to estimate the noise level. Secondly, the output probabilities of audio and visual speech recognizers are normalized respectively before applying the final integration step using normalized output space and estimated weights. We evaluate the performance of our proposed method via Korean isolated word recognition system. The experimental results demonstrate the effectiveness and feasibility of our proposed system compared to the conventional systems.

입술정보를 이용한 음성 특징 파라미터 추정 및 음성인식 성능향상 (Estimation of speech feature vectors and enhancement of speech recognition performance using lip information)

  • 민소희;김진영;최승호
    • 대한음성학회지:말소리
    • /
    • 제44호
    • /
    • pp.83-92
    • /
    • 2002
  • Speech recognition performance is severly degraded under noisy envrionments. One approach to cope with this problem is audio-visual speech recognition. In this paper, we discuss the experiment results of bimodal speech recongition based on enhanced speech feature vectors using lip information. We try various kinds of speech features as like linear predicion coefficient, cepstrum, log area ratio and etc for transforming lip information into speech parameters. The experimental results show that the cepstrum parameter is the best feature in the point of reconition rate. Also, we present the desirable weighting values of audio and visual informations depending on signal-to-noiso ratio.

  • PDF

음성기반 멀티모달 사용자 인터페이스의 사용성 평가 방법론 (Usability Test Guidelines for Speech-Oriented Multimodal User Interface)

  • 홍기형
    • 대한음성학회지:말소리
    • /
    • 제67호
    • /
    • pp.103-120
    • /
    • 2008
  • Basic components for multimodal interface, such as speech recognition, speech synthesis, gesture recognition, and multimodal fusion, have their own technological limitations. For example, the accuracy of speech recognition decreases for large vocabulary and in noisy environments. In spite of those technological limitations, there are lots of applications in which speech-oriented multimodal user interfaces are very helpful to users. However, in order to expand application areas for speech-oriented multimodal interfaces, we have to develop the interfaces focused on usability. In this paper, we introduce usability and user-centered design methodology in general. There has been much work for evaluating spoken dialogue systems. We give a summary for PARADISE (PARAdigm for Dialogue System Evaluation) and PROMISE (PROcedure for Multimodal Interactive System Evaluation) that are the generalized evaluation frameworks for voice and multimodal user interfaces. Then, we present usability components for speech-oriented multimodal user interfaces and usability testing guidelines that can be used in a user-centered multimodal interface design process.

  • PDF

VoIP 환경에서의 잡음제거를 위한 최적화된 위너 필터 (Optimized Wiener Filter for Noise Reduction in VoIP Environments)

  • 정상배;이성독;한민수
    • 대한음성학회지:말소리
    • /
    • 제64호
    • /
    • pp.105-119
    • /
    • 2007
  • Noise reduction technologies are indispensable to achieve acceptable speech quality in VoIP systems. This paper proposes a Wiener filter optimized to the estimated SNR of noisy speech for the noise reduction in VoIP environments. The proposed noise canceller is applied as a pre-processor before speech encoding. The performance of the proposed method is evaluated by the PESQ in various noisy conditions. In this paper, the proposed algorithm is applied to G.711, G.723.1, and G.729A which are all VoIP speech codecs. The PESQ results show that the performance of our proposed noise reduction scheme outperforms those of the noise suppression in the IS-127 EVRC and the ETSI standard for the advanced distributed speech recognition front-end.

  • PDF

반향제거기를 갖는 자동차 실내 환경에서의 음성인식 (Robust speech recognition in car environment with echo canceller)

  • 박철호;허원철;배건성
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2005년도 추계 학술대회 발표논문집
    • /
    • pp.147-150
    • /
    • 2005
  • The performance of speech recognition in car environment is severely degraded when there is music or news coming from a radio or a CD player. Since reference signals are available from the audio unit in the car, it is possible to remove them with an adaptive filter. In this paper, we present experimental results of speech recognition in car environment using the echo canceller. For this, we generate test speech signals by adding music or news to the car noisy speech from Aurora2 DB. The HTK-based continuous HMT system is constructed for a recognition system. In addition, the MMSE-STSA method is used to the output of the echo canceller to remove the residual noise more.

  • PDF

음성 에너지 최대화와 묵음 특징 정규화를 이용한 잡음 환경에 강인한 음성 검출 (Voice Activity Detection in Noisy Environment using Speech Energy Maximization and Silence Feature Normalization)

  • 안찬식;최기호
    • 디지털융복합연구
    • /
    • 제11권6호
    • /
    • pp.169-174
    • /
    • 2013
  • 음성 인식 성능 저하의 문제는 모델 훈련 환경과 인식 환경의 차이이다. 이러한 환경의 불일치를 줄이기 위한 방법으로 다양한 묵음 특징 정규화 방법을 사용하고 있다. 기존의 묵음 특징 정규화 방법은 낮은 신호 대 잡음비에서 묵음 구간의 에너지 레벨이 증가하여 음성과 비음성에 대한 분류의 정확도가 떨어짐으로 인해 인식 성능이 저하되는 문제점이 있다. 본 논문에서는 음성 에너지 최대화와 묵음 특징 정규화를 이용한 잡음 환경에 강인한 음성 검출 방법을 제안하였다. 제안한 방법은 높은 신호 대 잡음비에서는 음성 에너지를 최대화시켜 특징이 잡음의 영향을 적게 받는 특성을 이용하였고 낮은 신호 대 잡음비에서는 음성/비음성의 켑스트럼 특징 분포 특성을 이용하여 인식 성능을 향상시켰다. 인식 실험 결과 기존 방법에 비해 향상된 인식 성능을 확인할 수 있었다.