• Title/Abstract/Keywords: Speech Recognition Technology


Confidence Measure for Utterance Verification in Noisy Environments

  • 박정식;오영환
    • Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the 2006 Fall Conference / pp.3-6 / 2006
  • This paper proposes a confidence measure for utterance verification in noisy environments. Most conventional approaches estimate a suitable threshold for the confidence measure and apply it to utterance rejection during recognition. Their performance may therefore degrade on noisy speech, because the appropriate threshold can shift in noisy environments. This paper presents a more robust confidence measure based on the multi-pass confidence measure. Isolated-word recognition experiments demonstrate that the proposed method outperforms conventional approaches as an utterance verifier.


Robust Speech Recognition using Noise Compensation Method Based on Eigen-Environment

  • 송화전;김형순
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 52 / pp.145-160 / 2004
  • In this paper, a new noise compensation method based on the eigenvoice framework in feature space is proposed to reduce the mismatch between training and testing environments. The difference between clean and noisy environments is represented by a linear combination of K eigenvectors that capture the variation among environments. In the proposed method, the improvement in recognition performance depends largely on how the noisy models and the bias vector set are constructed. Two methods are proposed to construct the noisy models: one based on MAP adaptation and the other using a stereo DB. In experiments on the Aurora 2 DB, the eigen-environment method achieved a 44.86% relative improvement over the baseline system. In particular, in clean-condition training mode, the proposed method yielded a 66.74% relative improvement, which is better than several methods previously proposed in the Aurora project.

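
The abstract above describes the clean/noisy mismatch as a linear combination of K eigenvectors. The following is a minimal sketch of that idea only, not the authors' implementation. The function names, the additive model (noisy = clean + bias), and the assumption that the weights are already known are all illustrative; the paper's weight estimation is not described in the abstract and is not reproduced here.

```python
def environment_bias(eigenvectors, weights):
    """Combine K environment eigenvectors into one bias vector.

    eigenvectors: list of K vectors (each a list of floats, same length).
    weights: list of K scalar weights (assumed to be given; hypothetical).
    """
    dim = len(eigenvectors[0])
    bias = [0.0] * dim
    for vec, w in zip(eigenvectors, weights):
        for i in range(dim):
            bias[i] += w * vec[i]
    return bias


def compensate(noisy_feature, bias):
    """Subtract the bias, assuming the additive model noisy = clean + bias."""
    return [y - b for y, b in zip(noisy_feature, bias)]
```

For example, with two unit eigenvectors and weights 0.5 and -1.0, the bias is simply the weight vector itself, and subtracting it from a noisy feature vector gives the compensated feature.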

A Robust Speaker Identification Using Optimized Confidence and Modified HMM Decoder

  • ;김진영;나승유
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 64 / pp.121-135 / 2007
  • Speech signals are distorted by channel characteristics or additive noise, which severely degrades the performance of speaker and speech recognition. To cope with the noise problem, we propose a modified HMM decoder algorithm using SNR-based observation confidence, which was successfully applied to GMM in the speaker identification task. The modification weights observation probabilities with reliability values obtained from the SNR. We also apply PSO (particle swarm optimization) to the confidence function to maximize speaker identification performance. To evaluate the proposed method, we used the ETRI database for speaker recognition. The experimental results showed that the modified HMM decoder algorithm clearly improved performance.

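
The weighting of observation probabilities by SNR-derived reliability, as described in the abstract above, can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's decoder. The sigmoid shape of the confidence function and its `slope` and `midpoint` parameters are hypothetical; the abstract says only that the confidence function is tuned with PSO.

```python
import math


def snr_confidence(snr_db, slope=0.5, midpoint=10.0):
    """Map a frame SNR (dB) to a reliability value in (0, 1).

    The sigmoid form and its parameters are assumptions for illustration.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint)))


def weighted_log_likelihood(frame_log_probs, frame_snrs_db,
                            slope=0.5, midpoint=10.0):
    """Sum per-frame observation log-probabilities, each scaled by its
    SNR-based confidence (equivalent to raising the probability to that power).
    """
    return sum(
        snr_confidence(snr, slope, midpoint) * log_prob
        for log_prob, snr in zip(frame_log_probs, frame_snrs_db)
    )
```

Frames with low SNR contribute less to the accumulated score, so unreliable observations have less influence on the decoding decision.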

Rapid Speaker Adaptation for Continuous Speech Recognition Using Merging Eigenvoices

  • 최동진;오영환
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 53 / pp.143-156 / 2005
  • Speaker adaptation in eigenvoice space is a popular method for rapid speaker adaptation. To improve its performance, the number of speaker-dependent models should be increased and the eigenvoices re-estimated. However, principal component analysis takes a long time to find eigenvoices, especially in a continuous speech recognition system. This paper describes a method that reduces computation time by estimating eigenvoices only for the supplementary speaker-dependent models and merging them with the existing eigenvoices. Experimental results show that computation time is reduced by 73.7% while performance remains almost the same in the case where the number of speaker-dependent models is the same as the number already used.


Representation of MFCC Feature Based on Linlog Function for Robust Speech Recognition

  • 윤영선
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 59 / pp.13-25 / 2006
  • In a previous study, the linlog (linear-log) RASTA (J-RASTA) approach based on PLP was proposed to deal with both channel effects and additive noise. Extracting PLP generally requires more steps and computation than extracting the widely used MFCC. Thus, in this paper, we apply the linlog function to MFCC to investigate the possibility of a simple compensation method that removes both distortions. In the experimental results, the proposed method shows a tendency similar to that of linlog RASTA-PLP. When the J value is set to 1e-6, the best ERR (Error Reduction Rate) of 33% is obtained. When the linlog function is applied to the feature extraction process, the J value plays a very important role in compensating for the corruption. Further study of adaptive or noise-dependent J estimation is therefore required.

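
To illustrate why the J value matters, here is a minimal sketch of a linlog compression, assuming the J-RASTA form y = ln(1 + J·x). The abstract does not give the exact formula used for MFCC, so this form and the function name `linlog` are assumptions. The function is nearly linear in x when J·x is small and nearly logarithmic when J·x is large.

```python
import math


def linlog(energy, j=1e-6):
    """Lin-log compression ln(1 + J * x) of a non-negative filterbank energy.

    For J * x << 1 the result is approximately J * x (linear region);
    for J * x >> 1 it is approximately ln(J) + ln(x) (logarithmic region).
    The default J = 1e-6 is the setting the abstract reports as best.
    """
    return math.log1p(j * energy)
```

In an MFCC front end this would take the place of the plain logarithm applied to the mel filterbank energies, with J setting the energy level at which the behavior switches from linear to logarithmic.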

The Usage of Phoneme Duration Information for Rejecting Garbage Sentences

  • 구명완;김호경;박성준;김재인
    • Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the May 2003 Conference / pp.219-222 / 2003
  • In this paper, we study the use of phoneme duration information for rejecting garbage sentences. First, we build a phoneme duration model in a speech recognition system based on decision-tree state tying. We assume that phone duration follows a Gamma distribution. Next, we build a verification module that uses a word-level confidence measure. Finally, we carry out a comparative study of phoneme duration using a speech DB obtained from the live system. This DB consists of OOT (out-of-task) and ING (in-grammar) utterances. Using phone duration information improves the OOT recognition rate by 46%, and the error rate is reduced by a further 8.4% when it is combined with the utterance verification module.

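
The Gamma duration assumption in the abstract above can be sketched as follows. This is an illustrative example only: the method-of-moments fit, the function names, and the use of an average log-density as a duration score are assumptions, not the procedure reported in the paper.

```python
import math


def fit_gamma_moments(durations):
    """Estimate Gamma shape and scale from durations by method of moments.

    Uses mean = shape * scale and variance = shape * scale**2.
    """
    n = len(durations)
    mean = sum(durations) / n
    var = sum((d - mean) ** 2 for d in durations) / n
    shape = mean * mean / var
    scale = var / mean
    return shape, scale


def gamma_log_pdf(duration, shape, scale):
    """Log-density of a Gamma(shape, scale) distribution at a positive duration."""
    return ((shape - 1.0) * math.log(duration)
            - duration / scale
            - shape * math.log(scale)
            - math.lgamma(shape))


def mean_duration_score(durations, shape, scale):
    """Average log-density of observed phone durations (hypothetical score)."""
    return sum(gamma_log_pdf(d, shape, scale) for d in durations) / len(durations)
```

A hypothesis whose phone durations are implausible under the fitted distribution receives a low score, which a verification module could combine with other confidence measures when deciding whether to reject it.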

Computational Complexity Reduction of Speech Recognizers Based on the Modified Bucket Box Intersection Algorithm

  • 김건용;김동화
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 60 / pp.109-123 / 2006
  • Since computing the log-likelihood of Gaussian mixture density is a major computational burden for the speech recognizer based on the continuous HMM, several techniques have been proposed to reduce the number of mixtures to be used for recognition. In this paper, we propose a modified Bucket Box Intersection (BBI) algorithm, in which two relative thresholds are employed: one is the relative threshold in the conventional BBI algorithm and the other is used to reduce the number of the Gaussian boxes which are intersected by the hyperplanes at the boxes' edges. The experimental results show that the proposed algorithm reduces the number of Gaussian mixtures by 12.92% during the recognition phase with negligible performance degradation compared to the conventional BBI algorithm.


Double Compensation Framework Based on GMM For Speaker Recognition

  • 김유진;정재호
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 45 / pp.93-105 / 2003
  • In this paper, we present a single framework based on GMM for speaker recognition. The proposed framework can simultaneously minimize environmental variations under mismatched conditions and adapt the bias-free, speaker-dependent characteristics of claimant utterances to the background GMM to create a speaker model. We compare closed-set speaker identification for the conventional method and the proposed method on both TIMIT and NTIMIT. Across several sets of experiments, recognition rates improved by 7.2% on a simulated channel and by 27.4% on a telephone channel.


Multi-stage Recognition for POI

  • 전형배;황규웅;정훈;김승희;박준;이윤근
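    • Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the 2007 Joint Conference with the Korean Association of Speech Sciences / pp.131-134 / 2007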
  • We propose a multi-stage recognizer architecture that reduces the computational load and yields a fast recognizer. To improve the performance of the baseline multi-stage recognizer, we introduce a new feature: a confidence vector for each phone segment, used instead of the best phoneme sequence. The multi-stage recognizer with the new feature achieves better n-best performance and is more robust.


Effective Noise Reduction in Mobile Communication Environment using Adaptive Comb Filtering

  • 박정식;정규준;오영환
    • Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the May 2003 Conference / pp.203-206 / 2003
  • In this paper, we employ adaptive comb filtering for effective noise reduction in a mobile communication environment. Adaptive comb filtering is a well-known method for noise reduction, but it requires the correct pitch period and must be applied only to voiced speech frames. To satisfy these requirements, we use two kinds of information extracted from speech packets: the pitch period, measured precisely by the speech coder, and the frame rate, which is related to the decision on whether a frame is speech or silence. Experiments on a speech recognition system confirm the efficiency of this method. Feature parameters obtained with this method give better performance in noisy environments than those extracted directly from the output speech.

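
The two requirements named in the abstract above (a correct pitch period, and application to voiced frames only) can be illustrated with a simple comb filter sketch. The three-tap weights, the edge renormalization, and the `voiced_flags` input are assumptions for illustration. In the paper the pitch period and speech/silence information come from the speech coder packets, which are not modeled here.

```python
def comb_filter(samples, pitch_period, weights=(0.25, 0.5, 0.25)):
    """Average each sample with samples one pitch period before and after it.

    Components periodic at the pitch period are preserved, while components
    that are not periodic at that lag tend to be attenuated.
    Taps falling outside the frame are dropped and the rest renormalized.
    """
    half = len(weights) // 2
    n = len(samples)
    out = []
    for i in range(n):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(weights):
            j = i + (k - half) * pitch_period
            if 0 <= j < n:
                acc += w * samples[j]
                norm += w
        out.append(acc / norm)
    return out


def enhance_frames(frames, pitch_periods, voiced_flags):
    """Apply the comb filter to voiced frames only; pass other frames through."""
    return [
        comb_filter(frame, period) if voiced and period > 0 else list(frame)
        for frame, period, voiced in zip(frames, pitch_periods, voiced_flags)
    ]
```

A frame that is exactly periodic at the given pitch period passes through unchanged, which is why an accurate pitch estimate is essential: a wrong period would average together samples that do not correspond.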