• Title/Summary/Keyword: Speech Recognition Error

Search Result 282, Processing Time 0.025 seconds

Machine scoring method for speech recognizer detection mispronunciation of foreign language (외국어 발화오류 검출 음성인식기를 위한 스코어링 기법)

  • Kang, Hyo-Won;Bae, Min-Young;Lee, Jae-Kang;Kwon, Chul-Hong
    • Proceedings of the KSPS conference
    • /
    • 2004.05a
    • /
    • pp.239-242
    • /
    • 2004
  • An automatic pronunciation correction system provides users with correction guidelines for each pronunciation error. For this purpose, we propose a speech recognition system which automatically classifies pronunciation errors when Koreans speak a foreign language. In this paper, we also propose machine scoring methods for automatic assessment of pronunciation quality by the speech recognizer. Scores obtained from an expert human listener are used as the reference to evaluate the different machine scores and to provide targets when training some of algorithms. We use a log-likelihood score and a normalized log-likelihood score as machine scoring methods. Experimental results show that the normalized log-likelihood score had higher correlation with human scores than that obtained using the log-likelihood score.

  • PDF

Improving Speaker Enrolling Speed for Speaker Verification Systems Based on Multilayer Perceptrons by Using a Qualitative Background Speaker Selection (정질적 기준을 이용한 다층신경망 기반 화자증명 시스템의 등록속도 단축방법)

  • 이태승;황병원
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.5
    • /
    • pp.360-366
    • /
    • 2003
  • Although multilayer perceptrons (MLPs) present several advantages against other pattern recognition methods, MLP-based speaker verification systems suffer from slow enrollment speed caused by many background speakers to achieve a low verification error. To solve this problem, the quantitative discriminative cohort speakers (QnDCS) method, by introducing the cohort speakers method into the systems, reduced the number of background speakers required to enroll speakers. Although the QnDCS achieved the goal to some extent, the improvement rate for the enrolling speed was still unsatisfactory. To improve the enrolling speed, this paper proposes the qualitative DCS (QlDCS) by introducing a qualitative criterion to select less background speakers. An experiment for both methods is conducted to use the speaker verification system based on MLPs and continuants, and speech database. The results of the experiment show that the proposed QlDCS method enrolls speakers in two times shorter time than the QnDCS does over the online error backpropagation(EBP) method.

Speaker Identification Using Greedy Kernel PCA (Greedy Kernel PCA를 이용한 화자식별)

  • Kim, Min-Seok;Yang, Il-Ho;Yu, Ha-Jin
    • MALSORI
    • /
    • no.66
    • /
    • pp.105-116
    • /
    • 2008
  • In this research, we propose a speaker identification system using a kernel method which is expected to model the non-linearity of speech features well. We have been using principal component analysis (PCA) successfully, and extended to kernel PCA, which is used for many pattern recognition tasks such as face recognition. However, we cannot use kernel PCA for speaker identification directly because the storage required for the kernel matrix grows quadratically, and the computational cost grows linearly (computing eigenvector of $l{\times}l$ matrix) with the number of training vectors I. Therefore, we use greedy kernel PCA which can approximate kernel PCA with small representation error. In the experiments, we compare the accuracy of the greedy kernel PCA with the baseline Gaussian mixture models using MFCCs and PCA. As the results with limited enrollment data show, the greedy kernel PCA outperforms conventional methods.

  • PDF

Categorization and Analysis of Error Types in the Korean Speech Recognition System (한국어 음성 인식 시스템의 오류 유형 분류 및 분석)

  • Son, Junyoung;Park Chanjun;Seo, Jaehyung;Lim, Heuiseok
    • Annual Conference on Human and Language Technology
    • /
    • 2021.10a
    • /
    • pp.144-151
    • /
    • 2021
  • 딥러닝의 등장으로 자동 음성 인식 (Automatic Speech Recognition) 기술은 인간과 컴퓨터의 상호작용을 위한 가장 중요한 요소로 자리 잡았다. 그러나 아직까지 유사 발음 오류, 띄어쓰기 오류, 기호부착 오류 등과 같이 해결해야할 난제들이 많이 존재하며 오류 유형에 대한 명확한 기준 정립이 되고 있지 않은 실정이다. 이에 본 논문은 음성 인식 시스템의 오류 유형 분류 기준을 한국어에 특화되게 설계하였으며 이를 다양한 상용화 음성 인식 시스템을 바탕으로 질적 분석 및 오류 분류를 진행하였다. 실험의 경우 도메인과 어투에 따른 분석을 각각 진행하였으며 이를 통해 각 상용화 시스템별 강건한 부분과 약점인 부분을 파악할 수 있었다.

  • PDF

Quantization Based Speaker Normalization for DHMM Speech Recognition System (DHMM 음성 인식 시스템을 위한 양자화 기반의 화자 정규화)

  • 신옥근
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.4
    • /
    • pp.299-307
    • /
    • 2003
  • There have been many studies on speaker normalization which aims to minimize the effects of speaker's vocal tract length on the recognition performance of the speaker independent speech recognition system. In this paper, we propose a simple vector quantizer based linear warping speaker normalization method based on the observation that the vector quantizer can be successfully used for speaker verification. For this purpose, we firstly generate an optimal codebook which will be used as the basis of the speaker normalization, and then the warping factor of the unknown speaker will be extracted by comparing the feature vectors and the codebook. Finally, the extracted warping factor is used to linearly warp the Mel scale filter bank adopted in the course of MFCC calculation. To test the performance of the proposed method, a series of recognition experiments are conducted on discrete HMM with thirteen mono-syllabic Korean number utterances. The results showed that about 29% of word error rate can be reduced, and that the proposed warping factor extraction method is useful due to its simplicity compared to other line search warping methods.

Study on the Improvement of Speech Recognizer by Using Time Scale Modification (시간축 변환을 이용한 음성 인식기의 성능 향상에 관한 연구)

  • 이기승
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.6
    • /
    • pp.462-472
    • /
    • 2004
  • In this paper a method for compensating for thp performance degradation or automatic speech recognition (ASR) is proposed. which is mainly caused by speaking rate variation. Before the new method is proposed. quantitative analysis of the performance of an HMM-based ASR system according to speaking rate is first performed. From this analysis, significant performance degradation was often observed in the rapidly speaking speech signals. A quantitative measure is then introduced, which is able to represent speaking rate. Time scale modification (TSM) is employed to compensate the speaking rate difference between input speech signals and training speech signals. Finally, a method for compensating the performance degradation caused by speaking rate variation is proposed, in which TSM is selectively employed according to speaking rate. By the results from the ASR experiments devised for the 10-digits mobile phone number, it is confirmed that the error rate was reduced by 15.5% when the proposed method is applied to the high speaking rate speech signals.

A study on user defined spoken wake-up word recognition system using deep neural network-hidden Markov model hybrid model (Deep neural network-hidden Markov model 하이브리드 구조의 모델을 사용한 사용자 정의 기동어 인식 시스템에 관한 연구)

  • Yoon, Ki-mu;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.2
    • /
    • pp.131-136
    • /
    • 2020
  • Wake Up Word (WUW) is a short utterance used to convert speech recognizer to recognition mode. The WUW defined by the user who actually use the speech recognizer is called user-defined WUW. In this paper, to recognize user-defined WUW, we construct traditional Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), Linear Discriminant Analysis (LDA)-GMM-HMM and LDA-Deep Neural Network (DNN)-HMM based system and compare their performances. Also, to improve recognition accuracy of the WUW system, a threshold method is applied to each model, which significantly reduces the error rate of the WUW recognition and the rejection failure rate of non-WUW simultaneously. For LDA-DNN-HMM system, when the WUW error rate is 9.84 %, the rejection failure rate of non-WUW is 0.0058 %, which is about 4.82 times lower than the LDA-GMM-HMM system. These results demonstrate that LDA-DNN-HMM model developed in this paper proves to be highly effective for constructing user-defined WUW recognition system.

A Study on Utterance Verification Using Accumulation of Negative Log-likelihood Ratio (음의 유사도 비율 누적 방법을 이용한 발화검증 연구)

  • 한명희;이호준;김순협
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.3
    • /
    • pp.194-201
    • /
    • 2003
  • In speech recognition, confidence measuring is to decide whether it can be accepted as the recognized results or not. The confidence is measured by integrating frames into phone and word level. In case of word recognition, the confidence measuring verifies the results of recognition and Out-Of-Vocabulary (OOV). Therefore, the post-processing could improve the performance of recognizer without accepting it as a recognition error. In this paper, we measure the confidence modifying log likelihood ratio (LLR) which was the previous confidence measuring. It accumulates only those which the log likelihood ratio is negative when integrating the confidence to phone level from frame level. When comparing the verification performance for the results of word recognizer with the previous method, the FAR (False Acceptance Ratio) is decreased about 3.49% for the OOV and 15.25% for the recognition error when CAR (Correct Acceptance Ratio) is about 90%.

An Adaptive Utterance Verification Framework Using Minimum Verification Error Training

  • Shin, Sung-Hwan;Jung, Ho-Young;Juang, Biing-Hwang
    • ETRI Journal
    • /
    • v.33 no.3
    • /
    • pp.423-433
    • /
    • 2011
  • This paper introduces an adaptive and integrated utterance verification (UV) framework using minimum verification error (MVE) training as a new set of solutions suitable for real applications. UV is traditionally considered an add-on procedure to automatic speech recognition (ASR) and thus treated separately from the ASR system model design. This traditional two-stage approach often fails to cope with a wide range of variations, such as a new speaker or a new environment which is not matched with the original speaker population or the original acoustic environment that the ASR system is trained on. In this paper, we propose an integrated solution to enhance the overall UV system performance in such real applications. The integration is accomplished by adapting and merging the target model for UV with the acoustic model for ASR based on the common MVE principle at each iteration in the recognition stage. The proposed iterative procedure for UV model adaptation also involves revision of the data segmentation and the decoded hypotheses. Under this new framework, remarkable enhancement in not only recognition performance, but also verification performance has been obtained.

A Study on the Voice Dialing using HMM and Post Processing of the Connected Digits (HMM과 연결 숫자음의 후처리를 이용한 음성 다이얼링에 관한 연구)

  • Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.5
    • /
    • pp.74-82
    • /
    • 1995
  • This paper is study on the voice dialing using HMM and post processing of the connected digits. HMM algorithm is widely used in the speech recognition with a good result. But, the maximum likelihood estimation of HMM(Hidden Markov Model) training in the speech recognition does not lead to values which maximize recognition rate. To solve the problem, we applied the post processing to segmental K-means procedure are in the recognition experiment. Korea connected digits are influenced by the prolongation more than English connected digits. To decrease the segmentation error in the level building algorithm some word models which can be produced by the prolongation are added. Some rules for the added models are applied to the recognition result and it is updated. The recognition system was implemented with DSP board having a TMS320C30 processor and IBM PC. The reference patterns were made by 3 male speakers in the noisy laboratory. The recognition experiment was performed for 21 sort of telephone number, 252 data. The recognition rate was $6\%$ in the speaker dependent, and $80.5\%$ in the speaker independent recognition test.

  • PDF