Search | Korea Science

User Adaptive Post-Processing in Speech Recognition for Mobile Devices (모바일 기기를 위한 음성인식의 사용자 적응형 후처리)

Kim, Young-Jin;Kim, Eun-Ju;Kim, Myung-Won
- Journal of KIISE:Computing Practices and Letters
- /
- v.13 no.5
- /
- pp.338-342
- /
- 2007
In this paper we propose a user-adaptive post-processing method to improve the accuracy of speaker-dependent, isolated-word speech recognition, particularly on mobile devices. The method treats the output of the basic recognizer simply as a high-level speech feature and processes it further to obtain the correct recognition result. It learns the correlation between the output of the basic recognizer and the correct final results and uses it to correct erroneous recognizer output. A multi-layer perceptron model is built for each frequently misrecognized word. In experiments, the method achieved a 41% error correction rate, a significant improvement in recognition accuracy.

Isolated-Word Recognition Using Neural Network and Hidden Markov Model (Neural-HMM을 이용한 고립단어 인식)

김연수;김창석
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.17 no.11
- /
- pp.1199-1205
- /
- 1992
In this paper, a Korean word recognition method that uses a neural network and Hidden Markov Models (HMM) is proposed to improve the recognition rate with a small amount of training data. The method reduces the fluctuation caused by differences between speakers, which is a problem for HMM recognition systems. An effective recognizer is designed by letting the recognition results of the HMM and the neural network complement each other. To evaluate the model, a word recognition experiment is carried out on the names of 28 cities (DDD area names) uttered by two male speakers and one female speaker in their twenties. An HMM with 8 states and 64 codewords gives a recognition rate of 91%, and a neural network (NN) with 64 codewords gives 89%. Finally, the NN-HMM with 64 codewords, the best condition in the former tests, gives a recognition rate of 95%.

Rapid Speaker Adaptation Based on MAPLR with Adaptive Hybrid Priors Estimated from Reference Speakers (참조화자로부터 추정된 적응적 혼성 사전분포를 이용한 MAPLR 고속 화자적응)

Song, Young-Rok;Kim, Hyung-Soon
- The Journal of the Acoustical Society of Korea
- /
- v.30 no.6
- /
- pp.315-323
- /
- 2011
This paper proposes two methods of estimating the prior distribution to improve the performance of rapid speaker adaptation based on maximum a posteriori linear regression (MAPLR). In general, the prior distribution of the transformation matrix used in MAPLR adaptation is estimated from all of the training speakers employed to construct the speaker-independent model, and it is applied identically to all new speakers. In this paper, we propose a method in which the prior distribution is estimated from a group of reference speakers, selected using adaptation data, so that the acoustic characteristics of the selected reference speakers are similar to those of the new speaker. Additionally, for MAPLR adaptation with a block-diagonal transformation matrix, we propose a method in which the mean matrix and the covariance matrix of the prior distribution are estimated, respectively, from two groups of transformation matrices obtained from the same training speakers. To evaluate the proposed methods, we examine word accuracy as a function of the number of adaptation words in an isolated word recognition task. Experimental results show that, for very limited adaptation data, a statistically significant performance improvement is obtained over conventional MAPLR adaptation.
https://doi.org/10.7776/ASK.2011.30.6.315

Speech Recognition in the Noisy Environment using Weighted Projection-Based Likelihood Measure and Parallel Model Combination (가중 투영 우도 측정 및 병렬 모델 결합을 이용한 잡음 환경에서의 음성 인식)

신원호;양태영;김원구;윤대희;차일환
- The Journal of the Acoustical Society of Korea
- /
- v.17 no.1
- /
- pp.49-54
- /
- 1998
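This paper proposes a method that combines the projection approach, which is known to be robust in noisy environments, with a weighting function in the likelihood measure. In isolated-word recognition experiments using semi-continuous HMMs, the proposed method performed well in all of the noise environments tested. In addition, parallel model combination was applied to the semi-continuous HMM, which allows the noise characteristics to be reflected easily by transforming only the codebook. Applying the weighted projection likelihood measure to parallel model combination also yielded good performance.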

Performance Improvement of Multilayer Perceptrons with Increased Output Nodes (다층퍼셉트론의 출력 노드 수 증가에 의한 성능 향상)

Oh, Sang-Hoon
- The Journal of the Korea Contents Association
- /
- v.9 no.1
- /
- pp.123-130
- /
- 2009
When we apply MLPs (multilayer perceptrons) to pattern classification problems, we generally allocate one output node to each class, and the index of the output node denotes the class. In contrast, in this paper we propose increasing the number of output nodes per class to improve the performance of MLPs. As theoretical background, we derive the misclassification probability in two-class problems with additional outputs, under the assumption that the two classes have equal probability and the outputs are uniformly distributed in each class. Simulations on the recognition of 50 isolated words also show the effectiveness of our method.
https://doi.org/10.5392/JKCA.2009.9.1.123

Branch Algorithm for Phoneme Segmentation in Korean Speech Recognition System (한국어 음성인식 시스템에서 음소 경계 검출을 위한 Branch 알고리즘)

서영완;한승진;장흥종;이정현
- Proceedings of the Korean Information Science Society Conference
- /
- 2000.04b
- /
- pp.357-359
- /
- 2000
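Speech data built at the phoneme level is very important in fields such as speech recognition, synthesis, and analysis. Phonemes are generally divided into voiced and unvoiced sounds. Although voiced and unvoiced sounds differ in many characteristics, existing phoneme boundary extraction algorithms do not take this into account and determine phoneme boundaries only by comparing parameters (spectra) with the previous frame along the time axis. In this paper, we design a block-based Branch algorithm for phoneme boundary extraction that takes the characteristic differences between voiced and unvoiced sounds into account. The spectral comparison used by the Branch algorithm is a distance measure based on MFCC (Mel-Frequency Cepstrum Coefficient), and formant frequencies are used to distinguish voiced from unvoiced sounds. In experiments on isolated words of three to four syllables, an accuracy of about 78% was obtained.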

A Study on VQ/HMM using Nonlinear Clustering and Smoothing Method (비선형 집단화와 완화기법을 이용한 VQ/HMM에 관한 연구)

정희석;강철호
- The Journal of the Acoustical Society of Korea
- /
- v.18 no.3
- /
- pp.35-42
- /
- 1999
In this paper, a modified clustering algorithm is proposed to improve the discrimination of the discrete HMM (Hidden Markov Model); it increases the recognition rate by 2.16% compared with the original HMM using the K-means or LBG algorithm. In addition, to prevent the recognition rate from dropping because of insufficient training data during HMM training, a modified probabilistic smoothing method is proposed, which increases the recognition rate by 3.07% in the speaker-independent case. In an experiment applying both proposed algorithms, the average recognition rate increased by 4.66% in the speaker-independent case compared with the original VQ/HMM.

The Low Cost Implementation of Speech Recognition System for the Web (웹에서의 저가 음성인식 시스템의 구현)

Park, Yong-Beom;Park, Jong-Il
- The Transactions of the Korea Information Processing Society
- /
- v.6 no.4
- /
- pp.1129-1135
- /
- 1999
Isolated word recognition using the dynamic time warping (DTW) algorithm has shown good recognition rates in speaker-dependent environments. In practice, however, it is hard to implement because the search time of DTW increases rapidly as the amount of data to be searched grows. In a context-dependent short-query system, such as an educational children's workbook on the Web, the number of possible responses to a specific question is limited, so the search space for answers can be reduced depending on the question. This paper proposes a low-cost implementation method using DTW for the Web. To compensate for the weakness of DTW, the search space is reduced by context: for each specific question, it is chosen from the candidates of interest. In a real implementation, the proposed method shows better performance in both time and recognition rate.
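The idea of restricting the DTW search to question-specific candidates can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the one-dimensional feature sequences, the absolute-difference cost, and the template values are all assumptions made for the example.

```python
import math


def dtw_distance(a, b):
    """Classic DTW between two 1-D sequences with absolute-difference cost."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def recognize(query, templates, candidates):
    """Return the candidate word whose stored template is closest to the query.

    Only the words in `candidates` are compared, so the cost grows with the
    number of answers allowed for the current question, not the vocabulary.
    """
    return min(candidates, key=lambda word: dtw_distance(query, templates[word]))


# Hypothetical templates and per-question candidate list (illustrative values).
templates = {
    "yes": [1, 2, 3, 2, 1],
    "no": [3, 3, 1, 1],
    "seven": [1, 2, 3, 2, 1, 0],
}
candidates_for_question = ["yes", "no"]
answer = recognize([1, 2, 2, 3, 2, 1], templates, candidates_for_question)
```

Here the word "seven" is never scored because it is not a plausible answer to the hypothetical question, which is where the time saving would come from.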

A Study on Isolated Words Speech Recognition in a Running Automobile (주행중인 자동차 환경에서의 고립단어 음성인식 연구)

유봉근
- Proceedings of the Acoustical Society of Korea Conference
- /
- 1998.06e
- /
- pp.381-384
- /
- 1998
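This paper aims to enable always-on speech input and output without any auxiliary switch operation, in order to secure both the safety and the convenience of the driver in a running automobile. To obtain noise-robust threshold values, the reference energy and zero crossing rate (ZCR) are updated at regular intervals, and end point detection is performed automatically and accurately in real time, in a first and a second stage, using a bandpass filter. The reference patterns use DMS (Dynamic Multi-Section), and the use of two models is proposed to improve speaker discrimination. In addition, for robustness to the noise environment of a running vehicle, driving conditions are divided into normal driving (up to 80 km/h), high-speed driving (80 km/h or above), and so on, and are selected automatically according to the level of the vehicle's variable noise. The speech feature vector is 13th-order PLP, and the recognition algorithm is One-Stage Dynamic Programming (OSDP). In experiments on 33 frequently used control commands for vehicle convenience devices, recognition rates of 89.75% (speaker-independent) and 90.08% (speaker-dependent) were obtained on the Jungbu and Yeongdong expressways (80 km/h or above), and 92.29% (speaker-independent) and 92.42% (speaker-dependent) on the Gyeongbu expressway. In a low-speed driving environment (up to 80 km/h, on cement, asphalt, and other road surfaces within and outside Seoul), recognition rates of 92.89% (speaker-independent) and 94.44% (speaker-dependent) were obtained.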
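The two quantities used for end point detection, short-time energy and zero crossing rate, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses fixed thresholds, non-overlapping frames, and a single pass, whereas the abstract describes periodically updated thresholds, bandpass filtering, and two stages. All names and the decision rule are hypothetical.

```python
def frame_features(samples, frame_len):
    """Return (energy, zero_crossing_rate) for each non-overlapping frame.

    Assumes frame_len >= 2; trailing samples that do not fill a frame are ignored.
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Count sign changes between consecutive samples.
        crossings = sum(
            1 for p, q in zip(frame, frame[1:]) if (p >= 0) != (q >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def detect_endpoints(feats, energy_thr, zcr_thr):
    """Return (first, last) frame indices flagged as speech, or None.

    Simplified rule (an assumption): a frame counts as speech when either
    its energy or its zero crossing rate exceeds the given threshold.
    """
    active = [
        i for i, (energy, zcr) in enumerate(feats)
        if energy > energy_thr or zcr > zcr_thr
    ]
    return (active[0], active[-1]) if active else None


# Toy signal: silence, an alternating burst, silence.
signal = [0, 0, 0, 0, 1, -1, 1, -1, 0, 0, 0, 0]
features = frame_features(signal, frame_len=4)
endpoints = detect_endpoints(features, energy_thr=0.5, zcr_thr=0.5)
```

An adaptive variant would recompute `energy_thr` and `zcr_thr` from recent background frames at regular intervals, as the abstract describes.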

Bayesian Fusion of Confidence Measures for Confidence Scoring (베이시안 신뢰도 융합을 이용한 신뢰도 측정)

김태윤;고한석
- The Journal of the Acoustical Society of Korea
- /
- v.23 no.5
- /
- pp.410-419
- /
- 2004
In this paper, we propose a method of confidence measure fusion under a Bayesian framework for speech recognition. Centralized and distributed schemes are considered for confidence measure fusion. Centralized fusion is feature-level fusion, which combines the values of the individual confidence scores and makes a final decision. In contrast, distributed fusion is decision-level fusion, which combines the individual decisions made by each confidence measuring method. Optimal Bayesian fusion rules for the centralized and distributed cases are presented. In isolated-word Out-of-Vocabulary (OOV) rejection experiments, centralized Bayesian fusion shows over 13% relative equal error rate (EER) reduction compared with the individual confidence measure methods. In contrast, distributed Bayesian fusion shows no significant performance increase.
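For readers unfamiliar with the metric, the equal error rate in an OOV rejection task can be approximated as sketched below. This illustrates the metric only, not the paper's fusion rules; the function name, the threshold sweep, and the example scores are assumptions, and the sketch requires both score lists to be non-empty.

```python
def equal_error_rate(in_vocab_scores, oov_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    An utterance is accepted when its confidence score is >= the threshold.
    False rejection: an in-vocabulary utterance is rejected.
    False acceptance: an out-of-vocabulary utterance is accepted.
    Returns the mean of the two rates at the threshold where they are closest.
    """
    best_gap, eer = None, None
    for thr in sorted(set(in_vocab_scores) | set(oov_scores)):
        frr = sum(s < thr for s in in_vocab_scores) / len(in_vocab_scores)
        far = sum(s >= thr for s in oov_scores) / len(oov_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Illustrative confidence scores (made up for the example).
in_vocab = [0.9, 0.8, 0.7, 0.3]
oov = [0.1, 0.2, 0.4, 0.6]
eer = equal_error_rate(in_vocab, oov)
```

A relative EER reduction is then `(baseline_eer - fused_eer) / baseline_eer`.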

Search Result 109, Processing Time 0.028 seconds
