• Title/Summary/Keyword: 음소 유사도 (phoneme similarity)


The storage structure and retrieval mechanism for Korean speech database (한국어 음성 데이타베이스의 저장 구조와 검색 기법)

  • Song, Gun-Seop;Park, Yeong-Bae
    • Annual Conference on Human and Language Technology / 1991.10a / pp.321-330 / 1991
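  • When speech data is stored in a conventional database to build a speech database, existing database systems cannot support it, because speech data is variable-length and each tuple (a phoneme unit) is a very long pattern. In addition, current speech recognition systems search pattern data sequentially, so a faster retrieval method is needed. In this paper, we store phoneme pattern data so that speech can be recognized at the phoneme level, and we present a storage structure and retrieval algorithm that enable fast search by combining classification into classes with similar characteristics and classification by phoneme length.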

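
The two-level classification in the abstract above can be sketched as a bucketed index: patterns are filed under (phoneme class, length bucket), and a query is compared only against its own bucket instead of the whole database. This is a minimal sketch under stated assumptions, not the paper's storage structure: each frame is reduced to a single scalar feature for brevity, and the class labels, the bucket width `BUCKET_WIDTH`, and the mean-absolute-difference distance are all hypothetical choices.

```python
from collections import defaultdict

BUCKET_WIDTH = 5  # hypothetical: patterns whose lengths fall in the same 5-frame band share a bucket


class PhonemePatternStore:
    """Index variable-length phoneme patterns by (phoneme class, length bucket)."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def _key(self, phoneme_class, pattern):
        return (phoneme_class, len(pattern) // BUCKET_WIDTH)

    def insert(self, label, phoneme_class, pattern):
        self._buckets[self._key(phoneme_class, pattern)].append((label, pattern))

    @staticmethod
    def _distance(a, b):
        # Stand-in measure: mean absolute difference over the overlapping frames.
        n = min(len(a), len(b))
        return sum(abs(x - y) for x, y in zip(a[:n], b[:n])) / n

    def search(self, phoneme_class, query):
        """Return the label of the closest stored pattern in the query's bucket, or None."""
        candidates = self._buckets.get(self._key(phoneme_class, query), [])
        if not candidates:
            return None
        best = min(candidates, key=lambda item: self._distance(item[1], query))
        return best[0]


store = PhonemePatternStore()
store.insert("a", "vowel", [1.0] * 12)
store.insert("o", "vowel", [3.0] * 13)
store.insert("k", "stop", [1.0] * 4)
print(store.search("vowel", [2.8] * 11))  # o
```

Only the two vowel patterns in the same length band are scanned for the query; the stop pattern is never touched, which is the source of the speed-up over a sequential scan.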

Efficient Continuous Vocabulary Clustering Modeling for Tying Model Recognition Performance Improvement (공유모델 인식 성능 향상을 위한 효율적인 연속 어휘 군집화 모델링)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of the Korea Society of Computer and Information / v.15 no.1 / pp.177-183 / 2010
  • In a continuous vocabulary recognition system based on statistical methods, vocabulary recognition is performed using probability distributions, and the probability parameters are estimated from samples with models built by phoneme clustering. During vocabulary search, undefined and inserted phonemes in the estimated probability parameters cause a low recognition rate, and a single clustering of a Gaussian model cannot guarantee accuracy. To address this, we propose a mixed clustering modeling method that optimizes the mixture Gaussian probability distribution based on similarity measured with Euclidean and Bhattacharyya distances, so that the system searches for phoneme probability models within the clustered models. The system achieved a vocabulary-dependent recognition rate of 98.63% and a vocabulary-independent recognition rate of 97.91%.
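
The two similarity measures named above can be illustrated for diagonal-covariance Gaussians. The Bhattacharyya distance used here is the standard closed form, D_B = 1/8 Δμᵀ Σ⁻¹ Δμ + 1/2 ln(det Σ / √(det Σ₁ det Σ₂)) with Σ = (Σ₁ + Σ₂)/2; the Euclidean distance compares means only. How the paper combines the two is not stated, so the weighted blend in `mixed_distance` and the `closest_pair` merge-candidate search are hypothetical illustrations, not the authors' method.

```python
import math


def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = (v1 + v2) / 2.0  # averaged variance for this dimension
        dist += (m1 - m2) ** 2 / (8.0 * v)  # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))  # covariance-mismatch term
    return dist


def euclidean(mu1, mu2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu1, mu2)))


def mixed_distance(g1, g2, weight=0.5):
    """Hypothetical blend: weight * Euclidean(means) + (1 - weight) * Bhattacharyya."""
    (mu1, var1), (mu2, var2) = g1, g2
    return weight * euclidean(mu1, mu2) + (1.0 - weight) * bhattacharyya_diag(mu1, var1, mu2, var2)


def closest_pair(gaussians, weight=0.5):
    """Return the names of the two most similar phoneme Gaussians (a merge candidate)."""
    names = sorted(gaussians)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda p: mixed_distance(gaussians[p[0]], gaussians[p[1]], weight))


phones = {
    "a": ([0.0, 1.0], [1.0, 1.0]),
    "e": ([0.3, 1.2], [1.0, 1.5]),
    "o": ([4.0, -2.0], [2.0, 1.0]),
}
print(closest_pair(phones))  # ('a', 'e')
```

The Bhattacharyya term also penalizes differing variances, which the Euclidean distance between means ignores; blending the two is one way to let both mean and spread influence which phoneme models are clustered together.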

Recognition of Restricted Continuous Korean Speech Using Perceptual Model (인지 모델을 이용한 제한된 한국어 연속음 인식)

  • Kim, Seon-Il;Hong, Ki-Won;Lee, Haing-Sei
    • The Journal of the Acoustical Society of Korea / v.14 no.3 / pp.61-70 / 1995
  • In this paper, PLP cepstra, which are close to human perceptual characteristics, were extracted over an extended time span to capture temporal features. Phonemes were recognized by an artificial neural network whose learning method resembles that of humans, and the phoneme strings were matched by Markov models, which are well suited to sequences. Phoneme recognition of continuous Korean speech was performed on speech blocks, each gathering an unequal number of speech frames. Each block was parameterized by 7th-order PLP coefficients, PTP, zero-crossing rate and energy, which served as the neural network inputs. Recognition used 100 utterances: 10 Korean sentences, each pronounced five times by two male speakers. A maximum recognition rate of 94.4% was obtained. Sentences were then recognized with Markov models generated from the phoneme strings recognized in the earlier step; on 200 utterances, in which two male speakers pronounced each sentence 10 times, the sentence recognition rate was 92.5%.

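
Two of the block-level inputs listed above, zero-crossing rate and energy, are simple to compute once frames are grouped into blocks of unequal size. This is a minimal sketch that assumes non-overlapping fixed-length frames and block sizes supplied from outside (how the paper chose block boundaries is not described here); the PLP and PTP parameters are not reproduced, and `block_features` is a hypothetical name.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def energy(samples):
    """Mean squared amplitude."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def frame_signal(signal, frame_len):
    """Split into non-overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]


def block_features(signal, frame_len, block_sizes):
    """Group consecutive frames into blocks of unequal size; return (ZCR, energy) per block."""
    frames = frame_signal(signal, frame_len)
    features, start = [], 0
    for size in block_sizes:
        block = [s for frame in frames[start:start + size] for s in frame]
        features.append((zero_crossing_rate(block), energy(block)))
        start += size
    return features
```

For example, `block_features([1, -1] * 4 + [0.5] * 8, 4, [2, 2])` yields a high-ZCR block followed by a zero-ZCR, low-energy block, the kind of contrast that helps separate fricative-like from voiced segments.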

Performance of Korean spontaneous speech recognizers based on an extended phone set derived from acoustic data (음향 데이터로부터 얻은 확장된 음소 단위를 이용한 한국어 자유발화 음성인식기의 성능)

  • Bang, Jeong-Uk;Kim, Sang-Hun;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.11 no.3 / pp.39-47 / 2019
  • We propose a method to improve the performance of spontaneous speech recognizers by extending their phone set using speech data. In the proposed method, we first extract variable-length phoneme-level segments from broadcast speech signals, and convert them to fixed-length latent vectors using a long short-term memory (LSTM) classifier. We then cluster acoustically similar latent vectors and build a new phone set by choosing the number of clusters with the lowest Davies-Bouldin index. We also update the lexicon of the speech recognizer by choosing the pronunciation sequence of each word with the highest conditional probability. In order to analyze the acoustic characteristics of the new phone set, we visualize its spectral patterns and segment duration. Through speech recognition experiments using a larger training data set than our own previous work, we confirm that the new phone set yields better performance than the conventional phoneme-based and grapheme-based units in both spontaneous speech recognition and read speech recognition.
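
The model-selection step described above, picking the number of clusters with the lowest Davies-Bouldin index, can be sketched as follows. It assumes the latent vectors are already extracted (the LSTM is not shown) and that candidate labelings, one per cluster count, come from some clustering algorithm such as k-means; `choose_phone_set` is a hypothetical name. Each candidate needs at least two clusters with distinct centroids. In practice, scikit-learn's `sklearn.metrics.davies_bouldin_score` computes the same index.

```python
import math


def centroid(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better): mean over clusters of the worst
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j) ratio."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    cents = {lab: centroid(ps) for lab, ps in clusters.items()}
    # Average distance of each cluster's members to its centroid.
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in ps) / len(ps)
               for lab, ps in clusters.items()}
    labs = list(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in labs if j != i)
             for i in labs]
    return sum(worst) / len(worst)


def choose_phone_set(points, labelings):
    """Return the cluster count whose labeling has the lowest Davies-Bouldin index."""
    return min(labelings, key=lambda k: davies_bouldin(points, labelings[k]))
```

With `points = [(0, 0), (0, 2), (10, 0), (10, 2)]`, the two-cluster labeling `[0, 0, 1, 1]` scores 0.2, while a labeling that mixes the far-apart groups scores much worse, so the two-cluster phone set is chosen.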

A Study on Utterance Verification Using Accumulation of Negative Log-likelihood Ratio (음의 유사도 비율 누적 방법을 이용한 발화검증 연구)

  • 한명희;이호준;김순협
    • The Journal of the Acoustical Society of Korea / v.22 no.3 / pp.194-201 / 2003
  • In speech recognition, confidence measurement decides whether a recognized result can be accepted. Confidence is measured by integrating frame-level scores up to the phone and word levels. In word recognition, confidence measurement verifies both the recognition results and out-of-vocabulary (OOV) inputs, so post-processing can improve recognizer performance by rejecting recognition errors instead of accepting them. In this paper, we measure confidence with a modified version of the log-likelihood ratio (LLR) used by previous confidence measures: when integrating confidence from the frame level to the phone level, only frames whose LLR is negative are accumulated. Compared with the previous method on word recognizer output, at a correct acceptance ratio (CAR) of about 90%, the false acceptance ratio (FAR) decreased by about 3.49% for OOV words and by 15.25% for recognition errors.
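
The difference between the baseline and the modified accumulation described above can be shown on per-frame LLR values. This is a minimal sketch assuming frame LLRs are already available (for example, log p(x | phone) minus log p(x | competing model)); normalizing by segment length and averaging phones into a word score are assumptions, and all function names are hypothetical.

```python
def phone_confidence_all(frame_llrs):
    """Baseline: average every frame-level LLR in the phone segment."""
    return sum(frame_llrs) / len(frame_llrs)


def phone_confidence_negative(frame_llrs):
    """Modified: accumulate only negative frame LLRs (normalized by segment length, assumed)."""
    return sum(llr for llr in frame_llrs if llr < 0) / len(frame_llrs)


def word_confidence(phone_segments, phone_score=phone_confidence_negative):
    """Average phone-level confidence over the phones of a recognized word (assumed)."""
    scores = [phone_score(seg) for seg in phone_segments]
    return sum(scores) / len(scores)


def accept_word(phone_segments, threshold, phone_score=phone_confidence_negative):
    return word_confidence(phone_segments, phone_score) >= threshold


segment = [2.0, 2.0, -3.0, 1.0]
print(phone_confidence_all(segment))       # 0.5
print(phone_confidence_negative(segment))  # -0.75
```

In the example, one strongly mismatching frame is hidden by the positive frames under plain averaging, but it dominates the negative-only score, which is why this accumulation makes a false acceptance less likely.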

A Study on the Rejection Capability Based on Anti-phone Modeling (반음소 모델링을 이용한 거절기능에 대한 연구)

  • 김우성;구명완
    • The Journal of the Acoustical Society of Korea / v.18 no.3 / pp.3-9 / 1999
  • This paper presents a study of rejection capability based on anti-phone modeling for a vocabulary-independent speech recognition system. The rejection system detects and rejects out-of-vocabulary words, that is, words not among the candidate words defined when the speech recognizer is built. Rejection systems fall into two categories by implementation: keyword spotting and utterance verification. The keyword spotting method uses an extra filler model as a candidate word alongside the keyword models. The utterance verification method first builds anti-models for all phonemes and then uses each phoneme's anti-model to calculate a confidence score. We implemented an utterance verification algorithm that can be used in a vocabulary-independent speech recognizer. We compared three kinds of means for calculating the confidence score and found that the geometric mean gave the best result. A sigmoid function is usually used to normalize the confidence score; we compared the effect of its weight constant and determined the optimal value. We also compared cohort sets of different sizes, and larger sets gave better results. Finally, we determined the optimal confidence score threshold; with that threshold, the overall recognition rate including rejection errors was about 76%. These results will be applied to a speech-recognition-based stock information system currently offered as an experimental service by Korea Telecom.

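
The scoring pipeline described above (phone-vs-anti-phone ratios, a mean over phones, sigmoid normalization, then a threshold) can be sketched as follows. The abstract does not name the three means compared; arithmetic, geometric and harmonic are assumed here, and the weight, threshold and function names are hypothetical. Note that the geometric mean of likelihood ratios equals the exponential of the average log-likelihood ratio.

```python
import math


def phone_llr(log_lik_phone, log_lik_anti):
    """Log-likelihood ratio of a phone segment against its anti-phone model."""
    return log_lik_phone - log_lik_anti


def arithmetic_mean(ratios):
    return sum(ratios) / len(ratios)


def geometric_mean(ratios):
    # exp of the mean log ratio; ratios must be positive
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def harmonic_mean(ratios):
    return len(ratios) / sum(1.0 / r for r in ratios)


def normalized_confidence(llrs, weight=1.0, mean=geometric_mean):
    """Combine per-phone likelihood ratios with a chosen mean, then squash to (0, 1).

    Assumes moderate LLR values so that math.exp does not overflow.
    """
    ratios = [math.exp(llr) for llr in llrs]
    score = math.log(mean(ratios))  # back to the log domain before the sigmoid
    return 1.0 / (1.0 + math.exp(-weight * score))


def is_rejected(llrs, threshold=0.5, weight=1.0):
    return normalized_confidence(llrs, weight) < threshold
```

A larger `weight` makes the sigmoid steeper around zero, so it controls how sharply scores near the decision boundary are pushed toward acceptance or rejection; this is the constant whose effect the study tuned.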

The Automated Threshold Decision Algorithm for Node Split of Phonetic Decision Tree (음소 결정트리의 노드 분할을 위한 임계치 자동 결정 알고리즘)

  • Kim, Beom-Seung;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.31 no.3 / pp.170-178 / 2012
  • In this paper, a phonetic decision tree of triphone units was built for phoneme-based speech recognition of the names of the 640 stations operated by Korail. To decide the threshold used in node splitting, the clustering rate was determined by Pearson correlation and regression analysis. Using the clustering rate so determined, the threshold is decided automatically as the value corresponding to the average clustering rate. In recognition experiments verifying the proposed method, performance improved by 1.4 to 2.3% absolute over the baseline system.
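
One plausible reading of the threshold procedure above is: measure the clustering rate obtained at several trial thresholds, check with Pearson's r that the relation is roughly linear, fit a regression line, and invert it at the target clustering rate. This is a hypothetical sketch, not the paper's algorithm. It assumes a linear relation, takes the target (average) clustering rate as an input because the abstract does not say how that average is formed, and uses invented names and a hypothetical `min_abs_r` cut-off.

```python
import math


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Least-squares fit ys ~ intercept + slope * xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def auto_threshold(thresholds, clustering_rates, target_rate, min_abs_r=0.9):
    """Invert the fitted line to find the split threshold expected to give target_rate."""
    r = pearson(thresholds, clustering_rates)
    if abs(r) < min_abs_r:
        raise ValueError(f"weak linear relation (r={r:.2f}); regression not trusted")
    intercept, slope = fit_line(thresholds, clustering_rates)
    return (target_rate - intercept) / slope
```

For trial thresholds `[100, 200, 300, 400]` giving clustering rates `[0.8, 0.6, 0.4, 0.2]`, a target rate of 0.5 maps back to a threshold of about 250.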

Analysis of Unaspirated sound for Korean (한국어의 경음에 대한 분석)

  • Lim Soo-Ho;Kim Joo-Gon;Kim Bum-Guk;Jung Ho-Youl;Chung Hyun-Yeol
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.41-44 / 2004
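  • In this paper, we examine the phonological and acoustic characteristics of tense consonants (경음), which occur only in Korean, perform speech recognition experiments based on them, and analyze the results. For the experiments, the input speech was labeled with 48 phoneme-like units (PLU; Phoneme Likely Unit), and phoneme and word recognition experiments were carried out for each phoneme group while increasing the LPC (Linear Predictive Coding) analysis order. In the phoneme recognition experiment, the tense consonant group had the lowest recognition rate, indicating that tense consonants need further analysis. With an LPC order of 23, the tense consonant and overall phoneme recognition rates were the best, at 34.11% and 46.1% respectively; in the word recognition experiment, the best rates, 81.68% and 81.87%, were obtained with LPC orders of 23 and 25. These results show that Korean tense consonants are closely related to the recognition performance of the overall system.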

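
The experiment above varies the LPC analysis order (for example 23 and 25). The sketch below computes LPC coefficients of a chosen order with the standard autocorrelation method and Levinson-Durbin recursion. It is a textbook implementation, not the paper's code: windowing and pre-emphasis are omitted, and the frame is assumed non-degenerate, so the prediction error never reaches zero before the final order.

```python
def autocorrelation(frame, max_lag):
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for LPC coefficients a[1..order] with x[n] ~ sum(a[k] * x[n - k]).

    Returns (coefficients, final prediction error power).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # remaining prediction error power
    return a[1:], err


def lpc(frame, order):
    return levinson_durbin(autocorrelation(frame, order), order)
```

Because the recursion builds order i from order i - 1, a single pass at order 25 also produces every lower-order solution along the way, which makes sweeping the order, as in these experiments, inexpensive.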

Speech Recognition Optimization Learning Model using HMM Feature Extraction In the Bhattacharyya Algorithm (바타차랴 알고리즘에서 HMM 특징 추출을 이용한 음성 인식 최적 학습 모델)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence / v.11 no.6 / pp.199-204 / 2013
  • A speech recognition system builds its learning models from input speech that may be inaccurate, and similar phoneme models reduce the recognition rate. Therefore, in this paper, we propose a method of configuring an optimal learning model for speech recognition using the Bhattacharyya algorithm. HMM feature extraction based on phoneme features was applied to the phonemes in the training data. Similar learning models were recognized as exact learning models using the Bhattacharyya algorithm, and the optimal learning model was configured with it. Recognition performance was then evaluated: applying the proposed system gave a speech recognition rate of 98.7%.
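
The Bhattacharyya measure mentioned above can also be applied to discrete distributions, for example if each phoneme model is summarized by a histogram of output symbols. This is a minimal sketch under that assumption; the paper's HMM feature extraction is not reproduced, and `confusable_pairs` and its `max_distance` threshold are hypothetical. The coefficient BC = Σ √(pᵢ qᵢ) is 1 for identical distributions, and the distance is −ln BC.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete distributions over the same symbols (1.0 = identical)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def bhattacharyya_distance(p, q):
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0.0 else -math.log(bc)


def confusable_pairs(models, max_distance):
    """List phoneme-model pairs whose output distributions are closer than max_distance."""
    names = sorted(models)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if bhattacharyya_distance(models[a], models[b]) < max_distance]


models = {"b": [0.6, 0.4, 0.0], "p": [0.5, 0.5, 0.0], "m": [0.0, 0.1, 0.9]}
print(confusable_pairs(models, 0.1))  # [('b', 'p')]
```

Pairs flagged this way are the similar models that, per the abstract, lower the recognition rate and are candidates for special treatment when the learning model is configured.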

Implementation of the Speech Interface for Information Retrieving System (정보검색 시스템의 음성 인터페이스 구현)

  • 김정철;배건성
    • Journal of the Korean Institute of Telematics and Electronics S / v.36S no.5 / pp.104-111 / 1999
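  • In this paper, we implemented a system, based on HMM isolated word recognition, that lets users conveniently retrieve information in a Windows environment. The recognizer uses phoneme-like models as its recognition units so that the recognition vocabulary can be extended, and its base models are phoneme models of the kind used in the SPHINX system, implemented as continuous-density HMMs. For user convenience, the information retrieval tool has simplified functions and outputs the retrieval procedure as speech.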

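
Isolated word recognition with phone-based continuous-density HMMs, as described above, amounts to building each word's HMM by concatenating its phone models, scoring the utterance under every word with the forward algorithm, and picking the best. This is a simplified sketch, not the SPHINX phone-model topology: it assumes left-to-right models with one 1-D Gaussian per state (real systems use mixtures over cepstral vectors), a single hypothetical self-loop probability, and a hypothetical toy lexicon.

```python
import math


def log_gauss(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_add(a, b):
    """log(exp(a) + exp(b)) without underflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def forward_log_likelihood(obs, states, self_loop=0.6):
    """Log P(obs | left-to-right HMM) with one Gaussian (mean, var) per state.

    Each state either stays (probability self_loop) or advances to the next one;
    paths start in the first state and must end in the last.
    """
    stay, move = math.log(self_loop), math.log(1.0 - self_loop)
    n = len(states)
    alpha = [-math.inf] * n
    alpha[0] = log_gauss(obs[0], *states[0])
    for x in obs[1:]:
        new = [-math.inf] * n
        for j in range(n):
            score = alpha[j] + stay
            if j > 0:
                score = log_add(score, alpha[j - 1] + move)
            new[j] = score + log_gauss(x, *states[j])
        alpha = new
    return alpha[-1]


def recognize(obs, lexicon, phone_models):
    """Pick the word whose concatenated phone-model HMM best explains obs."""
    def word_states(phones):
        return [state for ph in phones for state in phone_models[ph]]
    return max(lexicon, key=lambda w: forward_log_likelihood(obs, word_states(lexicon[w])))


phone_models = {"a": [(0.0, 1.0)], "b": [(5.0, 1.0)]}
lexicon = {"ab": ["a", "b"], "ba": ["b", "a"]}
print(recognize([0.1, -0.2, 0.0, 5.1, 4.9], lexicon, phone_models))  # ab
```

Because words are assembled from shared phone models, adding a word to `lexicon` needs only its phone sequence and no new training, which is the vocabulary-extensibility benefit the abstract attributes to phoneme-like units.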