• 제목/요약/키워드: Large vocabulary continuous speech recognition

검색결과 36건 처리시간 0.024초

대용량 연속 음성 인식 시스템에서의 코퍼스 선별 방법에 의한 언어모델 설계 (A Corpus Selection Based Approach to Language Modeling for Large Vocabulary Continuous Speech Recognition)

  • 오유리;윤재삼;김홍국
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2005년도 추계 학술대회 발표논문집
    • /
    • pp.103-106
    • /
    • 2005
  • In this paper, we propose a language modeling approach to improve the performance of a large vocabulary continuous speech recognition system. The proposed approach is based on the active learning framework that helps to select a text corpus from a plenty amount of text data required for language modeling. The perplexity is used as a measure for the corpus selection in the active learning. From the recognition experiments on the task of continuous Korean speech, the speech recognition system employing the language model by the proposed language modeling approach reduces the word error rate by about 6.6 % with less computational complexity than that using a language model constructed with randomly selected texts.

  • PDF

한국어 음성인식을 위한 효율적인 사전 구성에 관한 연구 (Study on Efficient Generation of Dictionary for Korean Vocabulary Recognition)

  • 이상복;최대림;김종교
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2002년도 11월 학술대회지
    • /
    • pp.41-44
    • /
    • 2002
  • This paper is related to the enhancement of speech recognition rate using enhanced pronunciation dictionary. Modern large vocabulary, continuous speech recognition systems have pronunciation dictionaries. A pronunciation dictionary provides pronunciation information for each word in the vocabulary in phonemic units, which are modeled in detail by the acoustic models. But in most speech recognition system based on Hidden Markov Model, actual pronunciation variations are disregarded. Without the pronunciation variations in the speech recognition system, the phonetic transcriptions in the dictionary do not match the actual occurrences in the database. In this paper, we proposed the unvoiced rule of semivowel in allophone rules to pronunciation dictionary. Experimental results on speech recognition system give higher performance than existing pronunciation dictionaries.

  • PDF

한국어 연속음성인식 시스템 구현을 위한 형태소 단위의 발음 변화 모델링 (Modeling Cross-morpheme Pronunciation Variations for Korean Large Vocabulary Continuous Speech Recognition)

  • 정민화;이경님
    • 대한음성학회지:말소리
    • /
    • 제49호
    • /
    • pp.107-121
    • /
    • 2004
  • In this paper, we describe a cross-morpheme pronunciation variation model which is especially useful for constructing morpheme-based pronunciation lexicon to improve the performance of a Korean LVCSR. There are a lot of pronunciation variations occurring at morpheme boundaries in continuous speech. Since phonemic context together with morphological category and morpheme boundary information affect Korean pronunciation variations, we have distinguished phonological rules that can be applied to phonemes in within-morpheme and cross-morpheme. The results of 33K-morpheme Korean CSR experiments show that an absolute reduction of 1.45% in WER from the baseline performance of 18.42% WER was achieved by modeling proposed pronunciation variations with a possible multiple context-dependent pronunciation lexicon.

  • PDF

한국어 음성인식 플랫폼(ECHOS)의 개선 및 평가 (Improvement and Evaluation of the Korean Large Vocabulary Continuous Speech Recognition Platform (ECHOS))

  • 권석봉;윤성락;장규철;김용래;김봉완;김회린;유창동;이용주;권오욱
    • 대한음성학회지:말소리
    • /
    • 제59호
    • /
    • pp.53-68
    • /
    • 2006
  • We report the evaluation results of the Korean speech recognition platform called ECHOS. The platform has an object-oriented and reusable architecture so that researchers can easily evaluate their own algorithms. The platform has all intrinsic modules to build a large vocabulary speech recognizer: Noise reduction, end-point detection, feature extraction, hidden Markov model (HMM)-based acoustic modeling, cross-word modeling, n-gram language modeling, n-best search, word graph generation, and Korean-specific language processing. The platform supports both lexical search trees and finite-state networks. It performs word-dependent n-best search with bigram in the forward search stage, and rescores the lattice with trigram in the backward stage. In an 8000-word continuous speech recognition task, the platform with a lexical tree increases 40% of word errors but decreases 50% of recognition time compared to the HTK platform with flat lexicon. ECHOS reduces 40% of recognition errors through incorporation of cross-word modeling. With the number of Gaussian mixtures increasing to 16, it yields word accuracy comparable to the previous lexical tree-based platform, Julius.

  • PDF

Review And Challenges In Speech Recognition (ICCAS 2005)

  • Ahmed, M.Masroor;Ahmed, Abdul Manan Bin
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 제어로봇시스템학회 2005년도 ICCAS
    • /
    • pp.1705-1709
    • /
    • 2005
  • This paper covers review and challenges in the area of speech recognition by taking into account different classes of recognition mode. The recognition mode can be either speaker independent or speaker dependant. Size of the vocabulary and the input mode are two crucial factors for a speech recognizer. The input mode refers to continuous or isolated speech recognition system and the vocabulary size can be small less than hundred words or large less than few thousands words. This varies according to system design and objectives.[2]. The organization of the paper is: first it covers various fundamental methods of speech recognition, then it takes into account various deficiencies in the existing systems and finally it discloses the various probable application areas.

  • PDF

비교사 분할 및 병합으로 구한 의사형태소 음성인식 단위의 성능 (Performance of Pseudomorpheme-Based Speech Recognition Units Obtained by Unsupervised Segmentation and Merging)

  • 방정욱;권오욱
    • 말소리와 음성과학
    • /
    • 제6권3호
    • /
    • pp.155-164
    • /
    • 2014
  • This paper proposes a new method to determine the recognition units for large vocabulary continuous speech recognition (LVCSR) in Korean by applying unsupervised segmentation and merging. In the proposed method, a text sentence is segmented into morphemes and position information is added to morphemes. Then submorpheme units are obtained by splitting the morpheme units through the maximization of posterior probability terms. The posterior probability terms are computed from the morpheme frequency distribution, the morpheme length distribution, and the morpheme frequency-of-frequency distribution. Finally, the recognition units are obtained by sequentially merging the submorpheme pair with the highest frequency. Computer experiments are conducted using a Korean LVCSR with a 100k word vocabulary and a trigram language model obtained by a 300 million eojeol (word phrase) corpus. The proposed method is shown to reduce the out-of-vocabulary rate to 1.8% and reduce the syllable error rate relatively by 14.0%.

한국어 방송 뉴스 인식 시스템을 위한 OOV update module (Korean broadcast news transcription system with out-of-vocabulary(OOV) update module)

  • 정의정;윤승
    • 한국음향학회:학술대회논문집
    • /
    • 한국음향학회 2002년도 하계학술발표대회 논문집 제21권 1호
    • /
    • pp.33-36
    • /
    • 2002
  • We implemented a robust Korean broadcast news transcription system for out-of-vocabulary (OOV), tested its performance. The occurrence of OOV words in the input speech is inevitable in large vocabulary continuous speech recognition (LVCSR). The known vocabulary will never be complete due to the existence of for instance neologisms, proper names, and compounds in some languages. The fixed vocabulary and language model of LVCSR system directly face with these OOV words. Therefore our Broadcast news recognition system has an offline OOV update module of language model and vocabulary to solve OOV problem and selects morpheme-based recognition unit (so called, pseudo-morpheme) for OOV robustness.

  • PDF

공동 이용을 위한 음성 인식 및 합성용 음성코퍼스의 발성 목록 설계 (Design of Linguistic Contents of Speech Copora for Speech Recognition and Synthesis for Common Use)

  • 김연화;김형주;김봉완;이용주
    • 대한음성학회지:말소리
    • /
    • 제43호
    • /
    • pp.89-99
    • /
    • 2002
  • Recently, researches into ways of improving large vocabulary continuous speech recognition and speech synthesis are being carried out intensively as the field of speech information technology is progressing rapidly. In the field of speech recognition, developments of stochastic methods such as HMM require large amount of speech data for training, and also in the field of speech synthesis, recent practices show that synthesis of better quality can be produced by selecting and connecting only the variable size of speech data from the large amount of speech data. In this paper we design and discuss linguistic contents for speech copora for speech recognition and synthesis to be shared in common.

  • PDF

Modified Phonetic Decision Tree For Continuous Speech Recognition

  • Kim, Sung-Ill;Kitazoe, Tetsuro;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • 제17권4E호
    • /
    • pp.11-16
    • /
    • 1998
  • For large vocabulary speech recognition using HMMs, context-dependent subword units have been often employed. However, when context-dependent phone models are used, they result in a system which has too may parameters to train. The problem of too many parameters and too little training data is absolutely crucial in the design of a statistical speech recognizer. Furthermore, when building large vocabulary speech recognition systems, unseen triphone problem is unavoidable. In this paper, we propose the modified phonetic decision tree algorithm for the automatic prediction of unseen triphones which has advantages solving these problems through following two experiments in Japanese contexts. The baseline experimental results show that the modified tree based clustering algorithm is effective for clustering and reducing the number of states without any degradation in performance. The task experimental results show that our proposed algorithm also has the advantage of providing a automatic prediction of unseen triphones.

  • PDF

대용량 한국어 연속음성인식 시스템 개발 (On the Development of a Large-Vocabulary Continuous Speech Recognition System for the Korean Language)

  • 최인정;권오욱;박종렬;박용규;김도영;정호영;은종관
    • 한국음향학회지
    • /
    • 제14권5호
    • /
    • pp.44-50
    • /
    • 1995
  • 본 논문에서는 연속분포 HMM을 이용한 대용량 한국어 연속음성인식 시스템에 관하여 기술한다. 인식 시스템의 성능을 개선하기 위하여 음성 모델링 단위의 선정, 단어간 모델링, 탐색 알고리듬, 문법에 관하여 연구하였다. 기본 인식단위로 트라이존을 사용하며 학습성을 개선하고 기능어에서의 에러 발생을 줄이기 위하여 일반화된 트라이폰과 function word-de-pendent phone을 사용한다. 단어 사이에는 묵음 모델과 null transition을 사용하여 선택적으로 묵음을 추가하였다. 언어모델로는 단어 클래스에 근거한 word pair 문법과 bigram 모델이 이용된다. 또한 지식 정보들을 효율적으로 활용할 수 있도록 N개의 후보 문장들을 탐색할 수 있는 알고리듬을 구현하였다. 후처리기에서는 word triple문법을 사용하여 N개의 최적 문장을 재정렬하여 최종적인 인식 문장을 결정하며, 마지막으로 후치사와 관련된 사소한 에러들을 수정한다. 3천단어의 연속음성 데이타베이스에 대한 인식실험에서, 후처리로 word triple 문법을 사용하여 $93.1\%$의 단어 인식률과 $73.8\%$의 문장 인식률을 얻었다.

  • PDF