• Title/Abstract/Keywords: Continuous Speech Recognition

224 search results (processing time: 0.032 s)

청음 음성학적 지식에 기반한 음가분류에 의한 핵심어 검출 시스템 구현 (The Design of Keyword Spotting System based on Auditory Phonetical Knowledge-Based Phonetic Value Classification)

  • 김학진;김순협
    • 정보처리학회논문지B / Vol. 10B, No. 2 / pp.169-178 / 2003
  • This study addresses two issues: the classification of phone-like units (PLUs), which are the foundation of Korean large-vocabulary speech recognition, and the effectiveness of Chiljongseong (7 final consonants) and Paljongseong (8 final consonants) in the Korean language. PLUs classify phonemes phonetically according to the place and manner of articulation, and about 50 PLUs are used in Korean speech recognition. In this study, auditory phonetic knowledge was applied to PLU classification to propose a set of 45 PLUs. The vowels 'ㅔ, ㅐ' were classified as the PLU [ee]; 'ㅒ, ㅖ' as [ye]; and 'ㅚ, ㅙ, ㅞ' as [we]. Second, the Chiljongseong system of the draft for a unified spelling system, which is currently in use, and the Paljongseonggajokyong of the Korean script Haerye were examined. Whether 'ㄷ' and 'ㅅ' have the same phonetic value as final consonants in Korean has long been debated in academia. In this study, the historical transitions of Korean consonants were investigated, Chiljongseong and Paljongseonggajokyong were applied to speech recognition, and their effectiveness was verified. The experiments were divided into isolated word recognition and continuous speech recognition. PBW452 was used for the isolated word recognition test, in which about 50 men and women, divided into 5 groups, uttered 50 words each. For the continuous speech recognition experiment, intended for use in an implemented stock exchange system, a sentence corpus of 71 stock exchange sentences and a speech corpus of those sentences were collected, with 5 men and women each uttering every sentence twice. In the experiments, using Paljongseonggajokyong for the final consonants improved recognition performance by about 1.45% on average, and applying PLUs that combine Paljongseonggajokyong with auditory phonetic knowledge increased the recognition rate by 1.5% to 2.02% on average. In the continuous speech recognition experiment, recognition performance improved by about 1% to 2% on average compared with the existing sets of 49 or 56 PLUs.

인지 모델을 이용한 제한된 한국어 연속음 인식 (Recognition of Restricted Continuous Korean Speech Using Perceptual Model)

  • 김선일;홍기원;이행세
    • 한국음향학회지 / Vol. 14, No. 3 / pp.61-70 / 1995
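  • In this paper, we use PLP cepstra, which are close to human perceptual characteristics, and extract features over a wide time span so that the temporal characteristics of speech are well reflected. Phonemes are recognized with an artificial neural network, which resembles the way humans learn, and the phoneme sequence is then recognized from the recognized phonemes with a Markov model, which captures sequential characteristics well. For phoneme recognition, speech blocks sampled with a non-uniform number of frames from the phonemes appearing in continuous speech are used to compute 7th-order PLP cepstra, PTP, zero-crossing rate, and energy, which serve as the input to an MLP neural network. Phoneme recognition was carried out on 10 kinds of Korean sentences, each spoken 5 times by two speakers (100 utterances in total), yielding a per-phoneme recognition rate of up to 9.4%. For sentence recognition, the two speakers who took part in training each spoke every sentence 10 more times, giving 200 new utterances. These were passed through phoneme recognition and then recognized with the Markov model built in the first experiment, yielding a sentence recognition rate of 92.5%.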
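
Two of the frame-level measurements named in this abstract can be illustrated with a minimal Python sketch. It covers only the zero-crossing rate and the energy; the PLP cepstrum and PTP inputs are omitted. The function names and the normalization by frame length are assumptions made for illustration, not details taken from the paper.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    `frame` is a sequence of audio samples (hypothetical input format).
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the samples in one frame."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)
```

A frame that alternates in sign at every sample has a zero-crossing rate of 1.0, while a frame that never changes sign has a rate of 0.0.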

언어 모델 네트워크에 기반한 대어휘 연속 음성 인식 (Large Vocabulary Continuous Speech Recognition Based on Language Model Network)

  • 안동훈;정민화
    • 한국음향학회지 / Vol. 21, No. 6 / pp.543-551 / 2002
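  • This paper proposes a search method that can perform real-time continuous speech recognition on a large vocabulary of about 20,000 words. The basic search is a single pass using a token-passing Viterbi decoding algorithm. A language model network is introduced so that various language models can be organized into a consistent search space, and the search space is dynamically reconstructed from the tokens that survive the pruning stage. To ease post-processing, the Viterbi algorithm is modified to output a word graph and the N-best sentences. The resulting decoder was tested on a 20,000-word database and evaluated in terms of recognition rate and RTF.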
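
The idea of one-pass token-passing Viterbi decoding with pruning can be sketched on a toy discrete HMM. This is a simplified illustration under stated assumptions, not the decoder described in the paper: there is no language model network, word graph, or N-best output, and the function name, the dictionary-based model layout, and the `beam` parameter are all hypothetical.

```python
def viterbi_token_passing(obs, log_start, log_trans, log_emit, beam=10.0):
    """One-pass Viterbi search that keeps a single best token per state.

    obs       : list of observation symbols
    log_start : {state: log P(start in state)}
    log_trans : {state: {next_state: log P(next_state | state)}}
    log_emit  : {state: {symbol: log P(symbol | state)}}
    beam      : tokens scoring more than `beam` below the best are pruned
    Returns (log score, state path) of the best surviving token.
    """
    # Each token is (accumulated log score, path of states so far).
    tokens = {
        s: (log_start[s] + log_emit[s][obs[0]], [s]) for s in log_start
    }
    for symbol in obs[1:]:
        # Pruning: keep only tokens within `beam` of the current best.
        best = max(score for score, _ in tokens.values())
        survivors = {
            s: tok for s, tok in tokens.items() if tok[0] >= best - beam
        }
        # Propagate surviving tokens along the outgoing transitions.
        new_tokens = {}
        for s, (score, path) in survivors.items():
            for nxt, log_p in log_trans.get(s, {}).items():
                cand = score + log_p + log_emit[nxt][symbol]
                if nxt not in new_tokens or cand > new_tokens[nxt][0]:
                    new_tokens[nxt] = (cand, path + [nxt])
        if not new_tokens:
            raise ValueError("all tokens were pruned or had no successors")
        tokens = new_tokens
    return max(tokens.values(), key=lambda tok: tok[0])
```

Because only states reachable from surviving tokens are expanded, the active search space shrinks or grows with the pruning result, which is the general motivation for rebuilding the search space from surviving tokens.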

비교사 분할 및 병합으로 구한 의사형태소 음성인식 단위의 성능 (Performance of Pseudomorpheme-Based Speech Recognition Units Obtained by Unsupervised Segmentation and Merging)

  • 방정욱;권오욱
    • 말소리와 음성과학 / Vol. 6, No. 3 / pp.155-164 / 2014
  • This paper proposes a new method to determine the recognition units for large vocabulary continuous speech recognition (LVCSR) in Korean by applying unsupervised segmentation and merging. In the proposed method, a text sentence is segmented into morphemes and position information is added to the morphemes. Submorpheme units are then obtained by splitting the morpheme units through the maximization of posterior probability terms. The posterior probability terms are computed from the morpheme frequency distribution, the morpheme length distribution, and the morpheme frequency-of-frequency distribution. Finally, the recognition units are obtained by sequentially merging the submorpheme pair with the highest frequency. Computer experiments are conducted using a Korean LVCSR system with a 100k-word vocabulary and a trigram language model obtained from a corpus of 300 million eojeols (word phrases). The proposed method is shown to reduce the out-of-vocabulary rate to 1.8% and to reduce the syllable error rate by 14.0% in relative terms.
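
The final merging step, which repeatedly joins the most frequent adjacent pair of units, can be sketched in Python as follows. This is a minimal illustration assuming units are plain strings and the number of merges is fixed in advance; the paper's position information, splitting step, and stopping criterion are not modeled, and the function name is hypothetical.

```python
from collections import Counter


def merge_most_frequent_pairs(sentences, num_merges):
    """Greedily merge the most frequent adjacent unit pair, num_merges times.

    sentences : list of unit sequences, e.g. [["a", "b", "c"], ...]
    Returns (merged sentences, list of merged pairs in merge order).
    """
    sentences = [list(units) for units in sentences]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair over the whole corpus.
        pair_counts = Counter()
        for units in sentences:
            pair_counts.update(zip(units, units[1:]))
        if not pair_counts:
            break  # nothing left to merge
        (left, right), _count = pair_counts.most_common(1)[0]
        merges.append((left, right))
        # Rewrite each sentence, replacing the chosen pair by one unit.
        for idx, units in enumerate(sentences):
            merged, j = [], 0
            while j < len(units):
                if j + 1 < len(units) and (units[j], units[j + 1]) == (left, right):
                    merged.append(left + right)
                    j += 2
                else:
                    merged.append(units[j])
                    j += 1
            sentences[idx] = merged
    return sentences, merges
```

Each merge lengthens the units and shortens the sequences, so the number of merges trades vocabulary size against unit length.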

Implementation of Speech Recognition System Using JAVA Applet

  • Park, Seungho;Park, Kwangkook;Kim, Kyungnam;Kim, Jingyoung;Kim, Kijung
    • 대한전자공학회:학술대회논문집 / 대한전자공학회, 2000 ITC-CSCC -1 / pp.257-259 / 2000
  • In this paper, word-unit recognition is performed to implement a speech recognition system over the web, using a Java applet and a continuous-distribution HMM. The system is designed on a client/server model. The client computer processes speech with the applet and then transmits feature parameters to the server computer through the Internet. The speech recognition system on the server computer transmits the result obtained with the forward algorithm back to the client computer, where it is displayed as text.

연속음성중 키워드(Keyword) 인식을 위한 Binary Clustering Network (Binary clustering network for recognition of keywords in continuous speech)

  • 최관선;한민홍
    • 제어로봇시스템학회:학술대회논문집 / 제어로봇시스템학회, Proceedings of the 1993 Korean Automatic Control Conference (Domestic Session); Seoul National University, Seoul; 20-22 Oct. 1993 / pp.870-876 / 1993
  • This paper presents a binary clustering network (BCN) and a heuristic pitch detection algorithm for recognizing keywords in continuous speech. To classify nonlinear patterns, BCN separates patterns into binary clusters hierarchically and links identical patterns at the root level using both supervised and unsupervised learning. BCN has many desirable properties, such as a flexible dynamic structure, high classification accuracy, short learning time, and short recall time. The pitch detection algorithm is a heuristic model that can resolve difficulties such as scaling invariance, time warping, time-shift invariance, and redundancy. The recognition algorithm achieved recognition rates as high as 95% in both speaker-dependent and multispeaker-dependent tests.

음성 인식을 이용한 증권 정보 검색 시스템의 개발 (Development of a Stock Information Retrieval System using Speech Recognition)

  • 박성준;구명완;전주식
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 6, No. 4 / pp.403-410 / 2000
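  • This paper describes the development of a stock information retrieval system using speech recognition and explains its main features. The system is based on a DHMM (discrete hidden Markov model) and uses phoneme-like units as the basic recognition unit. Endpoint detection and echo cancellation are included to make speech input convenient for users, and a continuous speech recognizer is implemented so that a single speech input may contain several words rather than only one. Operational results are analyzed using data collected over several months after commercial deployment.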

Korean LVCSR for Broadcast News Speech

  • Lee, Gang-Seong
    • The Journal of the Acoustical Society of Korea / Vol. 20, No. 2E / pp.3-8 / 2001
  • In this paper, we examine a Korean large vocabulary continuous speech recognition (LVCSR) system for broadcast news speech. A combined vowel-and-implosive unit is included in the phone set together with other short phone units in order to obtain a longer-unit acoustic model. The effect of this unit is compared with that of conventional phone units. The dictionary units for language processing are automatically extracted from the eojeols appearing in transcriptions. Triphone models are used for acoustic modeling and a trigram model is used for language modeling. Of the three major speaker groups in news broadcasts (anchors, journalists, and people other than anchors or journalists who are being interviewed), the speech of anchors and journalists, which has a lot of noise, was used for testing and recognition.

Subspace distribution clustering hidden Markov model을 위한 codebook design (Codebook design for subspace distribution clustering hidden Markov model)

  • 조영규;육동석
    • 대한음성학회:학술대회논문집 / 대한음성학회, Proceedings of the 2005 Spring Conference / pp.87-90 / 2005
  • Today's state-of-the-art speech recognition systems typically use continuous-distribution hidden Markov models with mixtures of Gaussian distributions. To obtain higher recognition accuracy, these models typically require a huge number of Gaussian distributions. Such systems require too much memory to run and are too slow for large applications. Many approaches have been proposed for designing compact acoustic models. One of them is the subspace distribution clustering hidden Markov model, which represents the original full-space distributions as combinations of a small number of subspace distribution codebooks. How the codebook is built is therefore an important issue in this approach. In this paper, we report experimental results on various quantization methods for building more accurate models.

연속 음성에서의 신경회로망을 이용한 화자 적응 (Speaker Adaptation Using Neural Network in Continuous Speech Recognition)

  • 김선일
    • 한국음향학회지 / Vol. 19, No. 1 / pp.11-15 / 2000
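  • Speaker-adaptive continuous speech recognition was performed using the RM speech corpus. HMMs for the reference speaker were trained on the training data of the RM corpus, and speaker-adaptive recognition was evaluated on the evaluation data. Part of the training data was used for speaker adaptation. DTW was used to time-align the target speaker's data with the reference speaker's data, and an error back-propagation neural network was used to transform the target speaker's spectrum so that it takes on the spectral characteristics of the reference speaker. To achieve the best speaker adaptation, experiments were run while varying several elements of the neural network, and the results are presented. Adapting to the reference speaker with a trained neural network that has suitable weights increased the word recognition rate by up to 2.1 times and the word accuracy by up to 4.7 times.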
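
The DTW alignment step mentioned in this abstract can be sketched as follows. This is a textbook dynamic time warping routine with a Euclidean frame distance and unconstrained steps, written under the assumption that each utterance is a list of equal-length feature vectors; the paper's actual features, distance measure, and path constraints are not stated here, and the function name is hypothetical.

```python
import math


def dtw_align(ref, tgt):
    """Align two feature sequences with dynamic time warping.

    ref, tgt : lists of equal-length feature vectors (lists of floats)
    Returns (total alignment cost, list of (ref_index, tgt_index) pairs).
    """
    n, m = len(ref), len(tgt)
    inf = float("inf")
    # cost[i][j] = best cost of aligning ref[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], tgt[j - 1])  # Euclidean distance
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # match both frames
                cost[i - 1][j],      # advance ref only
                cost[i][j - 1],      # advance tgt only
            )
    # Backtrack from the end to recover the frame-to-frame alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

The aligned index pairs could then supply input/target frame pairs for training a spectral mapping network, which is the role DTW plays in the abstract.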