• Title/Summary/Keyword: Continuous Speech Recognition (연속음성인식)


Implementation of a Speech Recognition System for a Car Navigation System (차량 항법용 음성인식 시스템의 구현)

  • Lee, Tae-Han;Yang, Tae-Young;Park, Sang-Taick;Lee, Chung-Yong;Youn, Dae-Hee;Cha, Il-Hwan
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.36S no.9
    • /
    • pp.103-112
    • /
    • 1999
  • In this paper, a speaker-independent isolated word recognition system for a car navigation system is implemented using a general-purpose digital signal processor. The paper presents a noise-processing method that combines SNR normalization with RAS. A semi-continuous hidden Markov model is adopted, and a TMS320C31 is used to implement the real-time system. The recognition vocabulary consists of 69 command words for the car navigation system. Experimental results showed that recognition performance reaches a maximum of 93.62% when SNR normalization is combined with spectral subtraction, and the performance improvement rate of the system is 3.69%. The presented noise-processing method showed good recognition performance at 5 dB SNR in a car environment. (See the code sketch below.)

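The entry above combines SNR normalization with spectral subtraction for noise-robust recognition in the car. The SNR-normalization/RAS procedure is not spelled out in the abstract, so the following is only a minimal sketch of the classic magnitude spectral-subtraction step, assuming the first few frames are noise-only; the frame size, overlap, and spectral floor are illustrative choices, not the paper's settings.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, hop=128, noise_frames=6, floor=0.01):
    """Minimal magnitude spectral subtraction (illustrative, not the paper's exact method).

    Assumes the first `noise_frames` frames contain noise only and uses their
    average magnitude spectrum as the noise estimate.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)

    noise_mag = mag[:noise_frames].mean(axis=0)           # noise spectrum estimate
    clean_mag = np.maximum(mag - noise_mag, floor * mag)   # subtract, keep a spectral floor

    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(n_frames * hop + frame_len)             # overlap-add resynthesis
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += clean[i]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 8000.0
    noisy = np.sin(2 * np.pi * 440 * t) + 0.3 * rng.standard_normal(t.size)
    enhanced = spectral_subtraction(noisy)
    print(enhanced.shape)
```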

A Speech Translation System for Hotel Reservation (호텔예약을 위한 음성번역시스템)

  • 구명완;김재인;박상규;김우성;장두성;홍영국;장경애;김응인;강용범
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.4
    • /
    • pp.24-31
    • /
    • 1996
  • In this paper, we present a speech translation system for hotel reservation, KT-STS (Korea Telecom Speech Translation System). KT-STS is a speech-to-speech translation system which translates a spoken utterance in Korean into one in Japanese. The system has been designed around the task of hotel reservation (dialogues between a Korean customer and a hotel reservation desk in Japan). It consists of a Korean speech recognition system, a Korean-to-Japanese machine translation system, and a Korean speech synthesis system. The Korean speech recognition system is an HMM (hidden Markov model)-based, speaker-independent, continuous speech recognizer with a vocabulary of about 300 words. A bigram language model is used as the forward language model, and a dependency grammar is used as the backward language model. For machine translation, we use a dependency grammar and a direct transfer method. The Korean speech synthesizer uses demiphones as the synthesis unit and a method of periodic waveform analysis and reallocation. KT-STS runs in nearly real time on a SPARC20 workstation with one TMS320C30 DSP board. We achieved a word recognition rate of 94.68% and a sentence recognition rate of 82.42% in the speech recognition tests. In the Korean-to-Japanese translation tests, we achieved a translation success rate of 100%. We also carried out an international joint experiment in which our system was connected over a leased line with another system developed by KDD in Japan. (See the code sketch below.)

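The recognizer above uses a bigram as its forward language model. The sketch below shows the standard way bigram probabilities are estimated from a tokenized corpus; the toy hotel-reservation sentences and the add-one smoothing are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

def train_bigram(sentences):
    """Estimate add-one-smoothed bigram probabilities P(w2 | w1) from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {w for sent in sentences for w in sent} | {"</s>"}

    def prob(w1, w2):
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + len(vocab))
    return prob

# Toy hotel-reservation-style corpus (illustrative only).
corpus = [["i", "want", "a", "single", "room"],
          ["i", "want", "a", "double", "room"],
          ["book", "a", "single", "room"]]
p = train_bigram(corpus)
print(p("a", "single"), p("a", "double"))
```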

A Study on Detection of Accentual Phrase's Boundaries according to Reading Speeds (낭독속도에 따른 강세구 경계 검출에 관한 연구)

  • Ju Jangkyu;Lee Kiyoung;Song Minsuck
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.91-94
    • /
    • 2000
  • Recently, a great deal of linguistic research on prosodic structure, sentence structure, and phonological rules has demonstrated the usefulness of prosodic information for semantic information, syntactic structure, and discourse structure at the level of language understanding, but these results have rarely been applied to current speech recognition systems. In this study, we propose a hierarchical segmentation method that detects prosodic phrases from Korean continuous speech. First, pauses are used to detect sentence-level boundaries in the input speech, and accentual phrase boundaries are then detected with reference to energy, pause duration, and the pitch contour. The text of the experimental speech is "Manmulsang", and comparisons were made on speech data read at fast and normal speeds by two male and two female standard-Korean speakers. (See the code sketch below.)

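The study above first locates sentence-level boundaries from pauses and then refines them into accentual-phrase boundaries using energy, pause duration, and pitch contours. Below is a minimal pause-detection sketch covering only the first step; the frame length, energy threshold, and minimum pause duration are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def detect_pauses(signal, sr=16000, frame_ms=20, energy_thresh=1e-3, min_pause_ms=200):
    """Return (start_sec, end_sec) spans whose frame energy stays below a threshold
    for at least `min_pause_ms`; such long pauses can serve as boundary candidates."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    energy = np.array([np.mean(signal[i*frame_len:(i+1)*frame_len] ** 2)
                       for i in range(n_frames)])
    silent = energy < energy_thresh

    pauses, start = [], None
    for i, s in enumerate(silent):
        if s and start is None:
            start = i
        elif not s and start is not None:
            if (i - start) * frame_ms >= min_pause_ms:
                pauses.append((start * frame_len / sr, i * frame_len / sr))
            start = None
    if start is not None and (n_frames - start) * frame_ms >= min_pause_ms:
        pauses.append((start * frame_len / sr, n_frames * frame_len / sr))
    return pauses

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    speech = np.sin(2 * np.pi * 200 * t)
    signal = np.concatenate([speech, np.zeros(sr // 2), speech])  # 0.5 s silent gap
    print(detect_pauses(signal, sr))
```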

On Detecting the Transition Regions of Phonemes by Using the Asymmetrical Rate of Speech Waveforms (음성파형의 비대칭율을 이용한 음소의 전이구간 검출)

  • Bae, Myung-Jin;Lee, Eul-jae;Ann, Sou-Guil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.9 no.4
    • /
    • pp.55-65
    • /
    • 1990
  • To recognize continuous speech, it is necessary to segment the connected acoustic signal into phonetic units. In this paper, we propose a new asymmetrical rate as a parameter for detecting transition regions in continuous speech. The proposed rate represents the rate of change of the magnitude of the speech signal. By comparing this rate with those of adjacent frames, each frame can be classified as either steady-state or transient.


On Detecting the Steady State Segments of Speech Waveform by Using the Normalized AMDF (규준화된 AMDF를 이용한 음성파형의 안정상태 구간검출)

  • Bae, Myung-Jin;Kim, Ul-Je;Ahn, Sou-Guil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.10 no.3
    • /
    • pp.44-50
    • /
    • 1991
  • To recognize continuous speech, it is necessary to segment the connected acoustic signal into phonetic units. In this paper, we propose a new normalized AMDF as a parameter for detecting transition regions in continuous speech. The suggested parameter represents the rate of change of the magnitude of the speech signal. By comparing this value with those of adjacent frames, each frame can be classified, by level, as either steady-state or transient. (See the code sketch below.)

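The two entries above classify frames as steady-state or transient by comparing a frame-level parameter across adjacent frames; this one bases the parameter on the average magnitude difference function (AMDF). The paper's exact normalization is not given in the abstract, so the sketch below uses the standard AMDF, D(k) = (1/N) * sum_n |x(n) - x(n+k)|, divided by the frame's mean absolute amplitude as an assumed normalization, and an illustrative change threshold.

```python
import numpy as np

def normalized_amdf(frame, max_lag):
    """Standard AMDF, D(k) = (1/N) * sum |x(n) - x(n+k)|, divided by the frame's
    mean absolute amplitude (this normalization is an assumption, not the paper's)."""
    n = len(frame)
    amdf = np.array([np.mean(np.abs(frame[:n - k] - frame[k:]))
                     for k in range(1, max_lag + 1)])
    return amdf / (np.mean(np.abs(frame)) + 1e-12)

def frame_states(signal, frame_len=320, max_lag=160, change_thresh=0.2):
    """Label each frame 'steady' or 'transient' by how much its normalized AMDF
    curve differs from the previous frame's (threshold is illustrative)."""
    n_frames = len(signal) // frame_len
    curves = [normalized_amdf(signal[i*frame_len:(i+1)*frame_len], max_lag)
              for i in range(n_frames)]
    states = ["steady"]
    for prev, cur in zip(curves, curves[1:]):
        change = np.mean(np.abs(cur - prev)) / (np.mean(np.abs(prev)) + 1e-12)
        states.append("transient" if change > change_thresh else "steady")
    return states

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.concatenate([np.sin(2*np.pi*150*t[:sr//2]), np.sin(2*np.pi*300*t[:sr//2])])
    print(frame_states(tone))   # the frequency jump at 0.5 s should appear as 'transient'
```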

Language Models constructed by Iterative Learning and Variation of the Acoustical Parameters (음향학적 파라미터의 변화 및 반복학습으로 작성한 언어모델에 대한 고찰)

  • Oh Se-Jin;Hwang Cheol-Jun;Kim Bum-Koog;Jung Ho-Youl;Chung Hyun-Yeol
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.35-38
    • /
    • 2000
  • As basic research toward improving the performance of a continuous speech recognition system, this study builds acoustic and language models suited to the system and confirms their effectiveness through recognition experiments on a flight reservation task. To this end, acoustic models were first built through a careful analysis of the number of mixtures in the HMM output probability distributions and the dimensionality of the feature parameters. In addition, an N-gram language model was trained on the specific task by an iterative learning method to obtain a model suited to the recognition system. In the recognition experiments, a multi-pass search algorithm was applied to 200 sentences uttered by three speakers, using acoustic models with varying parameter dimensions and mixture counts and language models built by iterative learning. As a result, an average recognition rate of 81.0% was obtained with 25-dimensional acoustic models with 9 mixtures and a language model trained with 10 iterations, while 38-dimensional acoustic models with 9 mixtures and a language model trained with 10 iterations gave an average recognition rate of 90.2%, showing that the latter combination is highly effective for improving recognition accuracy. (See the code sketch below.)

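The experiment above varies the feature dimension (25 vs. 38) and the number of mixtures per HMM output distribution (9 in the best configuration). For reference, the sketch below computes a diagonal-covariance Gaussian-mixture log-likelihood, which is the quantity whose cost grows with both the dimension and the mixture count; the random parameters are purely illustrative and do not come from the paper.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM:
    log sum_m w_m * N(x; mu_m, diag(var_m)). Cost scales with mixtures x dimension."""
    d = x.shape[0]
    log_comp = (np.log(weights)
                - 0.5 * (d * np.log(2 * np.pi)
                         + np.sum(np.log(variances), axis=1)
                         + np.sum((x - means) ** 2 / variances, axis=1)))
    m = np.max(log_comp)                      # log-sum-exp for numerical stability
    return m + np.log(np.sum(np.exp(log_comp - m)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for dim in (25, 38):                      # feature dimensions compared in the entry above
        mixtures = 9
        w = np.full(mixtures, 1.0 / mixtures)
        mu = rng.standard_normal((mixtures, dim))
        var = np.ones((mixtures, dim))
        x = rng.standard_normal(dim)
        print(dim, gmm_log_likelihood(x, w, mu, var))
```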

Gaussian Selection in HMM Speech Recognizer with PTM Model for Efficient Decoding (PTM 모델을 사용한 HMM 음성인식기에서 효율적인 디코딩을 위한 가우시안 선택기법)

  • 손종목;정성윤;배건성
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.1
    • /
    • pp.75-81
    • /
    • 2004
  • Gaussian selection (GS) is a popular approach for fast decoding in continuous-density hidden Markov models. It enables fast likelihood computation by reducing the number of Gaussian components that must be calculated. In this paper, we propose a new GS method for phonetic tied-mixture (PTM) hidden Markov models. The PTM model represents the states at the same topological location with a shared set of Gaussian mixture components and context-dependent weights. The proposed method therefore imposes constraints on the weights as well as on the number of Gaussian components to reduce the computational load. Experimental results show that the proposed method reduces the percentage of Gaussian computation to 16.41%, compared with 20-30% for conventional GS methods, with little degradation in recognition performance. (See the code sketch below.)
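
Gaussian selection speeds up likelihood computation by fully evaluating only the components whose cluster lies close to the observation and flooring the rest. The sketch below is a generic VQ-style GS pass, not the paper's PTM-specific method; the k-means clustering, shortlist size, and floor value are illustrative assumptions.

```python
import numpy as np

def build_clusters(means, n_clusters=4, iters=10, seed=0):
    """Cluster Gaussian mean vectors with a few k-means iterations (illustrative)."""
    rng = np.random.default_rng(seed)
    centroids = means[rng.choice(len(means), n_clusters, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((means[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centroids[c] = means[assign == c].mean(axis=0)
    assign = np.argmin(((means[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    return centroids, assign

def selected_log_likelihoods(x, means, variances, centroids, assign, shortlist=2, floor=-50.0):
    """Evaluate full diagonal-Gaussian log-likelihoods only for components whose
    cluster is among the `shortlist` nearest to x; all others get a floor value."""
    d = len(x)
    near = np.argsort(((centroids - x) ** 2).sum(-1))[:shortlist]
    logp = np.full(len(means), floor)
    idx = np.isin(assign, near)
    logp[idx] = -0.5 * (d * np.log(2 * np.pi)
                        + np.sum(np.log(variances[idx]), axis=1)
                        + np.sum((x - means[idx]) ** 2 / variances[idx], axis=1))
    return logp

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    means = rng.standard_normal((32, 13))      # 32 components, 13-dim features (illustrative)
    variances = np.ones((32, 13))
    centroids, assign = build_clusters(means)
    x = rng.standard_normal(13)
    logp = selected_log_likelihoods(x, means, variances, centroids, assign)
    print(int(np.sum(logp > -50.0)), "of", len(means), "components fully evaluated")
```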

Lip-Synch System Optimization Using Class Dependent SCHMM (클래스 종속 반연속 HMM을 이용한 립싱크 시스템 최적화)

  • Lee, Sung-Hee;Park, Jun-Ho;Ko, Han-Seok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.7
    • /
    • pp.312-318
    • /
    • 2006
  • The conventional lip-synch system has a two-step process: speech segmentation and recognition. However, the difficulty of the speech segmentation procedure and the inaccuracy of the training data caused by that segmentation lead to significant performance degradation in the system. To cope with this, a connected-vowel recognition method using a Head-Body-Tail (HBT) model is proposed. The HBT model, which is appropriate for relatively small-vocabulary tasks, reflects the co-articulation effect efficiently. Moreover, the 7 vowels are merged into 3 classes with similar lip shapes, and the system is optimized by employing a class-dependent SCHMM structure. Additionally, at both ends of each word, where variation is large, an 8-component Gaussian mixture model is used directly to improve the representation ability. Although the proposed method shows performance similar to that of the CHMM based on the HBT structure, the number of parameters is reduced by 33.92%. This reduction makes it a computationally efficient method capable of real-time operation.

A Study on Pitch Extraction Method using FIR-STREAK Digital Filter (FIR-STREAK 디지털 필터를 사용한 피치추출 방법에 관한 연구)

  • Lee, Si-U
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.1
    • /
    • pp.247-252
    • /
    • 1999
  • Pitch information is a useful parameter for realizing speech coding at low bit rates. When average pitch information is extracted from continuous speech, pitch errors appear in frames where a consonant and a vowel coexist, at the boundaries between adjoining frames, and at the beginning or end of a sentence. In this paper, I propose an individual pitch (IP) extraction method that uses the residual signals of the FIR-STREAK digital filter in order to suppress these pitch extraction errors. The method does not average pitch intervals, so it can accommodate the changes in each individual pitch interval. As a result, with the IP extraction method using the FIR-STREAK digital filter, no pitch errors were found in frames where a consonant and a vowel coexist, at the boundaries between adjoining frames, or at the beginning or end of a sentence. The method can be applied to many fields, such as speech coding, speech analysis, speech synthesis, and speech recognition. (See the code sketch below.)

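The method above extracts individual (period-by-period) pitch values from the residual signal of an FIR-STREAK filter rather than averaging pitch over a frame. The FIR-STREAK filter itself is not reproduced here; the sketch below only illustrates period-by-period pitch estimation by peak picking on a given residual-like signal, with the peak threshold and the allowed period range as illustrative assumptions.

```python
import numpy as np

def individual_pitch_periods(residual, sr=8000, f_min=60.0, f_max=400.0, rel_thresh=0.5):
    """Pick prominent residual peaks at least one minimum pitch period apart and
    return the successive peak-to-peak intervals (seconds) as individual pitch periods."""
    min_gap = int(sr / f_max)                  # shortest allowed period in samples
    max_gap = int(sr / f_min)                  # longest allowed period in samples
    thresh = rel_thresh * np.max(np.abs(residual))
    peaks, last = [], -max_gap
    for n in range(1, len(residual) - 1):
        if (residual[n] > thresh and residual[n] >= residual[n - 1]
                and residual[n] >= residual[n + 1] and n - last >= min_gap):
            peaks.append(n)
            last = n
    gaps = np.diff(peaks)
    periods = gaps[(gaps >= min_gap) & (gaps <= max_gap)] / sr
    return periods

if __name__ == "__main__":
    sr = 8000
    n = np.arange(int(0.2 * sr))
    residual = np.zeros_like(n, dtype=float)
    residual[::80] = 1.0                        # synthetic 100 Hz excitation train
    print(individual_pitch_periods(residual, sr)[:5], "seconds per period")
```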

Emotion recognition in speech using hidden Markov model (은닉 마르코프 모델을 이용한 음성에서의 감정인식)

  • 김성일;정현열
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.3 no.3
    • /
    • pp.21-26
    • /
    • 2002
  • This paper presents a new approach to identifying human emotional states such as anger, happiness, neutrality, sadness, or surprise. This is accomplished using discrete-duration continuous hidden Markov models (DDCHMM). To this end, emotional feature parameters are first extracted from the input speech signals. In this study, we used prosodic parameters such as pitch, energy, and their derivatives, which were then used to train HMMs for recognition. Speaker-adapted emotion models based on maximum a posteriori (MAP) estimation were also considered for speaker adaptation. The simulation results showed that the vocal emotion recognition rate gradually increased as the number of adaptation samples increased. (See the code sketch below.)

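The recognizer above trains HMMs on prosodic parameters: pitch, energy, and their derivatives. The sketch below extracts a comparable per-frame feature vector (autocorrelation-based pitch, log energy, and simple delta features); the frame size, pitch search range, voicing check, and delta computation are illustrative assumptions, and the HMM training and MAP adaptation steps are not reproduced.

```python
import numpy as np

def prosodic_features(signal, sr=16000, frame_len=400, hop=160, f_min=60.0, f_max=400.0):
    """Per-frame [pitch_hz, log_energy] plus first differences (deltas) -> (n_frames, 4)."""
    lag_min, lag_max = int(sr / f_max), int(sr / f_min)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = np.log(np.sum(frame ** 2) + 1e-10)
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]  # autocorrelation, lags >= 0
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
        pitch = sr / lag if ac[lag] > 0.3 * ac[0] else 0.0            # crude voicing check
        feats.append([pitch, energy])
    feats = np.array(feats)
    deltas = np.diff(feats, axis=0, prepend=feats[:1])                # simple first differences
    return np.hstack([feats, deltas])

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    speech_like = np.sin(2 * np.pi * 180 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
    print(prosodic_features(speech_like, sr).shape)   # (n_frames, 4), ready for HMM training
```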