• Title/Summary/Keyword: Speech Recognition Technology

A Study on the Characteristics of Segmental-Feature HMM (분절특징 HMM의 특성에 관한 연구)

  • Yun Young-Sun;Jung Ho-Young
    • MALSORI / no.43 / pp.163-178 / 2002
  • In this paper, we discuss the characteristics of the Segmental-Feature HMM (SFHMM) and summarize previous studies of SFHMM. Previous studies proposed several approaches to reducing the number of parameters. However, when the number of parameters decreased, system performance also fell. We therefore consider a fast-computation approach that keeps the number of parameters unchanged. We present a new segment comparison method that speeds up SFHMM computation without loss of performance. The proposed method uses three frames of the given segment in the calculation rather than all five. The experimental results show that the proposed system performs better than the systems of the previous studies.

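The sketch below is a rough illustration of where the saving comes from. It scores a five-frame segment with a diagonal Gaussian at each frame position, once over all five positions and once over three. It is not the SFHMM scoring used in the paper. The abstract does not say which three frames are used or how segments are compared. The choice of frames 0, 2 and 4, the per-position Gaussian model, and the name `segment_log_score` are therefore all assumptions.

```python
import math

FULL_POSITIONS = (0, 1, 2, 3, 4)     # all five frames of the segment
REDUCED_POSITIONS = (0, 2, 4)        # assumption: first, middle and last frame


def segment_log_score(segment, means, variances, positions):
    """Sum of diagonal-Gaussian log densities over the selected frame positions.

    segment, means, variances: five lists (one per frame position) of equal-length
    feature vectors. This per-position model is a hypothetical stand-in for an SFHMM state.
    """
    score = 0.0
    for p in positions:
        for x, m, v in zip(segment[p], means[p], variances[p]):
            score += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return score
```

With three of the five positions, the number of Gaussian terms per segment falls to 3/5. The stored means and variances (the model parameters) are unchanged.
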

Phonetic Tied-Mixture Syllable Model for Efficient Decoding in Korean ASR (효율적 한국어 음성 인식을 위한 PTM 음절 모델)

  • Kim Bong-Wan;Lee Yong-Jn
    • MALSORI / no.50 / pp.139-150 / 2004
  • A Phonetic Tied-Mixture (PTM) model has been proposed for efficient decoding in large vocabulary continuous speech recognition (LVCSR) systems. It has been reported that the PTM model shows better decoding performance than triphones because it shares a set of mixture components among states at the same topological location[5]. In this paper we propose a Phonetic Tied-Mixture Syllable (PTMS) model, which extends the PTM technique to syllables. The proposed PTMS model shows a 13% improvement in decoding speed over PTM. The two models differ in context-dependent modeling: PTM uses cross-word context-dependent modeling, while PTMS uses word-internal left-phone-dependent modeling. Despite this difference, the proposed model shows less than 1% degradation in word accuracy relative to PTM at the same beam width. With a different beam width, it achieves better word accuracy than PTM at the same or higher speed.

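In a tied-mixture model, the Gaussians are evaluated once per frame and shared, while each state keeps only its own mixture weights. The sketch below shows that computation for a single codebook. It assumes one codebook per topological position, as the abstract's "same topological location" suggests, together with diagonal covariances and a per-frame cache. The class and function names are hypothetical, and this is not the decoder described in the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


class TiedMixtureCodebook:
    """Gaussians shared by all states at one topological position (hypothetical layout)."""

    def __init__(self, means, variances):
        self.means = means
        self.variances = variances
        self.evaluations = 0          # number of Gaussian evaluations performed
        self._frame = None
        self._cache = []

    def component_logliks(self, frame_idx, x):
        # Evaluate every shared Gaussian once per frame; later states reuse the cache.
        if frame_idx != self._frame:
            self._cache = [log_gauss_diag(x, m, v)
                           for m, v in zip(self.means, self.variances)]
            self.evaluations += len(self._cache)
            self._frame = frame_idx
        return self._cache


def state_loglik(codebook, weights, frame_idx, x):
    """State log-likelihood: state-specific weights over the shared Gaussians."""
    comps = codebook.component_logliks(frame_idx, x)
    return logsumexp([math.log(w) + c for w, c in zip(weights, comps) if w > 0])
```

Two states that share the codebook cost one set of Gaussian evaluations per frame instead of two. That reuse is the source of the decoding speed-up that tied mixtures aim for.
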

The Relationship Between Voice and the Image Triggered by the Voice: American Speakers and American Listeners (목소리를 듣고 감지하는 인상에 대한 연구: 미국인화자와 미국인청자)

  • Moon, Seung-Jae
    • Phonetics and Speech Sciences / v.1 no.2 / pp.111-118 / 2009
  • The present study investigates the relationship between voices and the physical images triggered by those voices. It is the final part of a four-part series, and the results reported here are limited to American speakers and American listeners. Combined with the results of previous studies (Moon, 2000; Moon, 2002; Tak, 2005), the results suggest three findings. (1) There is a very strong relationship, well above chance level, between voices and the pictures that the perception experiment subjects chose for them. (2) The more physical characteristics are given, the better the chance of correctly matching voices with pictures. (3) Culture (in the present study, the language environment) seems to play a role in conjuring up mental images from voices.

Universal and Specific Features in Intonation Perception

  • Makarova, Veronika
    • MALSORI / no.41 / pp.73-81 / 2001
  • This paper reports the results of an experimental phonetic study of the perception of intonation contrasts by speakers of British English, Japanese and Russian. The experiment used six series of re-synthesized two-syllable rise-fall contours. In each series, the parameters of the rise in the first syllable and of the fall in the second syllable were manipulated. Pitch height was modified in 2-semitone (2 st) steps and duration in 30 ms steps. The subjects, native speakers of British English, Japanese and Russian, identified the sentence type of each re-synthesized stimulus. The results demonstrate an overall similarity of perception strategies across the three groups, especially in the thresholds for judging a stimulus as 'declarative'. Non-declarative judgements are more language-specific. The results can be applied to teaching English, Japanese and Russian as foreign languages as well as to speech synthesis and recognition.

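A semitone (st) step is a fixed frequency ratio rather than a fixed number of hertz. Raising a pitch by n semitones multiplies it by 2^(n/12), so each 2-st step is a factor of about 1.12. The sketch below builds a stimulus grid in that way. The base F0, base duration and number of steps are hypothetical values, not the settings used in the study.

```python
def semitone_series(base_hz, step_st=2.0, n_steps=5):
    """F0 values rising from base_hz in equal semitone steps: f_k = base * 2**(k*step/12)."""
    return [base_hz * 2 ** (k * step_st / 12) for k in range(n_steps)]


def duration_series(base_ms, step_ms=30, n_steps=5):
    """Durations growing in equal millisecond steps."""
    return [base_ms + k * step_ms for k in range(n_steps)]


# Hypothetical grid: 5 pitch heights x 5 durations for one contour parameter.
grid = [(f0, dur) for f0 in semitone_series(120.0) for dur in duration_series(150)]
```

Six steps of 2 st make an octave, that is, a doubling of F0.
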

Voice Activity Detection Based on Signal Energy and Entropy-difference in Noisy Environments (엔트로피 차와 신호의 에너지에 기반한 잡음환경에서의 음성검출)

  • Ha, Dong-Gyung;Cho, Seok-Je;Jin, Gang-Gyoo;Shin, Ok-Keun
    • Journal of Advanced Marine Engineering and Technology / v.32 no.5 / pp.768-774 / 2008
  • In many areas of speech signal processing, such as automatic speech recognition and packet-based voice communication, VAD (voice activity detection) plays an important role in the performance of the overall system. In this paper, we present a new feature parameter for VAD: the product of the signal energy and the difference between two types of entropy. To this end, we first define a Mel filter-bank based entropy and calculate its difference from the conventional frequency-domain entropy. The difference is then multiplied by the spectral energy of the signal to yield the final feature parameter, which we call PEED (product of energy and entropy difference). Experiments verify that the proposed VAD parameter is more efficient than the conventional spectral-entropy-based parameter at various SNRs and in various noisy environments.
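
The feature described above can be sketched per frame as follows. The abstract does not give the exact entropy definitions, the mel filter-bank layout, or the order of the subtraction. This version therefore makes several assumptions:

- Shannon entropy is computed over the normalised power spectrum and over normalised HTK-style triangular mel filter-bank energies.
- The absolute difference of the two entropies is taken.
- That difference is multiplied by the summed power spectrum.

The naive DFT keeps the sketch dependency-free and is only suitable for short illustrative frames. Thresholding and smoothing for the actual speech/non-speech decision are omitted.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum, bins 0..N/2 (fine for a short illustrative frame)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def entropy(values):
    """Shannon entropy of non-negative values normalised to a probability distribution."""
    total = sum(values)
    if total <= 0:
        return 0.0
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank_energies(power, sample_rate, n_filters=20):
    """Apply triangular filters equally spaced on the mel scale to a power spectrum."""
    n_bins = len(power)
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [round(mel_to_hz(m) / (sample_rate / 2.0) * (n_bins - 1)) for m in mel_points]
    energies = []
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        e = 0.0
        for k in range(left, right + 1):
            if k < center:                      # rising edge of the triangle
                w = (k - left) / (center - left)
            elif k > center:                    # falling edge of the triangle
                w = (right - k) / (right - center)
            else:                               # peak
                w = 1.0
            e += w * power[k]
        energies.append(e)
    return energies


def peed(frame, sample_rate, n_filters=20):
    """Energy times |mel filter-bank entropy - spectral entropy| (assumed form of PEED)."""
    power = power_spectrum(frame)
    energy = sum(power)
    h_spec = entropy(power)
    h_mel = entropy(mel_filterbank_energies(power, sample_rate, n_filters))
    return energy * abs(h_mel - h_spec)
```

Both entropies are scale-invariant, so the feature scales with frame energy. Doubling the signal amplitude multiplies PEED by four, and a silent frame gives zero.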

A Study on the Multiple Pronunciation Dictionary for Spontaneous Speech Recognition (대화체 연속음성인식을 위한 확장 다중발음 사전에 관한 연구)

  • Kang ByungOk
    • Proceedings of the KSPS conference / 2003.10a / pp.65-68 / 2003
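  • This paper extends the concept of the multiple pronunciation dictionary used in spontaneous continuous speech recognition. The extended dictionary accommodates the irregular pronunciation variations that occur frequently in conversational utterances, and experiments show that applying it improves recognition performance in spontaneous continuous speech recognition. Pronunciation variations that are frequent in conversational speech include phonological contraction and deletion, typical mispronunciations, and the change of yang (bright) vowels into yin (dark) vowels. These variations reduce the efficiency of the language model and increase the vocabulary size, which degrades recognition performance. They also prevent the recognizer's output from taking a normalized form. Therefore, when these variations are added to the pronunciation dictionary, each is treated as a variant pronunciation of its representative word, and the language model and vocabulary are built from the representative words only. The search module of the recognizer also searches the phone sequence of each variant pronunciation. However, it consults the language model and outputs the recognition result using the representative word. As a result, recognition performance improves and normalized output patterns are obtained. In this study, pronunciation variations were incorporated into an eojeol (word-phrase) unit dictionary and also into a pseudo-morpheme[2] unit dictionary. In the experiments, the eojeol-based multiple pronunciation dictionary yielded an error reduction rate (ERR) of 10.9%. The pseudo-morpheme-based multiple pronunciation dictionary yielded an ERR of 4.3%.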

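The lexicon organisation described above attaches variant pronunciations to a representative word, and the language model is defined only over representative words. This can be sketched as a reverse index from pronunciation to representative word. The entries, the phone notation, the bigram table and the floor score are hypothetical. A real decoder would apply this mapping inside the search network rather than to a finished sequence of phone strings.

```python
# Hypothetical lexicon: representative word -> pronunciation variants (phone strings).
LEXICON = {
    "하고": ["h a g o", "h a g u"],                         # yang-vowel to yin-vowel variant
    "그러니까": ["g eu r eo n i kk a", "g eu n i kk a"],   # contraction/deletion variant
}


def build_variant_index(lexicon):
    """Map every pronunciation (canonical or variant) back to its representative word."""
    return {pron: word for word, prons in lexicon.items() for pron in prons}


def score_hypothesis(prons, index, bigram_logprob, floor=-10.0):
    """Replace matched pronunciations by representative words and score them with a bigram LM."""
    words = [index[p] for p in prons]
    score, prev = 0.0, "<s>"
    for w in words:
        score += bigram_logprob.get((prev, w), floor)   # LM sees representative words only
        prev = w
    return words, score
```

The canonical and the reduced pronunciations produce the same normalised word sequence and the same LM score. This is the point of keeping variants out of the vocabulary.
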

Corpus Based Unrestricted vocabulary Mandarin TTS (코퍼스 기반 무제한 단어 중국어 TTS)

  • Yu Zheng;Ha Ju-Hong;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference / 2003.10a / pp.175-179 / 2003
  • To produce high-quality (intelligible and natural) synthesized speech, accurate grapheme-to-phoneme conversion and an accurate prosody model are essential. In this paper, we analyze Chinese text using word segmentation, POS tagging and unknown-word recognition. We present grapheme-to-phoneme conversion using a dictionary-based and rule-based method. We construct a prosody model using a probabilistic method and a decision-tree-based error correction method. Based on the results of this analysis, the exact syllable synthesis units can be selected from the Chinese Synthesis DB and concatenated.

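A dictionary-first, rule-second grapheme-to-phoneme step can be sketched in three stages. Whole words found in a word dictionary take their listed pinyin. Other words fall back to per-character default readings. A rule then applies Mandarin third-tone sandhi, in which a third tone before another third tone is realised as a second tone. The tiny dictionaries and the within-word sandhi rule are illustrative assumptions. The abstract does not describe the paper's actual dictionary, rules or prosody model.

```python
# Hypothetical toy tables; a real system would use a large pronunciation dictionary.
WORD_DICT = {"银行": ["yin2", "hang2"], "行人": ["xing2", "ren2"]}
CHAR_DEFAULT = {"银": "yin2", "行": "xing2", "人": "ren2", "你": "ni3", "好": "hao3"}


def apply_third_tone_sandhi(syllables):
    """Rule step: a tone-3 syllable followed by another tone-3 syllable becomes tone 2."""
    out = list(syllables)
    for i in range(len(out) - 1):
        if out[i].endswith("3") and out[i + 1].endswith("3"):
            out[i] = out[i][:-1] + "2"
    return out


def g2p(words):
    """Convert already-segmented words to tone-numbered pinyin."""
    result = []
    for word in words:
        if word in WORD_DICT:                      # dictionary step (handles polyphones)
            syllables = WORD_DICT[word]
        else:                                      # fallback; KeyError for chars not in the toy table
            syllables = [CHAR_DEFAULT[ch] for ch in word]
        result.extend(apply_third_tone_sandhi(syllables))
    return result
```

The dictionary entry for 银行 overrides the default reading of 行 (xing2). Character-by-character conversion gets this kind of polyphone wrong.
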

Universal and Specific Features in Intonation Perception

  • Makarova, Veronika
    • Proceedings of the KSPS conference / 2000.07a / pp.139-148 / 2000
  • This paper reports the results of an experimental phonetic study of the perception of intonation contrasts by speakers of British English, Japanese and Russian. The experiment used six series of re-synthesized two-syllable rise-fall contours. In each series, the parameters of the rise in the first syllable and of the fall in the second syllable were manipulated. Pitch height was modified in 2-semitone (2 st) steps and duration in 30 ms steps. The subjects, native speakers of British English, Japanese and Russian, identified the sentence type of each re-synthesized stimulus. The results demonstrate an overall similarity of perception strategies across the three groups, especially in the thresholds for judging a stimulus as 'declarative'. Non-declarative judgements are more language-specific. The results can be applied to teaching English, Japanese and Russian as foreign languages as well as to speech synthesis and recognition.

Pronunciation Lexicon Optimization with Applying Variant Selection Criteria (발음 변이의 발음사전 포함 결정 조건을 통한 발음사전 최적화)

  • Jeon, Je-Hun;Chung, Min-Hwa
    • Proceedings of the KSPS conference / 2006.11a / pp.24-27 / 2006
  • This paper describes how a domain-dependent pronunciation lexicon is generated and optimized for Korean large vocabulary continuous speech recognition (LVCSR). At the lexicon level, pronunciation variation is usually modeled by adding pronunciation variants to the lexicon. We propose two criteria for selecting which pronunciation variants to include in the lexicon: (i) a likelihood factor and (ii) a frequency factor. The experiment is conducted in three steps. First, variants are generated with knowledge-based rules. Second, domain-dependent lexicons containing various numbers of pronunciation variants are generated based on the proposed criteria. Finally, the WER and RTF (real-time factor) are measured with each lexicon. Introducing the variant pruning criteria reduced WER by 0.72%. Furthermore, RTF did not deteriorate even though the average number of variants was higher than in the lexica compared.

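The two selection criteria can be sketched as a filter over per-variant statistics, gathered for example by forced alignment of training data. The abstract does not state how the paper defines and combines its likelihood and frequency factors. This version assumes that a variant is kept when two conditions hold: its relative frequency reaches `min_rel_freq`, and its average frame-normalised log-likelihood is within `max_loglik_gap` of the canonical pronunciation. Both thresholds and the `VariantStats` record are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    pron: str           # pronunciation as a phone string
    count: int          # how often alignment chose this pronunciation (hypothetical statistic)
    avg_loglik: float   # mean frame-normalised acoustic log-likelihood


def select_variants(stats, canonical, min_rel_freq=0.1, max_loglik_gap=0.5):
    """Keep the canonical pronunciation plus variants passing both the frequency and likelihood tests."""
    total = sum(s.count for s in stats)
    base = next(s for s in stats if s.pron == canonical)
    kept = [canonical]
    for s in stats:
        if s.pron == canonical:
            continue
        rel_freq = s.count / total if total else 0.0
        close_enough = s.avg_loglik >= base.avg_loglik - max_loglik_gap
        if rel_freq >= min_rel_freq and close_enough:
            kept.append(s.pron)
    return kept
```

Raising either threshold gives a smaller lexicon. This is the lever for trading recognition accuracy (WER) against decoding speed (RTF).
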

Performance Improvement in Distant-Talking Speech Recognition by an Integration of N-best results using Naive Bayesian Network (다채널 마이크 환경에서 Naive Bayesian Network의 Decision에 의한 음성인식 성능향상)

  • Ji, Mi-kyong;Kim, Hoi-Rin
    • Proceedings of the KSPS conference / 2005.11a / pp.151-154 / 2005
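  • A multichannel microphone environment is essential for improving recognition accuracy in distant-talking speech recognition. In such an environment, this work aims to control surrounding devices such as a TV and lighting by voice, using distant microphones distributed throughout a room. To obtain the best result, the recognition results of the individual channels are integrated. A Bayesian network is trained on each channel's N-best results and on the frame-normalized likelihoods of the hypotheses in those N-best lists. The network is then used to integrate the recognition results and decide on the best one. This improves the performance of distant-talking speech recognition and points toward practical hands-free applications.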

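The channel-combination step can be sketched with a plain naive Bayes classifier. It predicts whether a candidate hypothesis is correct from that candidate's rank and frame-normalised log-likelihood in each channel's N-best list. The feature encoding (channel, rank, likelihood bin), the bin edges, the Laplace smoothing and the `n_values` vocabulary size are assumptions made for illustration. The abstract does not specify the structure or features of the network used in the paper.

```python
import math
from collections import Counter, defaultdict


def candidate_features(candidate, nbest_per_channel, bin_edges=(-8.0, -6.0)):
    """One (channel, rank, likelihood-bin) feature per channel; rank = N and bin = -1 if absent."""
    feats = []
    for ch, nbest in enumerate(nbest_per_channel):   # nbest: [(hypothesis, frame-normalised log-lik), ...]
        for rank, (hyp, loglik) in enumerate(nbest):
            if hyp == candidate:
                feats.append((ch, rank, sum(loglik > e for e in bin_edges)))
                break
        else:
            feats.append((ch, len(nbest), -1))
    return feats


class NaiveBayes:
    """Binary naive Bayes with Laplace smoothing over categorical per-channel features."""

    def __init__(self, alpha=1.0, n_values=20):
        self.alpha = alpha
        self.n_values = n_values          # assumed number of distinct values per channel feature
        self.class_counts = Counter()
        self.feat_counts = defaultdict(Counter)

    def fit(self, examples):
        for feats, label in examples:     # label: True if the candidate was the correct transcript
            self.class_counts[label] += 1
            self.feat_counts[label].update(feats)

    def log_joint(self, feats, label):
        n = self.class_counts[label]
        total = sum(self.class_counts.values())
        lp = math.log((n + self.alpha) / (total + 2 * self.alpha))
        for f in feats:
            lp += math.log((self.feat_counts[label][f] + self.alpha) / (n + self.alpha * self.n_values))
        return lp


def decide(model, nbest_per_channel):
    """Return the candidate with the highest log-odds of being correct."""
    candidates = sorted({hyp for nbest in nbest_per_channel for hyp, _ in nbest})

    def log_odds(c):
        feats = candidate_features(c, nbest_per_channel)
        return model.log_joint(feats, True) - model.log_joint(feats, False)

    return max(candidates, key=log_odds)
```

In a real system the training pairs would come from labelled multichannel recordings, and the hypotheses would be the command phrases of the device-control task.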