• Title/Summary/Keyword: Speech Recognition Technology

Search Result 531, Processing Time 0.029 seconds

Robustness of Bimodal Speech Recognition on Degradation of Lip Parameter Estimation Performance (음성인식에서 입술 파라미터 열화에 따른 견인성 연구)

  • Kim Jinyoung;Shin Dosung;Choi Seungho
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.205-208
    • /
    • 2002
  • Bimodal speech recognition based on lip reading has been studied as a representative method of speech recognition under noisy environments. There are three integration methods of speech and lip modalities as like direct identification, separate identification and dominant recording. In this paper we evaluate the robustness of lip reading methods under the assumption that lip parameters are estimated with errors. We show that the dominant recording approach is more robust than other methods with lip reading experiments. Also, a measure of lip parameter degradation is proposed. This measure can be used in the determination of weighting values of video information.

  • PDF

A Phonetics Based Design of PLU Sets for Korean Speech Recognition (한국어 음성인식을 위한 음성학 기반의 유사음소단위 집합 설계)

  • Hong, Hye-Jin;Kim, Sun-Hee;Chung, Min-Hwa
    • MALSORI
    • /
    • no.65
    • /
    • pp.105-124
    • /
    • 2008
  • This paper presents the effects of different phone-like-unit (PLU) sets in order to propose an optimal PLU set for the performance improvement of Korean automatic speech recognition (ASR) systems. The examination of 9 currently used PLU sets indicates that most of them include a selection of allophones without any sufficient phonetic base. In this paper, a total of 34 PLU sets are designed based on Korean phonetic characteristics arid the effects of each PLU set are evaluated through experiments. The results show that the accuracy rate of each phone is influenced by different phonetic constraint(s) which determine(s) the PLU sets, and that an optimal PLU set can be anticipated through the phonetic analysis of the given speech data.

  • PDF

A Voice-Activated Dialing System with Distributed Speech Recognition in WiFi Environments (무선랜 환경에서의 분산 음성 인식을 이용한 음성 다이얼링 시스템)

  • Park Sung-Joon;Koo Myoung_wan
    • MALSORI
    • /
    • no.56
    • /
    • pp.135-145
    • /
    • 2005
  • In this paper, a WiFi phone system with distributed speech recognition is implemented. The WiFi phone with voice-activated dialing and its functions are explained. Features of the input speech are extracted and are sent to the interactive voice response (IVR) server according to the real-time transport protocol (RTP). Feature extraction is based on the European Telecommunication Standards Institute (ETSI) standard front-end, but is modified to reduce the processing time. The time for front-end processing on a WiFi phone is compared with that in a PC.

  • PDF

The Validation of Speech Recognition Performance Change according to the kind and established distance of the Microphone (마이크로폰의 종류 및 설치거리에 따른 음성인식성능변화의 검토)

  • Kim Yoen-Whoa;Lee Kwang-Hyun;Choi Dae-Lim;Kim Bong-Wan;Lee Yong-Ju
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.141-143
    • /
    • 2003
  • Speech recognition performance depends on various factors. One of the factors is the characteristic and established distance of a microphone which is used when speech data is collected. Thus, in the present experiment speech databases for tests are created through the type and established distance of a microphone. Then, acoustic models are built based on these databases, and each of the acoustic models is assessed by the data to determine recognition performance depending on various microphones and established microphone distances.

  • PDF

Codebook design for subspace distribution clustering hidden Markov model (Subspace distribution clustering hidden Markov model을 위한 codebook design)

  • Cho, Young-Kyu;Yook, Dong-Suk
    • Proceedings of the KSPS conference
    • /
    • 2005.04a
    • /
    • pp.87-90
    • /
    • 2005
  • Today's state-of the-art speech recognition systems typically use continuous distribution hidden Markov models with the mixtures of Gaussian distributions. To obtain higher recognition accuracy, the hidden Markov models typically require huge number of Gaussian distributions. Such speech recognition systems have problems that they require too much memory to run, and are too slow for large applications. Many approaches are proposed for the design of compact acoustic models. One of those models is subspace distribution clustering hidden Markov model. Subspace distribution clustering hidden Markov model can represent original full-space distributions as some combinations of a small number of subspace distribution codebooks. Therefore, how to make the codebook is an important issue in this approach. In this paper, we report some experimental results on various quantization methods to make more accurate models.

  • PDF

Sound's Direction Detection and Speech Recognition System for Humanoid Active Audition

  • Kim, Hyun-Don;Choi, Jong-Suk;Lee, Chang-Hoon;Park, Gwi-Tea;Kim, Mun-Sang
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2003.10a
    • /
    • pp.633-638
    • /
    • 2003
  • In this paper, we propose a humanoid active audition system which detects the direction of sound and performs speech recognition using just three microphones. Compared with previous researches, this system which has simpler algorithm, fewer microphones and better amplifier shows better performance. In order to verify our system's performance, we install the proposed active audition system to the home service robot, called Hombot II, which has been developed at the KIST (Korea Institute of Science and Technology), thus we confirm excellent performance by experimental results

  • PDF

Noise removal algorithm for intelligent service robots in the high noise level environment (원거리 음성인식 시스템의 잡음 제거 기법에 대한 연구)

  • Woo, Sung-Min;Lee, Sang-Hoon;Jeong, Hong
    • Proceedings of the IEEK Conference
    • /
    • 2007.07a
    • /
    • pp.413-414
    • /
    • 2007
  • Successful speech recognition in noisy environments for intelligent robots depends on the performance of preprocessing elements employed. We propose an architecture that effectively combines adaptive beamforming (ABF) and blind source separation (BSS) algorithms in the spatial domain to avoid permutation ambiguity and heavy computational complexity. We evaluated the structure and assessed its performance with a DSP module. The experimental results of speech recognition test shows that the proposed combined system guarantees high speech recognition rate in the noisy environment and better performance than the ABF and BSS system.

  • PDF

Speaker Identification Using Augmented PCA in Unknown Environments (부가 주성분분석을 이용한 미지의 환경에서의 화자식별)

  • Yu, Ha-Jin
    • MALSORI
    • /
    • no.54
    • /
    • pp.73-83
    • /
    • 2005
  • The goal of our research is to build a text-independent speaker identification system that can be used in any condition without any additional adaptation process. The performance of speaker recognition systems can be severely degraded in some unknown mismatched microphone and noise conditions. In this paper, we show that PCA(principal component analysis) can improve the performance in the situation. We also propose an augmented PCA process, which augments class discriminative information to the original feature vectors before PCA transformation and selects the best direction for each pair of highly confusable speakers. The proposed method reduced the relative recognition error by 21%.

  • PDF

A study on Effective Feature Parameters Comparison for Speaker Recognition (화자인식에 효과적인 특징벡터에 관한 비교연구)

  • Park TaeSun;Kim Sang-Jin;Kwang Moon;Hahn Minsoo
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.145-148
    • /
    • 2003
  • In this paper, we carried out comparative study about various feature parameters for the effective speaker recognition such as LPC, LPCC, MFCC, Log Area Ratio, Reflection Coefficients, Inverse Sine, and Delta Parameter. We also adopted cepstral liftering and cepstral mean subtraction methods to check their usefulness. Our recognition system is HMM based one with 4 connected-Korean-digit speech database. Various experimental results will help to select the most effective parameter for speaker recognition.

  • PDF

Acoustic and Pronunciation Model Adaptation Based on Context dependency for Korean-English Speech Recognition (한국인의 영어 인식을 위한 문맥 종속성 기반 음향모델/발음모델 적응)

  • Oh, Yoo-Rhee;Kim, Hong-Kook;Lee, Yeon-Woo;Lee, Seong-Ro
    • MALSORI
    • /
    • v.68
    • /
    • pp.33-47
    • /
    • 2008
  • In this paper, we propose a hybrid acoustic and pronunciation model adaptation method based on context dependency for Korean-English speech recognition. The proposed method is performed as follows. First, in order to derive pronunciation variant rules, an n-best phoneme sequence is obtained by phone recognition. Second, we decompose each rule into a context independent (CI) or a context dependent (CD) one. To this end, it is assumed that a different phoneme structure between Korean and English makes CI pronunciation variabilities while coarticulation effects are related to CD pronunciation variabilities. Finally, we perform an acoustic model adaptation and a pronunciation model adaptation for CI and CD pronunciation variabilities, respectively. It is shown from the Korean-English speech recognition experiments that the average word error rate (WER) is decreased by 36.0% when compared to the baseline that does not include any adaptation. In addition, the proposed method has a lower average WER than either the acoustic model adaptation or the pronunciation model adaptation.

  • PDF