• Title/Summary/Keyword: Speech Recognition Technology

The Korean Word Length Effect on Auditory Word Recognition (청각 단어 재인에서 나타난 한국어 단어길이 효과)

  • Choi Wonil;Nam Kichun
    • Proceedings of the KSPS conference / 2002.11a / pp.137-140 / 2002
  • This study was conducted to examine the Korean word length effect on auditory word recognition. Linguistically, word length can be defined over several sublexical units, such as letters, phonemes, and syllables. To investigate which units are used in auditory word recognition, a lexical decision task was employed. Experiments 1 and 2 showed that syllable length affected response time and interacted with word frequency. These results indicate that syllable length is an important variable in auditory word recognition.

  • PDF

Speaker Recognition using PCA in Driving Car Environments (PCA를 이용한 자동차 주행 환경에서의 화자인식)

  • Yu, Ha-Jin
    • Proceedings of the KSPS conference / 2005.04a / pp.103-106 / 2005
  • The goal of our research is to build a text-independent speaker recognition system that can be used in any condition without any additional adaptation process. The performance of speaker recognition systems can be severely degraded under unknown, mismatched microphone and noise conditions. In this paper, we show that PCA (principal component analysis) without dimension reduction can greatly increase performance to a level close to the matched condition. The error rate is reduced further by the proposed augmented PCA, which augments an axis to the feature vectors of the most confusable pairs of speakers before PCA.

  • PDF
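The central step in the abstract above, PCA applied to acoustic feature vectors while keeping every component, can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the feature dimensionality and data are placeholders.

```python
import numpy as np

def pca_transform(features):
    """Rotate feature vectors onto their principal axes with PCA,
    keeping every component (no dimension reduction), so the
    transform is an orthogonal change of basis.

    features: (n_samples, n_dims) array of acoustic feature
    vectors (e.g. per-frame MFCCs)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Eigen-decomposition of the covariance matrix
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Order the axes by decreasing variance
    order = np.argsort(eigvals)[::-1]
    eigvecs = eigvecs[:, order]
    return centered @ eigvecs, mean, eigvecs
```

In the transformed space the feature dimensions are decorrelated, which is what makes a subsequent per-dimension model less sensitive to channel-induced correlations.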

Robust Speech Recognition Algorithm of Voice Activated Powered Wheelchair for Severely Disabled Person (중증 장애우용 음성구동 휠체어를 위한 강인한 음성인식 알고리즘)

  • Suk, Soo-Young;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.26 no.6 / pp.250-258 / 2007
  • Current speech recognition technology has achieved high performance with the development of hardware devices, but it remains insufficient for applications that require high reliability, such as voice control of powered wheelchairs for disabled persons. For a system that aims to operate a powered wheelchair safely by voice in real environments, non-voice inputs such as the user's coughing, breathing, and spark-like mechanical noise must be rejected, and the system needs to recognize speech commands affected by disability, which can involve atypical pronunciation speed and frequency. In this paper, we propose a non-voice rejection method that performs voice/non-voice classification in preprocessing using YIN-based fundamental frequency (F0) extraction and its reliability. We adopted a multi-template dictionary and acoustic-model-based speaker adaptation to cope with the pronunciation variation of inarticulately uttered speech. In recognition tests conducted with data collected in a real environment, the proposed YIN-based F0 extraction showed a recall-precision rate of 95.1%, better than the 62% of a cepstrum-based method. A recognition test of the new system with the multi-template dictionary and MAP adaptation also showed much higher accuracy, 99.5% versus 78.6% for the baseline system.
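The YIN-based F0 extraction named in this abstract rests on well-known steps: a difference function, cumulative mean normalisation, and an absolute threshold. A minimal sketch of those steps follows; the frequency bounds and threshold are illustrative defaults, not the paper's settings, and a zero return can serve as the non-voice cue the abstract describes.

```python
import numpy as np

def yin_f0(frame, sr, fmin=60.0, fmax=400.0, threshold=0.1):
    """Estimate the F0 of one frame with the core YIN steps.
    Returns 0.0 when no lag falls below the threshold, i.e. the
    frame looks aperiodic (a voice/non-voice cue)."""
    tau_min = int(sr / fmax)
    tau_max = int(sr / fmin)
    n = len(frame)
    # Step 1: difference function d(tau)
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = frame[: n - tau] - frame[tau:]
        d[tau] = np.dot(diff, diff)
    # Step 2: cumulative-mean-normalised difference, d'(0) := 1
    dprime = np.ones(tau_max + 1)
    cum = np.cumsum(d[1:])
    dprime[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(cum, 1e-12)
    # Step 3: first lag under the threshold, refined to the local minimum
    tau = tau_min
    while tau <= tau_max:
        if dprime[tau] < threshold:
            while tau + 1 <= tau_max and dprime[tau + 1] < dprime[tau]:
                tau += 1
            return sr / tau
        tau += 1
    return 0.0
```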

Acoustic Channel Compensation at Mel-frequency Spectrum Domain

  • Jeong, So-Young;Oh, Sang-Hoon;Lee, Soo-Young
    • The Journal of the Acoustical Society of Korea / v.22 no.1E / pp.43-48 / 2003
  • The effects of linear acoustic channels have been analyzed and compensated in the mel-frequency feature domain. Unlike popular RASTA filtering, our approach incorporates a separate filter for each mel-frequency band, which results in better recognition performance for heavily reverberated speech.

An Overview and Market Review of Speaker Recognition Technology (화자인식 기술 및 국내외시장 동향)

  • Yu, Ha-Jin
    • Proceedings of the KSPS conference / 2004.05a / pp.91-97 / 2004
  • We provide a brief overview of the area of speaker recognition, describing the underlying techniques and the current market. We describe techniques mainly based on the GMM (Gaussian mixture model), the most prevalent and effective approach. Following the technical overview, we outline the domestic and international market for the technology.

  • PDF
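In GMM-based speaker recognition as surveyed above, a test utterance's feature frames are scored against each enrolled speaker's mixture model and the highest scorer wins. A minimal diagonal-covariance scoring sketch, with illustrative shapes not drawn from the paper:

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X under a
    diagonal-covariance GMM.

    X: (n_frames, d); weights: (n_mix,); means, variances: (n_mix, d).
    A speaker-ID system computes this score for each enrolled
    speaker's GMM and picks the maximum."""
    d = X.shape[1]
    # (n_frames, n_mix) matrix of per-mixture log densities
    log_probs = (
        np.log(weights)
        - 0.5 * d * np.log(2 * np.pi)
        - 0.5 * np.log(variances).sum(axis=1)
        - 0.5 * (((X[:, None, :] - means) ** 2) / variances).sum(axis=2)
    )
    # Log-sum-exp over mixtures, then average over frames
    m = log_probs.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(log_probs - m).sum(axis=1))).mean())
```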

Spectral Feature Transformation for Compensation of Microphone Mismatches

  • Jeong, So-Young;Oh, Sang-Hoon;Lee, Soo-Young
    • The Journal of the Acoustical Society of Korea / v.22 no.4E / pp.150-154 / 2003
  • The distortion effects of microphones have been analyzed and compensated in the mel-frequency feature domain. Unlike popular bias removal algorithms, a linear transformation of the mel-frequency spectrum is incorporated. Although a diagonal matrix transformation is sufficient for medium-quality microphones, a full-matrix transform is required for low-quality microphones with severe nonlinearity. The proposed compensation algorithms were tested on the HTIMIT database and yielded about a 5% improvement in recognition rate over the conventional CMS algorithm.
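The diagonal-versus-full-matrix contrast in this abstract amounts to fitting a linear map between paired mel-spectra. The least-squares sketch below assumes stereo (reference/distorted) training data, as in HTIMIT-style corpora; the fitting procedure is an illustration, not the authors' estimation method.

```python
import numpy as np

def estimate_compensation(ref, dist, full_matrix=True):
    """Least-squares fit of a linear compensation ref ≈ dist @ A + b.

    ref, dist: (n_frames, n_bands) paired mel-spectrum features
    from the reference and distorting microphones. With
    full_matrix=False only a per-band gain and bias are fitted,
    mirroring the diagonal-vs-full distinction in the abstract."""
    if full_matrix:
        # Augment with a bias column and solve ref = [dist 1] @ W
        X = np.hstack([dist, np.ones((len(dist), 1))])
        W, *_ = np.linalg.lstsq(X, ref, rcond=None)
        A, b = W[:-1], W[-1]
    else:
        # Independent 1-D regression for each mel band
        var = dist.var(axis=0)
        gain = ((dist - dist.mean(0)) * (ref - ref.mean(0))).mean(0) / var
        A = np.diag(gain)
        b = ref.mean(0) - gain * dist.mean(0)
    return A, b
```

At recognition time the distorted features are mapped through `dist @ A + b` before decoding; the diagonal variant costs only one multiply-add per band.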

The text-to-speech system assessment based on word frequency and word regularity effects (단어빈도와 단어규칙성 효과에 기초한 합성음 평가)

  • Nam Kichun;Choi Wonil;Lee Donghoon;Koo Minmo;Kim Jongjin
    • Proceedings of the KSPS conference / 2002.11a / pp.105-108 / 2002
  • In the present study, the intelligibility of synthesized speech was evaluated using psycholinguistic and fMRI techniques. To see the difference in recognizing words between natural and synthesized speech, word regularity and word frequency were varied. The results of Experiments 1 and 2 showed that the intelligibility difference of the synthesized speech comes from word regularity: there was weaker activation of auditory areas in the brain and slower recognition times for regular words.

  • PDF

Development of FSN-based Large Vocabulary Continuous Speech Recognition System (FSN 기반의 대어휘 연속음성인식 시스템 개발)

  • Park, Jeon-Gue;Lee, Yun-Keun
    • Proceedings of the KSPS conference / 2007.05a / pp.327-329 / 2007
  • This paper presents an FSN-based LVCSR system and its application to a spoken TV program guide. Unlike the most popular statistical language-model-based systems, we used an FSN grammar based on a graph-theoretic FSN optimization algorithm and knowledge-based advanced word-boundary modeling. For memory and latency efficiency, we implemented dynamic pruning scheduling based on the histogram of active words and their likelihood distribution. We achieved a 10.7% word-accuracy improvement with a 57.3% speedup.

  • PDF
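The "dynamic pruning scheduling based on the histogram of active words" mentioned above is in the spirit of classic histogram pruning in LVCSR decoders, which caps the number of active hypotheses without a full sort. A generic sketch follows; the bin count and interface are assumptions, not the paper's design.

```python
def histogram_prune(scores, max_active, bins=64):
    """Histogram-based beam pruning: find a score threshold that
    keeps roughly max_active hypotheses in O(n) time, instead of
    sorting all active hypotheses (higher score = better)."""
    if len(scores) <= max_active:
        return scores
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / bins or 1.0
    hist = [0] * bins
    for s in scores:
        idx = min(int((s - lo) / width), bins - 1)
        hist[idx] += 1
    # Walk from the best (highest-score) bin downwards until the
    # cumulative count reaches the cap, then prune below that bin.
    kept, thresh = 0, lo
    for idx in range(bins - 1, -1, -1):
        kept += hist[idx]
        if kept >= max_active:
            thresh = lo + idx * width
            break
    return [s for s in scores if s >= thresh]
```

Because whole bins are kept or dropped, the survivor count only approximates `max_active`; decoders accept this in exchange for avoiding an O(n log n) sort on every frame.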

A Numerical Speech Recognition by Parameters Estimated from the Data on the Estimated Plane and a Neural Network (추정평면에서 평가한 데이터와 인공신경망에 의한 숫자음 인식)

  • Choi, Il-Hong;Jang, Seung-Kwan;Cha, Tae-Hoo;Choi, Ung-Se;Kim, Chang-Seok
    • The Journal of the Acoustical Society of Korea / v.15 no.4 / pp.58-64 / 1996
  • This paper proposed a recognition method using parameters estimated from data on the estimated plane together with a neural network. After the LPC coefficients estimated in each frame were mapped onto the estimated plane by the optimum feature mapping function, we estimated the C-LPC, the maximum and minimum values, and the three-section divided power from the mapped data. Speech recognition experiments using these parameters as inputs to a neural network showed that the parameters estimated from the estimated plane preserve the features of the original speech under time-scale changes, and that the recognition rate of the proposed method was 96.3%.

  • PDF

Speech Recognition Error Detection Using Deep Learning (딥 러닝을 이용한 음성인식 오류 판별 방법)

  • Kim, Hyun-Ho;Yun, Seung;Kim, Sang-Hun
    • Annual Conference on Human and Language Technology / 2015.10a / pp.157-162 / 2015
  • Sentences containing errors from the speech recognition stage, the first step of speech-to-speech translation, are mostly ungrammatical or unintelligible. When such sentences are passed to machine translation, serious interpretation errors occur, so improvement is essential. In this paper, exploiting the fact that misrecognized sentences tend to be ungrammatical or meaningless compared with correctly recognized ones, we implemented a DNN (deep neural network)-based speech recognition error classifier and obtained an error-sentence classification performance of 84.20%.

  • PDF
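At its core, a recognition-error classifier like the one described reduces to a supervised binary classifier over sentence-level features. The toy one-hidden-layer network below is an illustrative sketch only: the feature choice, layer size, and training details are assumptions, and a practical DNN would be far larger.

```python
import numpy as np

def train_error_classifier(X, y, hidden=16, lr=0.5, epochs=500, seed=0):
    """Train a tiny one-hidden-layer network to label ASR output
    sentences as correct (0) or erroneous (1).

    X: (n, d) per-sentence features (e.g. confidence scores,
    language-model perplexity; illustrative, not the paper's).
    y: (n,) binary labels. Returns a predict function."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden); b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                   # hidden layer
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # P(error)
        g = (p - y) / n                            # cross-entropy gradient
        W2 -= lr * (h.T @ g); b2 -= lr * g.sum()
        gh = np.outer(g, W2) * (1.0 - h ** 2)      # backprop through tanh
        W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(0)

    def predict(Xq):
        hq = np.tanh(Xq @ W1 + b1)
        return (1.0 / (1.0 + np.exp(-(hq @ W2 + b2))) > 0.5).astype(int)
    return predict
```

Sentences the classifier flags as erroneous can then be withheld from the translation stage, which is the failure mode the abstract targets.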