• Title/Summary/Keyword: 'Speech recognition

검색결과 2,053건 처리시간 0.03초

Acoustic Driving Simulator Design for Evaluating an In-car Speech Recognizer

  • Lee, Seongjae;Kang, Sunmee
    • 말소리와 음성과학
    • /
    • 제5권2호
    • /
    • pp.93-97
    • /
    • 2013
  • This paper is on designing an indoor driving simulator to evaluate the performance of in-car speech recognizer when influenced by the elements, which lower the success rate of speech recognition. The proposed simulator simulates vehicle noise which was pre-recorded in diverse driving environments and driver's speech. Additionally, the proposed Lombard effect conversion module in this simulator enables the speech recorded in a studio environment to convert into various possible driving scenarios. The relevant experimental results have confirmed that the proposed simulator is a feasible approach for realizing an effective method as it achieved similar speech recognition results to the real driving environment.

음성기반 멀티모달 사용자 인터페이스의 사용성 평가 방법론 (Usability Test Guidelines for Speech-Oriented Multimodal User Interface)

  • 홍기형
    • 대한음성학회지:말소리
    • /
    • 제67호
    • /
    • pp.103-120
    • /
    • 2008
  • Basic components for multimodal interface, such as speech recognition, speech synthesis, gesture recognition, and multimodal fusion, have their own technological limitations. For example, the accuracy of speech recognition decreases for large vocabulary and in noisy environments. In spite of those technological limitations, there are lots of applications in which speech-oriented multimodal user interfaces are very helpful to users. However, in order to expand application areas for speech-oriented multimodal interfaces, we have to develop the interfaces focused on usability. In this paper, we introduce usability and user-centered design methodology in general. There has been much work for evaluating spoken dialogue systems. We give a summary for PARADISE (PARAdigm for Dialogue System Evaluation) and PROMISE (PROcedure for Multimodal Interactive System Evaluation) that are the generalized evaluation frameworks for voice and multimodal user interfaces. Then, we present usability components for speech-oriented multimodal user interfaces and usability testing guidelines that can be used in a user-centered multimodal interface design process.

  • PDF

ETRI 방송뉴스음성인식시스템 소개 (Introduction of ETRI Broadcast News Speech Recognition System)

  • 박준
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2006년도 춘계 학술대회 발표논문집
    • /
    • pp.89-93
    • /
    • 2006
  • This paper presents ETRI broadcast news speech recognition system. There are two major issues on the broadcast news speech recognition: 1) real-time processing and 2) out-of-vocabulary handling. For real-time processing, we devised the dual decoder architecture. The input speech signal is segmented based on the long-pause between utterances, and each decoder processes the speech segment alternatively. One decoder can start to recognize the current speech segment without waiting for the other decoder to recognize the previous speech segment completely. Thus, the processing delay is not accumulated. For out-of-vocabulary handling, we updated both the vocabulary and the language model, based on the recent news articles on the internet. By updating the language model as well as the vocabulary, we can improve the performance up to 17.2% ERR.

  • PDF

Telephone Speech Recognition with Data-Driven Selective Temporal Filtering based on Principal Component Analysis

  • Jung Sun Gyun;Son Jong Mok;Bae Keun Sung
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2004년도 학술대회지
    • /
    • pp.764-767
    • /
    • 2004
  • The performance of a speech recognition system is generally degraded in telephone environment because of distortions caused by background noise and various channel characteristics. In this paper, data-driven temporal filters are investigated to improve the performance of a specific recognition task such as telephone speech. Three different temporal filtering methods are presented with recognition results for Korean connected-digit telephone speech. Filter coefficients are derived from the cepstral domain feature vectors using the principal component analysis.

  • PDF

RASTA 필터를 이용한 립리딩 성능향상에 관한 연구 (A Study on Lip-reading enhancement using RATSTA fileter)

  • 신도성;김진영;최승호;김상훈
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2002년도 11월 학술대회지
    • /
    • pp.191-194
    • /
    • 2002
  • Lip-reading technology that is studied them is used to compensate speech recognition degradation in noise environment in bi-modal's form. The most important thing is that search for correct lips area in this lip-reading. But, it is hard to forecast stable performance in dynamic environment. Used RASTA filter that show good performance to remove noise in the speech to compensate. This filter shows that improve performance of using time domain of digital filter. To this experiment observes performance of speech recognition only using image information, service chooses possible 22 words and did recognition experiment in car. We used hidden Markov model by speech recognition algorithm to compare this words' recognition performance.

  • PDF

Semantic-Oriented Error Correction for Voice-Activated Information Retrieval System

  • Yoon, Yong-Wook;Kim, Byeong-Chang;Lee, Gary-Geunbae
    • 대한음성학회지:말소리
    • /
    • 제44호
    • /
    • pp.115-130
    • /
    • 2002
  • Voice input is often required in many new application environments, but the low rate of speech recognition makes it difficult to extend its application. Previous approaches were to raise the accuracy of the recognition by post-processing of the recognition results, which were all lexical-oriented. We suggest a new semantic-oriented approach in speech recognition error correction. Through experiments using a speech-driven in-vehicle telematics information application, we show the excellent performance of our approach and some advantages it has as a semantic-oriented approach over a pure lexical-oriented approach.

  • PDF

음성인식을 위한 새로운 포만트트랙킹 알고리즘의 제안과 평가 (An Proposal and Evaluation of the New formant Tracking Algorithm for Speech Recognition)

  • 송정영
    • 인터넷정보학회논문지
    • /
    • 제3권4호
    • /
    • pp.51-59
    • /
    • 2002
  • 본 논문에서는, 음성인식을 위한 한가지 방법으로 새로운 포만트 트랙킹 알고리즘을 제안한다. 본 연구에서는 실험을 위한 인식 데이터로 한국어 숫자음성을 사용하였다. 새롭게 제안한 알고리즘을 사용하여 인식실험을 한 결과, 숫자음성 300개에 대한 인식률은 91%의 결과를 얻었다. 본 연구의 새로운 알고리즘은, 인식실험을 통하여 그 유효성이 확인되었다.

  • PDF

음성의 감성요소 추출을 통한 감성 인식 시스템 (The Emotion Recognition System through The Extraction of Emotional Components from Speech)

  • 박창현;심귀보
    • 제어로봇시스템학회논문지
    • /
    • 제10권9호
    • /
    • pp.763-770
    • /
    • 2004
  • The important issue of emotion recognition from speech is a feature extracting and pattern classification. Features should involve essential information for classifying the emotions. Feature selection is needed to decompose the components of speech and analyze the relation between features and emotions. Specially, a pitch of speech components includes much information for emotion. Accordingly, this paper searches the relation of emotion to features such as the sound loudness, pitch, etc. and classifies the emotions by using the statistic of the collecting data. This paper deals with the method of recognizing emotion from the sound. The most important emotional component of sound is a tone. Also, the inference ability of a brain takes part in the emotion recognition. This paper finds empirically the emotional components from the speech and experiment on the emotion recognition. This paper also proposes the recognition method using these emotional components and the transition probability.

한국어 연속음성에서의 조사 및 어미 인식에 관한 연구 (A Study on Recognition of Korean Postpositions and Suffixes in Continuous Speech)

  • 송민석;이기영
    • 음성과학
    • /
    • 제6권
    • /
    • pp.181-195
    • /
    • 1999
  • This study proposes a method of recognizing postpositions and suffixes in Korean spoken language, using prosodic information. We detect grammatical boundaries automatically at first, by using prosodic information of the accentual phrase, and then we recognize grammatical function words by backward-tracking from the boundaries. The experiment employs 300 sentential speech data of 10 men's and 5 women's voice spoken in standard Korean, in which 1080 accentual phrases and 11 postpositions and suffixes are included. The result shows the recognition rate of postpositions in two cases. In one case in which only correctly detected boundaries are included, the recognition rate is 97.5%, and in the other case in which all detected boundaries are included, the recognition rate is 74.8%.

  • PDF

Landmark-Guided Segmental Speech Decoding for Continuous Mandarin Speech Recognition

  • Chao, Hao;Song, Cheng
    • Journal of Information Processing Systems
    • /
    • 제12권3호
    • /
    • pp.410-421
    • /
    • 2016
  • In this paper, we propose a framework that attempts to incorporate landmarks into a segment-based Mandarin speech recognition system. In this method, landmarks provide boundary information and phonetic class information, and the information is used to direct the decoding process. To prove the validity of this method, two kinds of landmarks that can be reliably detected are used to direct the decoding process of a segment model (SM) based Mandarin LVCSR (large vocabulary continuous speech recognition) system. The results of our experiment show that about 30% decoding time can be saved without an obvious decrease in recognition accuracy. Thus, the potential of our method is demonstrated.