• Title/Summary/Keyword: Speech Recognition Technology


The Robot Speech Recognition using TMS320VC5510 DSK (TMS320VC5510 DSK를 이용한 음성인식 로봇)

  • Choi, Ji-Hyun;Chung, Ik-Joo
    • Journal of Industrial Technology / v.27 no.A / pp.211-218 / 2007
  • As demand for human-robot interaction increases, robots are expected to be equipped with the intelligence that humans have. For natural communication in particular, hearing is so essential that speech recognition technology for robots is becoming more important. In this paper, we implement a speech recognizer suitable for robot applications. One of the major problems in robot speech recognition is the poor quality of speech captured when the speaker is far from the microphone mounted on the robot. To cope with this problem, we wirelessly transmit the commands recognized by a speech recognizer implemented on the TMS320VC5510 DSK. In addition, since the TMS320VC5510 DSP is a fixed-point device, we present an efficient realization of the HMM algorithm using fixed-point arithmetic.

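The abstract above does not describe how the fixed-point HMM is realized. As a rough illustration only, one common approach is to store log-probabilities as scaled integers so that Viterbi decoding needs nothing but integer additions and comparisons. The sketch below assumes a discrete-observation HMM and a Q10 format; the names `to_fixed`, `viterbi_fixed`, and the constants are hypothetical and are not taken from the paper.

```python
import math

Q = 10                  # assumed number of fractional bits (not from the paper)
SCALE = 1 << Q
LOG_ZERO = -(1 << 30)   # integer stand-in for log(0); a real DSP would saturate


def to_fixed(p):
    """Convert a probability to an integer log-probability in Q10 format."""
    return LOG_ZERO if p <= 0.0 else int(round(math.log(p) * SCALE))


def viterbi_fixed(log_init, log_trans, log_emit, obs):
    """Viterbi decoding on integer log-probabilities (adds and compares only)."""
    n = len(log_init)
    score = [log_init[s] + log_emit[s][obs[0]] for s in range(n)]
    back = []
    for o in obs[1:]:
        new_score, ptr = [], []
        for j in range(n):
            # Best predecessor of state j at this time step.
            best_i = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + log_emit[j][o])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_score
```

Dividing the returned score by `SCALE` gives an approximate natural-log path probability. A continuous-density HMM, as typically used for speech, would additionally need a fixed-point evaluation of the Gaussian log-likelihoods, which this sketch leaves out.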

Comparison of Speech Intelligibility & Performance of Speech Recognition in Real Driving Environments (자동차 주행 환경에서의 음성 전달 명료도와 음성 인식 성능 비교)

  • Lee Kwang-Hyun;Choi Dae-Lim;Kim Young-Il;Kim Bong-Wan;Lee Yong-Ju
    • MALSORI / no.50 / pp.99-110 / 2004
  • In a moving car, the normal transmission characteristics of sound are hard to obtain because of various noises and structural factors. The resulting channel distortion of the source sound recorded by the microphones seriously degrades speech recognition performance in real driving environments. In this paper, we analyze intelligibility under various channel distortion conditions as a function of driving speed, using the speech transmission index (STI), and compare the STI with speech recognition rates. We examine the correlation between intelligibility measures, which depend on the sound pick-up pattern, and speech recognition performance, and from this we consider the optimal microphone location in a single-channel environment. In our experiments, we find a high correlation between STI and speech recognition rates.

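The abstract above does not say which correlation measure was used. Assuming the usual Pearson coefficient, the comparison of STI values with recognition rates could be sketched as follows; `pearson_r` is a hypothetical helper, and no measured data from the paper is reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold one STI value per microphone position and driving speed, and `ys` the recognition rate measured under the same condition.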

A Study on Design and Implementation of Embedded System for Speech Recognition Process

  • Kim, Jung-Hoon;Kang, Sung-In;Ryu, Hong-Suk;Lee, Sang-Bae
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.2 / pp.201-206 / 2004
  • This study develops a speech recognition module for a wheelchair for people with physical disabilities. In the proposed module, a TMS320C32 is used as the main processor, and 12th-order mel-cepstrum features are applied in the pre-processing step to increase the recognition rate in noisy environments. DTW (Dynamic Time Warping) is used for the speaker-dependent recognition part and proved to perform excellently. To use this algorithm more effectively, the reference data was compressed to 1/12 of its size using vector quantization in order to reduce memory use. The paper also covers the supporting techniques (end-point detection, DMA processing, etc.) needed to run the speech recognition system in real time.
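For readers unfamiliar with DTW, the following is a minimal sketch of template matching with the textbook dynamic-programming recurrence. It assumes each utterance is a list of feature vectors (for example, one mel-cepstrum vector per frame) and uses Euclidean frame distance; the function names are hypothetical, and the paper's vector quantization, end-point detection, and DSP-specific optimizations are not shown.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = math.inf
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stay on frame b[j-1]
                                 cost[i][j - 1],      # stay on frame a[i-1]
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

In a speaker-dependent setup, `templates` would map each command word to a reference utterance recorded by the user.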

Multi-stage Speech Recognition Using Confidence Vector (신뢰도 벡터 기반의 다단계 음성인식)

  • Jeon, Hyung-Bae;Hwang, Kyu-Woong;Chung, Hoon;Kim, Seung-Hi;Park, Jun;Lee, Yun-Keun
    • MALSORI / no.63 / pp.113-124 / 2007
  • In this paper, we propose using a confidence vector as an intermediate input feature in a multi-stage speech recognition architecture to improve recognition accuracy. Multi-stage speech recognition was introduced as a way to reduce the computational complexity of decoding and thus achieve faster recognition. Conventional multi-stage speech recognition usually consists of three stages: acoustic search, lexical search, and acoustic re-scoring. We focus on improving the accuracy of lexical decoding by using a confidence vector as the input feature instead of the phoneme that is typically used. Experiments on a 220K Korean Point-of-Interest (POI) domain show that the proposed method improves accuracy.


A Study on Noise-Robust Methods for Broadcast News Speech Recognition (방송뉴스 인식에서의 잡음 처리 기법에 대한 고찰)

  • Chung Yong-joo
    • MALSORI / no.50 / pp.71-83 / 2004
  • Recently, broadcast news speech recognition has become one of the most attractive research areas. If broadcast news can be transcribed automatically and its content stored as text instead of as the video or audio signal itself, it becomes much easier to search multimedia databases for what we need. However, the desired speech signal in broadcast news is usually affected by interfering signals such as background noise and music. In addition, the speech of a reporter who is speaking over the telephone or through a poor-quality microphone is severely distorted by channel effects. Such interfered or distorted speech may be the main reason for poor performance in broadcast news speech recognition. In this paper, we investigate several methods for coping with these problems and observe some performance improvement in noisy broadcast news speech recognition.


Performance Analysis of Noisy Speech Recognition Depending on Parameters for Noise and Signal Power Estimation in MMSE-STSA Based Speech Enhancement (MMSE-STSA 기반의 음성개선 기법에서 잡음 및 신호 전력 추정에 사용되는 파라미터 값의 변화에 따른 잡음음성의 인식성능 분석)

  • Park Chul-Ho;Bae Keun-Sung
    • MALSORI / no.57 / pp.153-164 / 2006
  • The MMSE-STSA based speech enhancement algorithm is widely used as a preprocessing step for noise-robust speech recognition. It weights the gain of each spectral bin of the noisy speech using estimates of the noise and signal power spectra. In this paper, we investigate how the parameters used to estimate the speech signal and noise power in MMSE-STSA affect the recognition performance on noisy speech. For the experiments, we use the Aurora2 DB, which contains noisy speech with subway, babble, car, and exhibition noises. An HTK-based continuous HMM system is constructed for the recognition experiments. Experimental results are presented and discussed along with our findings.

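The abstract above does not list the parameters that were varied. As one illustration of the kind of parameter involved, MMSE-STSA implementations commonly estimate the a priori SNR of each spectral bin with the decision-directed rule, whose smoothing factor `alpha` trades noise suppression against speech distortion. The sketch below is an assumption about a typical setup, not the paper's configuration; the function names, the recursive noise update, and the default values are hypothetical.

```python
def decision_directed_xi(noisy_power, noise_power, prev_clean_power,
                         alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR estimate for one spectral bin.

    noisy_power:      |Y|^2 of the current frame
    noise_power:      current noise power estimate (must be > 0)
    prev_clean_power: |A_hat|^2, enhanced power from the previous frame
    alpha:            smoothing factor (assumed default, commonly near 0.98)
    """
    gamma = noisy_power / noise_power  # a posteriori SNR
    xi = (alpha * prev_clean_power / noise_power
          + (1.0 - alpha) * max(gamma - 1.0, 0.0))
    return max(xi, xi_min)  # floor keeps the gain from collapsing to zero


def update_noise_power(prev_noise_power, noisy_power, beta=0.9):
    """Recursive noise power update, intended for frames judged to be noise only."""
    return beta * prev_noise_power + (1.0 - beta) * noisy_power
```

Sweeping `alpha` and `beta` and re-running recognition on the enhanced speech is one way such a parameter study could be organized.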

Comparison of HMM models and various cepstral coefficients for Korean whispered speech recognition (은닉 마코프 모델과 켑스트럴 계수들에 따른 한국어 속삭임의 인식 비교)

  • Park, Chan-Eung
    • 전자공학회논문지 IE / v.43 no.2 / pp.22-29 / 2006
  • Recently, the use of whispered speech has increased because of mobile phones, and the need for whispered speech recognition is growing. To find a suitable recognition system for whispered speech, various feature vectors commonly used for speech recognition are applied to three kinds of HMMs: normal speech models, whispered speech models, and integrated models of normal and whispered speech. The recognition experiments show that the recognition rate of whispered speech on normal speech models is too low for practical use, whereas separate whispered speech models recognize whispered speech with the highest rates, at least 85%. Integrated models also achieve an acceptable recognition rate, but more study is needed to increase it. MFCC and PLCC feature vectors score higher recognition rates when applied to separate whispered speech models, while PLCC is the best when applied to the integrated models.

Semantic-Oriented Error Correction for Voice-Activated Information Retrieval System

  • Yoon, Yong-Wook;Kim, Byeong-Chang;Lee, Gary-Geunbae
    • MALSORI / no.44 / pp.115-130 / 2002
  • Voice input is required in many new application environments, but low speech recognition accuracy makes it difficult to extend its use. Previous approaches tried to raise accuracy by post-processing the recognition results, and all of them were lexical-oriented. We suggest a new semantic-oriented approach to speech recognition error correction. Through experiments with a speech-driven in-vehicle telematics information application, we show the excellent performance of our approach and some advantages it has, as a semantic-oriented approach, over a purely lexical-oriented one.


A Basic Performance Evaluation of the Speech Recognition APP of Standard Language and Dialect using Google, Naver, and Daum KAKAO APIs (구글, 네이버, 다음 카카오 API 활용앱의 표준어 및 방언 음성인식 기초 성능평가)

  • Roh, Hee-Kyung;Lee, Kang-Hee
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.12 / pp.819-829 / 2017
  • In this paper, we first describe the current state of speech recognition technology and its basic techniques and algorithms, and then explain the API code flow needed for speech recognition. Using the speech recognition application programming interfaces (APIs) of Google, Naver, and Daum Kakao, which operate the best-known search engines among speech recognition API providers, we build a voice recognition app in Android Studio. We then run speech recognition experiments on standard language and dialect speech by gender, age, and region, and tabulate the recognition rates. Experiments were conducted for Gyeongsang-do, Chungcheong-do, and Jeolla-do, where the dialects are strong, and comparative experiments were also conducted on standard language. The accuracy of each resulting sentence is checked in terms of word spacing, final consonants, postpositions, and words, and the number of errors of each type is counted. From these results, we aim to identify the advantages of each API according to its speech recognition rate and to establish a basic framework for using them most efficiently.

On the Evaluation of Speech Recognition Systems (음성 인식 기술 평가 동향)

  • Yu, Ha-Jin;Kim, Dong-hyun;Yook, Dong-suk
    • Proceedings of the KSPS conference / 2005.11a / pp.201-206 / 2005
  • We present a survey of evaluation methods for speech recognition technology and propose a procedure for evaluating Korean speech recognition systems. Various evaluation events are currently conducted by NIST and ELDA every year. In this paper, we introduce these activities. In designing the proposed procedure, we consider the characteristics of the Korean language as well as trends in the Korean speech technology industry.

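The abstract above does not name the metric used in its proposed procedure. The standard figure in recognition evaluations is an error rate based on edit distance (substitutions, deletions, and insertions divided by the reference length). The sketch below computes it over arbitrary token sequences, so the choice of unit (words, syllables, or another unit suited to Korean) is left to the caller; `token_error_rate` is a hypothetical name.

```python
def token_error_rate(reference, hypothesis):
    """Edit-distance error rate between two token sequences.

    Returns (substitutions + deletions + insertions) / len(reference).
    """
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] = edit distance between the reference prefix so far and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        cur = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cur.append(min(prev[j] + 1,                          # deletion
                           cur[j - 1] + 1,                       # insertion
                           prev[j - 1] + (ref_tok != hyp_tok)))  # match or substitution
        prev = cur
    return prev[-1] / len(reference)
```

For a word-level score, pass `ref.split()` and `hyp.split()`; for a character-level score, pass `list(ref)` and `list(hyp)`.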