• Title/Summary/Keyword: Robust voice recognition

A Study On Intelligent Robot Control Based On Voice Recognition For Smart FA (스마트 FA를 위한 음성인식 지능로봇제어에 관한 연구)

  • Sim, H.S.;Kim, M.S.;Choi, M.H.;Bae, H.Y.;Kim, H.J.;Kim, D.B.;Han, S.H.
    • Journal of the Korean Society of Industry Convergence / v.21 no.2 / pp.87-93 / 2018
  • This study proposes a new approach to implementing intelligent robot control based on voice recognition for smart factory automation. Since humans usually communicate with each other by voice, it is very convenient if voice is used to command humanoid robots or other types of robot systems. Much research has been performed on voice recognition systems for this purpose. The Hidden Markov Model is a robust statistical methodology for efficient voice recognition in noisy environments and has been tested in a wide range of applications. Prediction by Partial Matching, a prediction approach traditionally applied to text compression and coding, is a finite-context statistical modeling technique that predicts the next character based on the context; it has shown great potential for developing novel solutions to several language modeling problems in speech recognition. The reliability of voice recognition was illustrated by experiments on a humanoid robot with 26 joints, with a view to application in manufacturing processes.
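
The finite-context prediction idea mentioned in this abstract can be illustrated with a small character model. The sketch below is a simplification made for illustration, not the authors' system: full Prediction by Partial Matching blends context orders through escape probabilities, whereas this version only backs off to a shorter context when the longer one was never seen. The class name, `max_order`, and the training string are hypothetical.

```python
from collections import Counter, defaultdict


class BackoffContextModel:
    """Finite-context character model with PPM-style back-off (simplified).

    Full PPM mixes orders using escape probabilities; this sketch only
    falls back to a shorter context when the longer one was never seen.
    """

    def __init__(self, max_order=3):
        self.max_order = max_order
        # counts[k][context] -> Counter of characters seen after that context
        self.counts = [defaultdict(Counter) for _ in range(max_order + 1)]

    def train(self, text):
        for i, ch in enumerate(text):
            for k in range(self.max_order + 1):
                if i >= k:
                    self.counts[k][text[i - k:i]][ch] += 1

    def predict(self, history):
        # Try the longest available context first, then shorter ones.
        for k in range(min(self.max_order, len(history)), -1, -1):
            context = history[len(history) - k:]
            seen = self.counts[k].get(context)
            if seen:
                return seen.most_common(1)[0][0]
        return None  # the model has not been trained


model = BackoffContextModel(max_order=3)
model.train("abracadabra")
print(model.predict("ab"))  # character most often seen after "ab"
```

In a speech recognizer, a model of this kind would score candidate character or word sequences; the order-3 limit here is only a convenient default.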

Robust Speech Recognition Algorithm of Voice Activated Powered Wheelchair for Severely Disabled Person (중증 장애우용 음성구동 휠체어를 위한 강인한 음성인식 알고리즘)

  • Suk, Soo-Young;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.26 no.6 / pp.250-258 / 2007
  • Current speech recognition technology has achieved high performance with the development of hardware devices; however, it is insufficient for some applications where high reliability is required, such as voice control of powered wheelchairs for disabled persons. For a system that aims to operate powered wheelchairs safely by voice in real environments, non-voice sounds such as the user's coughing, breathing, and spark-like mechanical noise should be rejected, and the wheelchair system needs to recognize speech commands affected by disability, which have a specific pronunciation speed and frequency. In this paper, we propose a non-voice rejection method that performs voice/non-voice classification using both YIN-based fundamental frequency (F0) extraction and reliability in preprocessing. We adopted a multi-template dictionary and acoustic-modeling-based speaker adaptation to cope with the pronunciation variation of inarticulately uttered speech. In recognition tests conducted with data collected in a real environment, the proposed YIN-based fundamental frequency extraction showed a recall-precision rate of 95.1%, better than the 62% of the cepstrum-based method. A recognition test of the new system with the multi-template dictionary and MAP adaptation also showed a much higher accuracy of 99.5%, compared with 78.6% for the baseline system.
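
As a rough illustration of YIN-based voice/non-voice classification, the sketch below computes the YIN difference function and its cumulative mean normalized form for one frame, and treats the frame as non-voice when no dip falls below a threshold. It omits YIN's parabolic interpolation step, and the function name, search range, and threshold of 0.15 are assumptions chosen for this example, not values from the paper.

```python
import math


def yin_f0(frame, sample_rate, f_min=60.0, f_max=400.0, threshold=0.15):
    """Return (f0_hz, aperiodicity) for one frame; f0_hz is None if unvoiced."""
    tau_min = int(sample_rate / f_max)
    tau_max = int(sample_rate / f_min)
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame too short for the requested f_min")

    # Step 1: difference function d(tau) = sum_j (x[j] - x[j + tau])^2
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))

    # Step 2: cumulative mean normalized difference d'(tau)
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Step 3: first dip below the threshold, followed down to its local minimum
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau, cmnd[tau]
    return None, min(cmnd[tau_min:])  # no periodic dip: treat as non-voice


sr = 8000
voiced = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
f0, aperiodicity = yin_f0(voiced, sr)
print(f0, aperiodicity)
```

The second return value can serve as a reliability score: a periodic frame gives a value near zero, while coughs or mechanical noise would typically leave it close to one.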

Multi-resolution DenseNet based acoustic models for reverberant speech recognition (잔향 환경 음성인식을 위한 다중 해상도 DenseNet 기반 음향 모델)

  • Park, Sunchan;Jeong, Yongwon;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.10 no.1 / pp.33-38 / 2018
  • Although deep neural network-based acoustic models have greatly improved the performance of automatic speech recognition (ASR), reverberation still degrades the performance of distant speech recognition in indoor environments. In this paper, we adopt the DenseNet, which has shown great performance results in image classification tasks, to improve the performance of reverberant speech recognition. The DenseNet enables the deep convolutional neural network (CNN) to be effectively trained by concatenating feature maps in each convolutional layer. In addition, we extend the concept of multi-resolution CNN to multi-resolution DenseNet for robust speech recognition in reverberant environments. We evaluate the performance of reverberant speech recognition on the single-channel ASR task in reverberant voice enhancement and recognition benchmark (REVERB) challenge 2014. According to the experimental results, the DenseNet-based acoustic models show better performance than do the conventional CNN-based ones, and the multi-resolution DenseNet provides additional performance improvement.
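
The feature-map concatenation that characterizes DenseNet can be shown without any deep learning library. In this minimal sketch each layer receives every map produced so far and appends `growth_rate` new ones. The random 1x1 mixing followed by ReLU is only a stand-in for a real batch-norm, ReLU, and convolution stack, and all names and sizes are hypothetical.

```python
import random


def dense_block(input_maps, num_layers, growth_rate, seed=0):
    """Apply DenseNet-style connectivity to a list of 1-D feature maps.

    Each layer sees the concatenation of the block input and every earlier
    layer's output, and appends `growth_rate` new maps to that list.
    """
    rng = random.Random(seed)
    maps = [list(m) for m in input_maps]
    length = len(maps[0])
    for _layer in range(num_layers):
        new_maps = []
        for _ in range(growth_rate):
            # Stand-in for BN-ReLU-Conv: a random 1x1 mixing of all maps so far.
            weights = [rng.uniform(-1.0, 1.0) for _ in maps]
            mixed = [sum(w * m[t] for w, m in zip(weights, maps)) for t in range(length)]
            new_maps.append([max(0.0, v) for v in mixed])  # ReLU
        maps.extend(new_maps)  # concatenate along the channel axis
    return maps


features = dense_block([[0.1, 0.5, -0.2, 0.3]] * 2, num_layers=3, growth_rate=4)
print(len(features))  # 2 input maps + 3 layers * 4 new maps
```

The channel count grows linearly with depth (input maps plus layers times growth rate), which is why dense blocks are usually paired with small growth rates.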

A Study on Speech Recognition in a Running Automobile (주행중인 자동차 환경에서의 음성인식 연구)

  • 양진우;김순협
    • The Journal of the Acoustical Society of Korea / v.19 no.5 / pp.3-8 / 2000
  • In this paper, we study the design and implementation of a robust speech recognition system for a noisy car environment. The reference pattern used in the system is DMS (Dynamic Multi-Section). We propose two separate acoustic models, one for speech in a car moving below 80 km/h and one for speech in a car moving above 80 km/h, which are selected automatically depending on the noise environment. PLP (Perceptual Linear Predictive) coefficients of order 13 are used as the feature vector, and OSDP (One-Stage Dynamic Programming) is used for decoding. The system also provides a phone-book editing function for voice dialing. It yields a recognition rate of 89.75% for male speakers in SI (speaker-independent) mode in a car running on a cemented expressway at over 80 km/h with a vocabulary of 33 words, and a recognition rate of 92.29% for male speakers in SI mode in a car running on a paved expressway at over 80 km/h.
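
Template matching by dynamic programming can be sketched as follows. Note the substitution: this is plain isolated-word dynamic time warping (DTW), used as a simpler stand-in for the One-Stage Dynamic Programming decoder named in the abstract, which additionally handles connected words. The one-dimensional toy features replace the order-13 PLP vectors, and the template labels are hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Length-normalized DTW distance between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def recognize(features, templates):
    """Return the label of the reference template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))


templates = {
    "call": [[0.0], [1.0], [2.0], [1.0]],
    "stop": [[2.0], [2.0], [0.0], [0.0]],
}
utterance = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]  # a slower "call"
print(recognize(utterance, templates))  # prints "call"
```

Selecting between the two speed-dependent acoustic models would amount to passing a different `templates` dictionary depending on the estimated noise condition.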

Visual Voice Activity Detection and Adaptive Threshold Estimation for Speech Recognition (음성인식기 성능 향상을 위한 영상기반 음성구간 검출 및 적응적 문턱값 추정)

  • Song, Taeyup;Lee, Kyungsun;Kim, Sung Soo;Lee, Jae-Won;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.34 no.4 / pp.321-327 / 2015
  • In this paper, we propose an algorithm for robust Visual Voice Activity Detection (VVAD) to enhance speech recognition. In conventional VVAD algorithms, the motion of the lip region is found by applying optical flow or chaos-inspired measures to detect visual speech frames. Optical-flow-based VVAD is difficult to adopt in driving scenarios because of its computational complexity. The chaos-theory-based VVAD method, while invariant to illumination changes, is sensitive to motion translations caused by the driver's head movements. The proposed Local Variance Histogram (LVH) is robust to pixel intensity changes caused by both illumination change and translation. In addition, for improved performance under environmental changes, we adopt a novel threshold estimation using total variance change. In the experimental results, the proposed VVAD algorithm achieves robustness in various driving situations.

Noise-Robust Speech Detection Using The Coefficient of Variation of Spectrum (스펙트럼의 변동계수를 이용한 잡음에 강인한 음성 구간 검출)

  • Kim Youngmin;Hahn Minsoo
    • MALSORI / no.48 / pp.107-116 / 2003
  • This paper deals with a new parameter for voice detection, which is used in many areas of speech engineering such as speech synthesis, speech recognition, and speech coding. The CV (Coefficient of Variation) of the speech spectrum is used together with other feature parameters for the detection of speech. The CV is calculated only in a specific range of the speech spectrum. Average magnitude and spectral magnitude are also employed to improve the performance of the detector. In the experiments, the proposed voice detector outperformed the conventional energy-based detector in terms of error measures.
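
The coefficient of variation of a spectrum is the standard deviation of the magnitudes divided by their mean, computed here over a restricted band. The abstract does not say which frequency range the authors used, so the 300 to 3400 Hz band below is an assumption, as are the function names and the naive DFT (adequate only for short illustrative frames).

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for short illustrative frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_cv(frame, sample_rate, band_hz=(300.0, 3400.0)):
    """Coefficient of variation (std / mean) of the magnitudes inside band_hz."""
    mags = magnitude_spectrum(frame)
    hz_per_bin = sample_rate / len(frame)
    band = [m for k, m in enumerate(mags) if band_hz[0] <= k * hz_per_bin <= band_hz[1]]
    mean = sum(band) / len(band)
    if mean == 0.0:
        return 0.0
    variance = sum((m - mean) ** 2 for m in band) / len(band)
    return math.sqrt(variance) / mean


sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(64)]  # peaky spectrum
impulse = [1.0] + [0.0] * 63                                       # flat spectrum
print(spectral_cv(tone, sr), spectral_cv(impulse, sr))
```

A frame with strong spectral peaks, as in voiced speech, gives a large CV, while a spectrally flat frame gives a CV near zero; a detector would compare the value against a tuned threshold.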

Robust Voice Activity Detection in Noisy Environment Using Entropy and Harmonics Detection (엔트로피와 하모닉 검출을 이용한 잡음환경에 강인한 음성검출)

  • Choi, Gab-Keun;Kim, Soon-Hyob
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.1 / pp.169-174 / 2010
  • This paper presents an end-point detection method for better speech recognition rates. The proposed method separates speech and non-speech regions using the entropy and the harmonic detection of speech. End-point detection using the entropy of the speech spectral energy performs well in high-SNR environments (SNR 15 dB). In low-SNR environments (SNR 0 dB), however, the threshold level between speech and noise varies, so precise end-point detection is difficult. Therefore, this paper introduces an end-point detection method that uses both speech spectral entropy and harmonics. Experiments show better performance than the conventional entropy methods.
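
Spectral entropy treats the normalized power spectrum of a frame as a probability distribution and measures how evenly the energy is spread. The sketch below covers only that entropy feature, not the harmonic detection the paper combines it with; the function name and the choice to skip the DC and Nyquist bins are assumptions.

```python
import cmath
import math


def spectral_entropy(frame):
    """Shannon entropy (in nats) of the frame's normalized power spectrum."""
    n = len(frame)
    power = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(1, n // 2)  # skip the DC and Nyquist bins
    ]
    total = sum(power)
    if total == 0.0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]  # energy in one bin
impulse = [1.0] + [0.0] * 63                                    # energy in every bin
print(spectral_entropy(tone), spectral_entropy(impulse))
```

Energy concentrated in a few bins gives low entropy and a flat spectrum gives the maximum, the logarithm of the number of bins. As the abstract notes, a fixed threshold on this value becomes unreliable at low SNR.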

A Land and Maritime Unified Tourism Information Guide System Based on Robust Speech Recognition in Ship Noise Environments (선박 잡음 환경에서의 강건한 음성 인식 기반 육해상 통합 관광 정보 안내 시스템)

  • Jeon, Kwang Myung;Lee, Jang Won;Park, Ji Hun;Lee, Seong Ro;Lee, Yeonwoo;Maeng, Se Young;Kim, Hong Kook
    • The Journal of Korean Institute of Communications and Information Sciences / v.38C no.2 / pp.189-195 / 2013
  • In this paper, a land and maritime unified tourism information guide system is proposed that employs robust speech recognition in ship noise environments. Most conventional front-ends for speech recognition have used a Wiener filter to compensate for stationary noise such as car or babble noise. However, such conventional front-ends are limited in reducing the non-stationary noise that occurs inside a ship during a voyage. To overcome this limitation, the proposed system incorporates nonlinear multi-band spectral subtraction to provide highly accurate tourism route recognition. Experiments show that, compared with a conventional system, the proposed system achieves a relative improvement of 5.54% in tourism route recognition rate under a noise condition of 10 dB signal-to-noise ratio (SNR).
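
Multi-band spectral subtraction removes a noise power estimate separately in each frequency band, subtracting more aggressively where the band SNR is low. The sketch below works on one frame of power spectra. The piecewise-linear over-subtraction factor and the spectral floor of 0.002 are assumptions borrowed from common formulations of the technique, not parameters reported in this paper, and the band edges are hypothetical.

```python
import math


def over_subtraction_factor(snr_db):
    """Band-wise factor: subtract more aggressively where the band SNR is low."""
    if snr_db < -5.0:
        return 4.75
    if snr_db <= 20.0:
        return 4.0 - 0.15 * snr_db
    return 1.0


def multiband_spectral_subtraction(noisy_power, noise_power, band_edges, floor=0.002):
    """Subtract a noise power estimate band by band, with a spectral floor."""
    cleaned = list(noisy_power)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        signal = sum(noisy_power[lo:hi])
        noise = sum(noise_power[lo:hi])
        # Assumed fallback: treat an empty band as high SNR (minimal subtraction).
        snr_db = 10.0 * math.log10(signal / noise) if signal > 0 and noise > 0 else 20.0
        alpha = over_subtraction_factor(snr_db)
        for k in range(lo, hi):
            estimate = noisy_power[k] - alpha * noise_power[k]
            # The floor keeps bins non-negative and limits musical noise.
            cleaned[k] = max(estimate, floor * noisy_power[k])
    return cleaned


noisy = [10.0, 10.0, 1.2, 1.1]   # two bins per band, toy values
noise = [1.0, 1.0, 1.0, 1.0]
cleaned = multiband_spectral_subtraction(noisy, noise, band_edges=[0, 2, 4])
print(cleaned)
```

In the toy input the first band has a 10 dB SNR and is only lightly reduced, while the second band is dominated by noise and is pushed down to the floor.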

Real-Time Implementation of Acoustic Echo Canceller Using TMS320C6711 DSK

  • Heo, Won-Chul;Bae, Keun-Sung
    • Speech Sciences / v.15 no.1 / pp.75-83 / 2008
  • The interior of an automobile is a very noisy environment, with both stationary cruising noise and reverberated music or speech from the audio system. For robust speech recognition in a car, the driver's voice command must be extracted by removing this background noise. Since the music and speech signals from the car audio system are available, the reverberated music and speech can be removed using an acoustic echo canceller. In this paper, we implement an acoustic echo canceller with a robust double-talk detection algorithm on the TMS320C6711 DSK. We first developed the echo canceller on a PC to verify the echo cancellation performance, then implemented it on the TMS320C6711 DSK. To process one speech sample at an 8 kHz sampling rate with 256 echo canceller filter taps, the implemented system took only 0.035 ms and achieved an ERLE of 20.73 dB.
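
An adaptive echo canceller and the ERLE figure can be sketched as below. The abstract does not state which adaptive algorithm was used; NLMS is assumed here as a common choice, and double-talk detection is left out. The echo path, step size, tap count, and noise level are hypothetical values chosen so the example runs quickly.

```python
import math
import random


def nlms_echo_canceller(far_end, mic, num_taps, step=0.5, eps=1e-6):
    """Adapt an FIR echo-path estimate with NLMS; return the residual signal."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end sample first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def erle_db(mic, residual):
    """Echo return loss enhancement: microphone power over residual power, in dB."""
    return 10.0 * math.log10(sum(d * d for d in mic) / sum(e * e for e in residual))


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]   # audio-system signal
noise = [rng.uniform(-0.01, 0.01) for _ in far_end]       # small cabin noise
echo_path = [0.6, -0.3, 0.1, 0.05]                        # hypothetical cabin response
mic = [noise[n] + sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n >= k)
       for n in range(len(far_end))]
residual = nlms_echo_canceller(far_end, mic, num_taps=8)
print(erle_db(mic[1000:], residual[1000:]))  # ERLE after convergence

# Real-time budget at 8 kHz: one sample period is 1/8000 s = 0.125 ms,
# so the reported 0.035 ms per sample uses 28% of it.
budget_fraction = 0.035 / (1000.0 / 8000.0)
print(budget_fraction)
```

The budget calculation shows why the reported timing matters: the processing for each sample must finish within the 0.125 ms sample period, and the reported figure leaves most of that period free.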

Accurate Speech Detection based on Sub-band Selection for Robust Keyword Recognition (강인한 핵심어 인식을 위해 유용한 주파수 대역을 이용한 음성 검출기)

  • Ji Mikyong;Kim Hoirin
    • Proceedings of the KSPS conference / 2002.11a / pp.183-186 / 2002
  • Speech detection is one of the important problems in real-time speech recognition, and accurate detection of speech boundaries is crucial to the performance of a speech recognizer. In this paper, we propose a speech detector based on Mel-band selection through training. To demonstrate the effectiveness of the proposed algorithm, we compare it with a conventional one, the so-called EPD-VAA (EndPoint Detector based on Voice Activity Detection). The proposed speech detector is trained to extract keyword speech better than other speech. EPD-VAA usually works well at high SNR but no longer works well at low SNR. The proposed algorithm, in contrast, pre-selects useful bands through keyword training and decides the speech boundary according to the energy level of the previously selected sub-bands. The experimental results show that the proposed algorithm outperforms EPD-VAA.
