• Title/Summary/Keyword: Voice Signal process

Search Result 50, Processing Time 0.03 seconds

A Study on Realization of Speech Recognition System based on VoiceXML for Railroad Reservation Service (철도예약서비스를 위한 VoiceXML 기반의 음성인식 구현에 관한 연구)

  • Kim, Beom-Seung;Kim, Soon-Hyob
    • Journal of the Korean Society for Railway
    • /
    • v.14 no.2
    • /
    • pp.130-136
    • /
    • 2011
  • This paper suggests realization method for real-time speech recognition using VoiceXML in telephony environment based on SIP for Railroad Reservation Service. In this method, voice signal incoming through PSTN or Internet is treated as dialog using VoiceXML and the transferred voice signal is processed by Speech Recognition System, and the output is returned to dialog of VoiceXML which is transferred to users. VASR system is constituted of dialog server which processes dialog, APP server for processing voice signal, and Speech Recognition System to process speech recognition. This realizes transfer method to Speech Recognition System in which voice signal is recorded using Record Tag function of VoiceXML to process voice signal in telephony environment and it is played in real time.

Extraction of voice signal embedded in 1/f noise using wavelet

  • Toyama, Naoki;Sasaya, Takashi;Akizuki, Kageo
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 1997.10a
    • /
    • pp.564-567
    • /
    • 1997
  • This paper deals with the problem of extraction of voice signal embedded in 1/f noise. We propose the extraction method using wavelet. This method is based on Wornell's modelling which can construct 1/f process in terms of uncorrelated variables and is well suited on treating 1/f process. Finally, we show further describe our method through simulation.

  • PDF

Voice conversion using low dimensional vector mapping (낮은 차원의 벡터 변환을 통한 음성 변환)

  • Lee, Kee-Seung;Doh, Won;Youn, Dae-Hee
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.35S no.4
    • /
    • pp.118-127
    • /
    • 1998
  • In this paper, we propose a voice personality transformation method which makes one person's voice sound like another person's voice. In order to transform the voice personality, vocal tract transfer function is used as a transformation parameter. Comparing with previous methods, the proposed method can obtain high-quality transformed speech with low computational complexity. Conversion between the vocal tract transfer functions is implemented by a linear mapping based on soft clustering. In this process, mean LPC cepstrum coefficients and mean removed LPC cepstrum modeled by the low dimensional vector are used as transformation parameters. To evaluate the performance of the proposed method, mapping rules are generated from 61 Korean words uttered by two male and one female speakers. These rules are then applied to 9 sentences uttered by the same persons, and objective evaluation and subjective listening tests for the transformed speech are performed.

  • PDF

Voice Activity Detection Algorithm using Fuzzy Membership Shifted C-means Clustering in Low SNR Environment (낮은 신호 대 잡음비 환경에서의 퍼지 소속도 천이 C-means 클러스터링을 이용한 음성구간 검출 알고리즘)

  • Lee, G.H.;Lee, Y.J.;Cho, J.H.;Kim, M.N.
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.3
    • /
    • pp.312-323
    • /
    • 2014
  • Voice activity detection is very important process that find voice activity from noisy speech signal for noise cancelling and speech enhancement. Over the past few years, many studies have been made on voice activity detection, it has poor performance for speech signal of sentence form in a low SNR environment. In this paper, it proposed new voice activity detection algorithm that has beginning VAD process using entropy and main VAD process using fuzzy membership shifted c-means clustering. We conduct an experiment in various SNR environment of white noise to evaluate performance of the proposed algorithm and confirmed good performance of the proposed algorithm.

A Study on Stable Motion Control of Humanoid Robot with 24 Joints Based on Voice Command

  • Lee, Woo-Song;Kim, Min-Seong;Bae, Ho-Young;Jung, Yang-Keun;Jung, Young-Hwa;Shin, Gi-Soo;Park, In-Man;Han, Sung-Hyun
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.21 no.1
    • /
    • pp.17-27
    • /
    • 2018
  • We propose a new approach to control a biped robot motion based on iterative learning of voice command for the implementation of smart factory. The real-time processing of speech signal is very important for high-speed and precise automatic voice recognition technology. Recently, voice recognition is being used for intelligent robot control, artificial life, wireless communication and IoT application. In order to extract valuable information from the speech signal, make decisions on the process, and obtain results, the data needs to be manipulated and analyzed. Basic method used for extracting the features of the voice signal is to find the Mel frequency cepstral coefficients. Mel-frequency cepstral coefficients are the coefficients that collectively represent the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. The reliability of voice command to control of the biped robot's motion is illustrated by computer simulation and experiment for biped walking robot with 24 joint.

A Fast Motion Estimation Algorithm using Adaptive Search According to Importance of Search Ranges (탐색영역의 중요도에 따라 적응적인 탐색을 이용한 고속 움직임 예측 알고리즘)

  • Kim, Tae Hwan;Kim, Jong Nam;Jeong, Shin Il
    • Journal of Korea Multimedia Society
    • /
    • v.18 no.4
    • /
    • pp.437-442
    • /
    • 2015
  • Voice activity detection is very important process that voice activity separated form noisy speech signal for speech enhance. Over the past few years, many studies have been made on voice activity detection, but it has poor performance in low signal to noise ratio environment or fickle noise such as car noise. In this paper, it proposed new voice activity detection algorithm using ensemble variance based on wavelet band entropy and soft thresholding method. We conduct a survey in a lot of signal to noise ratio environment of car noise to evaluate performance of the proposed algorithm and confirmed performance of the proposed algorithm.

Voice Activity Detection Algorithm using Wavelet Band Entropy Ensemble Analysis in Car Noisy Environments (자동차 잡음 환경에서 웨이브렛 밴드 엔트로피 앙상블 분석을 이용한 음성구간 검출 알고리즘)

  • Lee, G.H.;Lee, Y.J.;Kim, M.N.
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.9
    • /
    • pp.1005-1017
    • /
    • 2013
  • Voice activity detection is very important process that voice activity separated form noisy speech signal for speech enhance. Over the past few years, many studies have been made on voice activity detection, but it has poor performance in low signal to noise ratio environment or fickle noise such as car noise. In this paper, it proposed new voice activity detection algorithm using ensemble variance based on wavelet band entropy and soft thresholding method. We conduct a survey in a lot of signal to noise ratio environment of car noise to evaluate performance of the proposed algorithm and confirmed performance of the proposed algorithm.

Signal Enhancement of a Variable Rate Vocoder with a Hybrid domain SNR Estimator

  • Park, Hyung Woo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.2
    • /
    • pp.962-977
    • /
    • 2019
  • The human voice is a convenient method of information transfer between different objects such as between men, men and machine, between machines. The development of information and communication technology, the voice has been able to transfer farther than before. The way to communicate, it is to convert the voice to another form, transmit it, and then reconvert it back to sound. In such a communication process, a vocoder is a method of converting and re-converting a voice and sound. The CELP (Code-Excited Linear Prediction) type vocoder, one of the voice codecs, is adapted as a standard codec since it provides high quality sound even though its transmission speed is relatively low. The EVRC (Enhanced Variable Rate CODEC) and QCELP (Qualcomm Code-Excited Linear Prediction), variable bit rate vocoders, are used for mobile phones in 3G environment. For the real-time implementation of a vocoder, the reduction of sound quality is a typical problem. To improve the sound quality, that is important to know the size and shape of noise. In the existing sound quality improvement method, the voice activated is detected or used, or statistical methods are used by the large mount of data. However, there is a disadvantage in that no noise can be detected, when there is a continuous signal or when a change in noise is large.This paper focused on finding a better way to decrease the reduction of sound quality in lower bit transmission environments. Based on simulation results, this study proposed a preprocessor application that estimates the SNR (Signal to Noise Ratio) using the spectral SNR estimation method. The SNR estimation method adopted the IMBE (Improved Multi-Band Excitation) instead of using the SNR, which is a continuous speech signal. Finally, this application improves the quality of the vocoder by enhancing sound quality adaptively.

Voice Recognition Performance Improvement using a convergence of Voice Energy Distribution Process and Parameter (음성 에너지 분포 처리와 에너지 파라미터를 융합한 음성 인식 성능 향상)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence
    • /
    • v.13 no.10
    • /
    • pp.313-318
    • /
    • 2015
  • A traditional speech enhancement methods distort the sound spectrum generated according to estimation of the remaining noise, or invalid noise is a problem of lowering the speech recognition performance. In this paper, we propose a speech detection method that convergence the sound energy distribution process and sound energy parameters. The proposed method was used to receive properties reduce the influence of noise to maximize voice energy. In addition, the smaller value from the feature parameters of the speech signal The log energy features of the interval having a more of the log energy value relative to the region having a large energy similar to the log energy feature of the size of the voice signal containing the noise which reducing the mismatch of the training and the recognition environment recognition experiments Results confirmed that the improved recognition performance are checked compared to the conventional method. Car noise environment of Pause Hit Rate is in the 0dB and 5dB lower SNR region showed an accuracy of 97.1% and 97.3% in the high SNR region 10dB and 15dB 98.3%, showed an accuracy of 98.6%.

A Study on Intelligent Control of Mobile Robot for Human-Robot Cooperative Operation in Manufacturing Process (인간-로봇 상호협력작업을 위한 모바일로봇의 지능제어에 관한 연구)

  • Kim, DuBeum;Bae, HoYoung;Kim, SangHyun;Im, ODeuk;Back, Young-Tae;Han, SungHyun
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.22 no.2
    • /
    • pp.137-146
    • /
    • 2019
  • This study proposed a new technique to control of mobile robot based on voice command for (Human-Robot Cooperative operation in manufacturing precess). High performance voice recognition and control system was designed In this paper for smart factory. robust voice recognition is essential for a robot to communicate with people. One of the main problems with voice recognition robots is that robots inevitably effects real environment including with noises. The noise is captured with strong power by the microphones, because the noise sources are closed to the microphones. The signal-to-noise ratio of input voice becomes quite low. However, it is possible to estimate the noise by using information on the robot's own motions and postures, because a type of motion/gesture produces almost the same pattern of noise every time it is performed. In this paper, we describe an robust voice recognition system which can robustly recognize voice by adults and students in noisy environments. It is illustrated by experiments the voice recognition performance of mobile robot placed in a real noisy environment.