• Title/Summary/Keyword: speech signal


Automatic speech recognition using acoustic doppler signal (초음파 도플러를 이용한 음성 인식)

  • Lee, Ki-Seung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.35 no.1
    • /
    • pp.74-82
    • /
    • 2016
  • In this paper, a new automatic speech recognition (ASR) method is proposed in which ultrasonic Doppler signals are used instead of conventional speech signals. The proposed method has advantages over conventional speech-based ASR, including robustness against acoustic noise and the user comfort associated with a non-contact sensor. In the proposed method, a 40 kHz ultrasonic signal is radiated toward the mouth and the reflected ultrasonic signals are received; the frequency shift caused by the Doppler effect is used to implement ASR. Unlike the previous method, which employed a single-channel ultrasonic signal, the proposed method uses multi-channel ultrasonic signals acquired from various locations. PCA (Principal Component Analysis) coefficients are used as the ASR features, and a left-to-right hidden Markov model (HMM) is adopted. To verify the feasibility of the proposed ASR, speech recognition experiments were carried out on 60 Korean isolated words obtained from six speakers. The results showed that the overall word recognition rates were comparable with conventional speech-based ASR methods, and that the proposed method outperformed the conventional single-channel ASR method. In particular, an average recognition rate of 90 % was maintained under noisy environments.
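The core physical relation behind this entry is the round-trip Doppler shift of the 40 kHz carrier reflected off the moving articulators. A minimal sketch of that relation (the parameter values here are illustrative assumptions, not taken from the paper):

```python
def doppler_shift(velocity_mps, carrier_hz=40_000.0, sound_speed_mps=343.0):
    """Frequency shift (Hz) of a carrier reflected off a target moving
    toward the transducer at velocity_mps (round-trip Doppler)."""
    return 2.0 * velocity_mps * carrier_hz / sound_speed_mps

# Lips moving at 0.1 m/s shift the 40 kHz carrier by roughly 23 Hz.
shift = doppler_shift(0.1)
print(round(shift, 1))
```

Because typical articulator velocities produce shifts of only tens of hertz, the received signal can be demodulated to baseband before feature extraction.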

Real-time implementation and performance evaluation of speech classifiers in speech analysis-synthesis

  • Kumar, Sandeep
    • ETRI Journal
    • /
    • v.43 no.1
    • /
    • pp.82-94
    • /
    • 2021
  • In this work, six voiced/unvoiced speech classifiers based on the autocorrelation function (ACF), average magnitude difference function (AMDF), cepstrum, weighted ACF (WACF), zero crossing rate and energy of the signal (ZCR-E), and neural networks (NNs) have been simulated and implemented in real time using the TMS320C6713 DSP starter kit. These speech classifiers have been integrated into a linear-predictive-coding-based speech analysis-synthesis system and their performance has been compared in terms of the percentage of the voiced/unvoiced classification accuracy, speech quality, and computation time. The results of the percentage of the voiced/unvoiced classification accuracy and speech quality show that the NN-based speech classifier performs better than the ACF-, AMDF-, cepstrum-, WACF- and ZCR-E-based speech classifiers for both clean and noisy environments. The computation time results show that the AMDF-based speech classifier is computationally simple, and thus its computation time is less than that of other speech classifiers, while that of the NN-based speech classifier is greater compared with other classifiers.
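Of the six classifiers compared in this entry, the ZCR-E variant is the simplest to illustrate: voiced frames tend to have high energy and a low zero-crossing rate, unvoiced frames the opposite. A minimal sketch (the thresholds are assumptions for illustration, not the paper's values):

```python
import numpy as np

def zcr_energy_classify(frame, zcr_thresh=0.25, energy_thresh=0.01):
    """Classify a frame as 'voiced' or 'unvoiced' from its zero-crossing
    rate and short-time energy."""
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    energy = np.mean(frame ** 2)
    return "voiced" if energy > energy_thresh and zcr < zcr_thresh else "unvoiced"

fs = 8000
t = np.arange(320) / fs
voiced_like = 0.5 * np.sin(2 * np.pi * 120 * t)    # low-frequency, high energy
rng = np.random.default_rng(0)
unvoiced_like = 0.05 * rng.standard_normal(320)    # noise-like, low energy
print(zcr_energy_classify(voiced_like), zcr_energy_classify(unvoiced_like))
```

This also suggests why AMDF and ZCR-E are the computationally cheap options in the comparison: both avoid transforms and per-frame training entirely.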

A Study on Variation and Determination of Gaussian function Using SNR Criteria Function for Robust Speech Recognition (잡음에 강한 음성 인식에서 SNR 기준 함수를 사용한 가우시안 함수 변형 및 결정에 관한 연구)

  • 전선도;강철호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.7
    • /
    • pp.112-117
    • /
    • 1999
  • Spectral subtraction for noise-robust speech recognition often causes loss of the speech signal. In this study, we propose a method in which the Gaussian functions of a semi-continuous HMM (Hidden Markov Model) are varied and determined on the basis of an SNR criteria function, where SNR denotes the per-frame signal-to-noise ratio between the estimated noise and the subtracted signal. To demonstrate the effectiveness of this method, we show through signal waveforms that the estimation error is related to the magnitude of the estimated noise; for this reason, the Gaussian function is varied and determined by the SNR. In computer-simulated recognition tests under the noise of a car driving at over 80 km/h, the proposed SNR-based Gaussian decision method achieved a higher recognition rate than both the spectrally subtracted and non-subtracted cases.
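The per-frame quantity this entry builds its decision on is the SNR between the subtracted signal and the estimated noise. A minimal sketch of magnitude spectral subtraction with that per-frame SNR (the spectral floor and all signal values are illustrative assumptions):

```python
import numpy as np

def spectral_subtract(frame, noise_mag, floor=0.01):
    """Magnitude spectral subtraction for one frame; returns the enhanced
    frame and the post-subtraction SNR (dB) used as the decision criterion."""
    spec = np.fft.rfft(frame)
    mag = np.abs(spec)
    sub_mag = np.maximum(mag - noise_mag, floor * mag)   # spectral floor
    snr_db = 10 * np.log10(np.sum(sub_mag ** 2) / (np.sum(noise_mag ** 2) + 1e-12))
    enhanced = np.fft.irfft(sub_mag * np.exp(1j * np.angle(spec)), n=len(frame))
    return enhanced, snr_db

rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 200 * np.arange(256) / 8000)
noisy = clean + 0.1 * rng.standard_normal(256)
noise_mag = np.abs(np.fft.rfft(0.1 * rng.standard_normal(256)))  # noise-only estimate
enhanced, snr_db = spectral_subtract(noisy, noise_mag)
```

Frames where this SNR is low are exactly the ones where the subtraction is unreliable, which is the motivation for letting the SNR steer the Gaussian decision.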

Implementation of G.726 ADPCM Dual Rate Speech Codec of 16Kbps and 40Kbps (16Kbps와 40Kbps의 Dual Rate G.726 ADPCM 음성 codec구현)

  • Kim Jae-Oh;Han Kyong-Ho
    • Journal of IKEEE
    • /
    • v.2 no.2 s.3
    • /
    • pp.233-238
    • /
    • 1998
  • This paper addresses the implementation of a dual-rate ADPCM codec using the G.726 16 kbps and 40 kbps speech coding algorithms. For small signals, the low-rate 16 kbps algorithm shows almost the same SNR as the high-rate 40 kbps algorithm, while for large signals the 40 kbps algorithm shows a higher SNR than the 16 kbps algorithm. To obtain a good trade-off between data rate and synthesized speech quality, we applied the low 16 kbps rate to small signals and the high 40 kbps rate to large signals. Various threshold values for the rate decision were tested to balance data rate against speech quality. The simulation results show good speech quality at a low average rate compared with fixed 16 kbps and 40 kbps coding.
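The rate decision this entry describes amounts to a per-frame threshold on signal magnitude. A minimal sketch of that selection logic (the threshold value and peak-amplitude criterion are assumptions; the paper tests several thresholds and does not specify the criterion here):

```python
import numpy as np

def select_rate(frame, threshold=0.1):
    """Pick the G.726 coding rate for a frame: small signals get 16 kbps
    (2 bits/sample at 8 kHz), large signals get 40 kbps (5 bits/sample)."""
    peak = np.max(np.abs(frame))
    return 16_000 if peak < threshold else 40_000

quiet = 0.01 * np.ones(80)   # one 10 ms frame at 8 kHz
loud = 0.5 * np.ones(80)
print(select_rate(quiet), select_rate(loud))
```

Raising the threshold pushes more frames to 16 kbps, trading SNR on loud segments for a lower average bit rate.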

Automatic Phonetic Segmentation of Korean Speech Signal Using Phonetic-acoustic Transition Information (음소 음향학적 변화 정보를 이용한 한국어 음성신호의 자동 음소 분할)

  • 박창목;왕지남
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.8
    • /
    • pp.24-30
    • /
    • 2001
  • This article is concerned with automatic segmentation of Korean speech signals. All transition cases between phonetic units are classified into three types, and a different strategy is applied to each. Type 1 is the discrimination of silence, voiced speech, and unvoiced speech; histogram analysis of indicators consisting of wavelet coefficients and the SVF (Spectral Variation Function) of the wavelet coefficients is used for this segmentation. Type 2 is the discrimination of adjacent vowels; vowel transitions can be characterized by the spectrogram, so given the phonetic transcription and a transition-pattern spectrogram, speech signals containing consecutive vowels are automatically segmented by template matching. Type 3 is the discrimination of vowels and voiced consonants; the smoothed short-time RMS energy of the wavelet low-pass component and the SVF of the cepstral coefficients are adopted for this segmentation. Experiments on a set of 342 word utterances gathered from 6 speakers show the validity of the method.
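The SVF used for the type 1 and type 3 decisions measures how much the short-time spectrum changes between consecutive frames; transition points show up as local peaks. A minimal sketch on magnitude spectra (the paper applies it to wavelet and cepstral coefficients; plain FFT magnitudes are used here as a stand-in):

```python
import numpy as np

def spectral_variation(frames):
    """SVF: Euclidean distance between consecutive frame magnitude spectra;
    local peaks suggest phonetic transition points."""
    mags = np.abs(np.fft.rfft(frames, axis=1))
    return np.linalg.norm(np.diff(mags, axis=0), axis=1)

# Ten 20 ms frames at 8 kHz: five of a 200 Hz tone, then five of a 2 kHz tone.
t = np.arange(160) / 8000
frames = np.array([np.sin(2 * np.pi * 200 * t)] * 5 +
                  [np.sin(2 * np.pi * 2000 * t)] * 5)
svf = spectral_variation(frames)
print(int(np.argmax(svf)))  # the transition lies between frames 4 and 5
```

A peak-picking pass over the SVF then yields candidate segment boundaries.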

Robust Speech Enhancement Using HMM and $H_\infty$ Filter (HMM과 $H_\infty$필터를 이용한 강인한 음성 향상)

  • 이기용;김준일
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.7
    • /
    • pp.540-547
    • /
    • 2004
  • Speech enhancement algorithms based on the Kalman/Wiener filter require a priori knowledge of the noise and focus on minimizing the variance of the estimation error between the clean and estimated speech signals, so a small error in the noise statistics may lead to a large estimation error. The $H_\infty$ filter, by contrast, requires no assumptions or a priori knowledge of the noise statistics; it searches for the best estimate among all candidate signals by applying a least upper bound, and is consequently more robust to variations in the noise statistics than the Kalman/Wiener filter. In this paper, we propose a speech enhancement method using an HMM and multiple $H_\infty$ filters. First, the HMM parameters are estimated from training data. Second, the speech is filtered with multiple $H_\infty$ filters. Finally, the clean speech estimate is obtained as the weighted sum of the filtered outputs. Experimental results show an SNR improvement of about 1 dB to 2 dB with only a slight increase in computation compared with the Kalman filter method.
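The final step of this entry, combining the per-state filter outputs into one estimate, is just a posterior-weighted sum. A minimal sketch of that combination step only (the filter outputs and posteriors here are hypothetical placeholders; the $H_\infty$ filters themselves are not implemented):

```python
import numpy as np

def combine_filter_outputs(filtered, posteriors):
    """Weighted sum over per-state filter outputs.
    filtered: (n_states, n_samples); posteriors: (n_states,), summing to 1."""
    return np.sum(np.asarray(posteriors)[:, None] * filtered, axis=0)

# Two hypothetical state-conditioned filter outputs and their HMM posteriors.
filtered = np.vstack([np.full(100, 1.0), np.full(100, 3.0)])
posteriors = [0.25, 0.75]
estimate = combine_filter_outputs(filtered, posteriors)
```

The HMM's role is to supply the posteriors, so states that match the current frame dominate the mixture.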

Preliminary study of Korean Electro-palatography (EPG) for Articulation Treatment of Persons with Communication Disorders (의사소통장애인의 조음치료를 위한 한국형 전자구개도의 구현)

  • Woo, Seong Tak;Park, Young Bin;Oh, Da Hee;Ha, Ji-wan
    • Journal of Sensor Science and Technology
    • /
    • v.28 no.5
    • /
    • pp.299-304
    • /
    • 2019
  • Recent advances in rehabilitation medical technology have increased interest in speech therapy equipment, and research on articulation therapy for communication disorders is being actively conducted. Existing methods for the diagnosis and treatment of speech disorders have many limitations, such as traditional tactile perception tests and methods based on the empirical judgment of speech therapists. Moreover, the position and tension of the tongue are key articulatory factors in speech disorders, and are very important in distinguishing Korean consonants such as the lax, fortis, and aspirated series. In this study, we propose a Korean electropalatography (EPG) system to easily measure and monitor the position and tension of the tongue during articulation treatment and diagnosis. In the proposed EPG system, the sensor is fabricated using AgCl electrodes and biocompatible silicone, and the measured signal is analyzed by a bio-signal processing module and a monitoring program. In particular, bio-signals were measured with the sensor inserted into the palate of an experimental control group, confirming that the system could be applied to clinical treatment in speech therapy.

Preemphasis of Speech Signals in the Estimation of Time Difference of Arrival with Two Microphones (마이크로폰 쌍을 이용한 음원의 도달시간차이 추정에서 음성신호의 프리엠퍼시스 영향 분석)

  • Kwon Hongseok;Kim Siho;Bae Keunsung
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.35-38
    • /
    • 2004
  • In this paper, we investigate and analyze the problems encountered in frame-based estimation of TDOA (Time Difference of Arrival) using the CPSP (Cross-Power Spectrum Phase) function. Spectral leakage caused by framing a speech signal with a rectangular window makes estimation of the CPSP spectrum inaccurate. Framing with a Hamming window reduces the spectral leakage but distorts the signal, because temporally identical samples receive different weights in adjacent frames, which also makes TDOA estimation with the CPSP function inaccurate. We solve this problem by reducing the dynamic range of the speech spectrum with preemphasis. Experimental results confirm that framing the pre-emphasized microphone outputs with a rectangular window achieves a higher TDOA estimation success rate than the other framing methods.
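The pipeline this entry describes, preemphasis followed by CPSP-based TDOA, can be sketched as follows (the preemphasis coefficient, signal lengths, and delay are illustrative assumptions; the CPSP here is the standard phase-transform-weighted cross-correlation):

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    """First-order preemphasis y[n] = x[n] - alpha * x[n-1], reducing the
    spectral dynamic range of speech before CPSP/TDOA estimation."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def cpsp_tdoa(sig, refsig):
    """Delay (in samples) of sig relative to refsig via the cross-power
    spectrum phase (magnitude-normalized cross-spectrum)."""
    n = len(sig) + len(refsig)
    R = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(refsig, n))
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return int(np.argmax(np.abs(cc)) - max_shift)

rng = np.random.default_rng(0)
refsig = rng.standard_normal(256)
sig = np.concatenate((np.zeros(5), refsig))[:256]   # refsig delayed 5 samples
delay = cpsp_tdoa(preemphasis(sig), preemphasis(refsig))
```

Because the CPSP normalizes away the magnitude spectrum, flattening that spectrum with preemphasis beforehand reduces the leakage-driven errors the paper analyzes.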

Image Data Compression Using Laplacian Pyramid Processing and Vector Quantization (라플라시안 피라미드 프로세싱과 백터 양자화 방법을 이용한 영상 데이타 압축)

  • Park, G.H.;Cha, I.H.;Youn, D.H.
    • Proceedings of the KIEE Conference
    • /
    • 1987.07b
    • /
    • pp.1347-1351
    • /
    • 1987
  • This thesis studies Laplacian pyramid vector quantization, which keeps the compression algorithm simple and remains stable across various kinds of image data. To this end, images are divided into two groups according to their statistical characteristics. At 0.860 bits/pixel and 0.360 bits/pixel, Laplacian pyramid vector quantization is compared with the existing spatial-domain vector quantization and transform coding under the same conditions, in terms of both objective and subjective quality. The Laplacian pyramid vector quantization is much more stable with respect to the statistical characteristics of images than the existing vector quantization and transform coding.
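The Laplacian pyramid underlying this entry decomposes an image into difference bands plus a coarse residual, and reconstructs exactly by reversing the process; the vector quantizer then operates on the band images. A minimal sketch with a crude 2x2-average filter standing in for the Gaussian kernel (an assumption for brevity, not the classic 5-tap kernel):

```python
import numpy as np

def downsample(img):
    """2x downsample by averaging 2x2 blocks (a crude Gaussian stand-in)."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """2x nearest-neighbor upsampling."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(img, levels=2):
    """Each level is image-minus-upsampled-coarse; the last entry is the
    coarsest level, so the pyramid reconstructs exactly."""
    pyramid, current = [], img
    for _ in range(levels):
        coarse = downsample(current)
        pyramid.append(current - upsample(coarse))
        current = coarse
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    current = pyramid[-1]
    for lap in reversed(pyramid[:-1]):
        current = upsample(current) + lap
    return current

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
pyr = laplacian_pyramid(img)
```

Because the difference bands are sparse and near-zero-mean, they quantize well at the low bit rates quoted in the entry.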

A Study on the Extraction of the Excitation Pattern for Auditory Prothesis (청각 보철을 위한 자극패턴 추출에 관한 연구)

  • Park, Sang-Hui;Yoon, Tae-Sung;Lee, Jae-Hyuk;Beack, Seunt-Hwa
    • Proceedings of the KIEE Conference
    • /
    • 1987.07b
    • /
    • pp.1322-1325
    • /
    • 1987
  • In this study, the excitation pattern that can be perceived by a person with hearing loss due to inner-ear damage is extracted, and the auditory speech signal processing procedure is simulated on a computer. The excitation pattern is extracted using a neural tuning model that satisfies the physiological characteristics of the inner ear, together with information extracted from the speech signal. The firing pattern is then obtained by feeding this excitation pattern to an auditory neural model. Using the extracted firing pattern, the possibility that a patient can perceive the speech signal is studied by computer simulation.