• Title/Summary/Keyword: Speech Processing

Design and Implementation of a Text-to Speech System using the Prosody and Duration Information (운율 및 길이 정보를 이용한 무제한 음성 합성기의 설계 및 구현)

  • Yang, Jin-Seok;Kim, Jae-Beom;Lee, Jeong-Hyeon
    • The Transactions of the Korea Information Processing Society / v.3 no.5 / pp.1121-1129 / 1996
  • To produce more natural speech in a Text-to-Speech system, prosody and duration must be processed in advance; the prosody and duration information was extracted through trial-and-error experiments. In this paper, a method is proposed to improve the naturalness of a Text-to-Speech system using this information. As a result, the Text-to-Speech system proposed and implemented in this paper produced more natural synthesized speech than systems that do not use this information.

Performance Evaluation of Speech Onset Representation Characteristic of Cochlear Implants Speech Processor using Spike Train Decoding (Spike Train Decoding에 기반한 인공와우 어음처리기의 음성시작점 정보 전달특성 평가)

  • Kim, Doo-Hee;Kim, Jin-Ho;Kim, Kyung-Hwan
    • Journal of Biomedical Engineering Research / v.28 no.5 / pp.694-702 / 2007
  • The adaptation effect originating from the chemical synapse between the auditory nerve and the inner hair cell is advantageous for accurately representing temporal cues of incoming speech, such as speech onset. It is therefore expected that modifying conventional cochlear implant (CI) speech processing strategies to incorporate the adaptation effect will considerably improve speech perception performance, for example consonant perception scores. Our purpose in this paper was to evaluate our new CI speech processing strategy, which incorporates the adaptation effect, by observing auditory nerve responses. By classifying the presence or absence of speech from the auditory nerve responses, i.e., spike trains, we could quantitatively compare the speech onset detection performance of the conventional and improved strategies. We verified the effectiveness of the adaptation effect in improving the speech onset representation characteristics.

Neural Network Approaches and Trends for Speech Recognition (음성 인식을 위한 신경회로망 접근과 동향)

  • 김순협
    • Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.33-41 / 1995
  • We propose a neural network approach to signal processing, especially speech signal processing, and review the algorithms of several neural networks that are used in many application fields of speech processing. Finally, we investigate trends in neural network methods through three conference journals and the ASK journal in 1994.

ETRI small-sized dialog style TTS system (ETRI 소용량 대화체 음성합성시스템)

  • Kim, Jong-Jin;Kim, Jeong-Se;Kim, Sang-Hun;Park, Jun;Lee, Yun-Keun;Hahn, Min-Soo
    • Proceedings of the KSPS conference / 2007.05a / pp.217-220 / 2007
  • This study outlines a small-sized, dialog-style ETRI Korean TTS system that applies an HMM-based speech synthesis technique. To build the VoiceFont, 500 dialog-style sentences were used to train the HMMs. The context information about phonemes, syllables, words, phrases and sentences was extracted fully automatically to build context-dependent HMMs. In training the acoustic model, acoustic features such as Mel-cepstrum and logF0, together with their delta and delta-delta, were used. The size of the VoiceFont built through the training is 0.93Mb. The developed HMM-based TTS system was installed on an ARM720T processor running at a 60 MHz clock. To reduce computation time, the MLSA inverse filtering module is implemented in assembly language. The fully implemented system runs 1.73 times faster than real time.
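
The abstract above lists delta and delta-delta terms among the acoustic features. As a rough illustration only, the following sketch derives dynamic features from a per-frame track with a simple central difference. The regression window used in the ETRI system is not stated in the abstract, so the window, the function name `delta`, and the toy logF0 values are all assumptions.

```python
def delta(frames):
    """Central-difference delta of a per-frame feature track.

    Assumed window: 0.5 * (x[t+1] - x[t-1]), with edge frames repeated.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append(0.5 * (nxt - prev))
    return out


# Hypothetical logF0 track (one value per frame), for illustration only.
log_f0 = [4.0, 4.5, 5.0, 5.0, 4.5]
d1 = delta(log_f0)  # delta
d2 = delta(d1)      # delta-delta
# Static and dynamic features stacked per frame, as an HMM observation would be.
observation = list(zip(log_f0, d1, d2))
```

Applying the same operator twice gives the delta-delta track, so each frame ends up with a static value plus two dynamic values.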

A Simulation Study on Improvements of Speech Processing Strategy of Cochlear Implants Using Adaptation Effect of Inner Hair Cell and Auditory Nerve Synapse (청각신경 시냅스의 적응 효과를 이용한 인공와우 어음처리 알고리즘의 개선에 대한 시뮬레이션 연구)

  • Kim, Jin-Ho;Kim, Kyung-Hwan
    • Journal of Biomedical Engineering Research / v.28 no.2 / pp.205-211 / 2007
  • A novel envelope extraction algorithm for the speech processor of cochlear implants, called the adaptation algorithm, was developed based on the adaptation effect of the inner hair cell (IHC)/auditory nerve (AN) synapse. We performed acoustic simulation and hearing experiments with 12 normal-hearing persons to compare this adaptation algorithm with the existing standard envelope extraction method. The results show that the speech processing strategy using the adaptation algorithm significantly improved the speech recognition rate under most channel/noise conditions compared to the conventional strategy. We verified that the proposed adaptation algorithm may yield better speech perception under a considerable amount of noise, compared to the conventional speech processing strategy.

Simulation of speech processing and coding strategy for cochlear implants (인공 청각 장치의 음성신호 처리와 자극방법의 시뮬레이션)

  • Kim, Young-Hoon;Park, Kwang-Suk
    • Proceedings of the KOSOMBE Conference / v.1991 no.11 / pp.30-33 / 1991
  • The objective of a speech processor for cochlear implants is to deliver speech information to the central nervous system. In this study we present a method that simulates the speech processing and coding strategy for cochlear implants, and we tested two different simulated processing methods on 12 adults with normal hearing. The formant sinusoidal coding was better than the formant pulse coding in the consonant perception test and in learning effects (p < 0.05).

A Noise Robust Speech Recognition Method Using Model Compensation Based on Speech Enhancement (음성 개선 기반의 모델 보상 기법을 이용한 강인한 잡음 음성 인식)

  • Shen, Guang-Hu;Jung, Ho-Youl;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.27 no.4 / pp.191-199 / 2008
  • In this paper, we propose an MWF-PMC noise processing method for speech recognition in noisy environments. It enhances the input speech using Mel-warped Wiener Filtering (MWF) at the pre-processing stage and compensates the recognition model using PMC (Parallel Model Combination) at the post-processing stage. The PMC uses the residual noise extracted from the silence region of the enhanced speech at the pre-processing stage to compensate the clean speech model, and the method is thus expected to improve speech recognition performance in noisy environments. For the recognition experiments we down-sampled the KLE PBW (Phoneme Balanced Words) 452-word speech data to 8 kHz and created noisy speech at 5 different SNR levels, i.e., 0 dB, 5 dB, 10 dB, 15 dB and 20 dB, by adding Subway, Car and Exhibition noise to clean speech. The recognition results confirm the effectiveness of the proposed MWF-PMC method, which achieved better overall recognition performance than the existing combined methods.
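
The experimental setup above mixes noise into clean speech at fixed SNR levels. A minimal sketch of that mixing step follows, assuming the SNR is defined as the ratio of average clean power to average noise power over the whole utterance; the abstract does not say how the authors scaled the noise. The names `mix_at_snr` and `measure_snr_db` are hypothetical, and the sinusoids are toy stand-ins for speech and noise recordings.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it."""
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    # Target noise power is p_clean / 10^(snr_db / 10); gain is the amplitude scale.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measure_snr_db(clean, noisy):
    """SNR of `noisy` relative to the known `clean` signal, in dB."""
    p_clean = sum(x * x for x in clean)
    p_resid = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(p_clean / p_resid)


# Toy signals of equal length (not real recordings).
clean = [math.sin(0.1 * i) for i in range(800)]
noise = [math.sin(1.7 * i + 0.3) for i in range(800)]
noisy_sets = {level: mix_at_snr(clean, noise, level) for level in (0, 5, 10, 15, 20)}
```

Measuring the SNR of each mixture against the clean signal is a quick sanity check that the scaling hit the intended level.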

Selecting Good Speech Features for Recognition

  • Lee, Young-Jik;Hwang, Kyu-Woong
    • ETRI Journal / v.18 no.1 / pp.29-41 / 1996
  • This paper describes a method to select a suitable feature for speech recognition using an information-theoretic measure. Conventional speech recognition systems heuristically choose a portion of frequency components, cepstrum, mel-cepstrum, energy, and their time differences of speech waveforms as their speech features. However, these systems cannot perform well if the selected features are not suitable for speech recognition. Since the recognition rate is the only performance measure of a speech recognition system, it is hard to judge how suitable the selected feature is. To solve this problem, it is essential to analyze the feature itself and measure how good it is. Good speech features should contain all of the class-related information and as little class-irrelevant variation as possible. In this paper, we suggest a method to measure the class-related information and the amount of class-irrelevant variation based on Shannon's information theory. Using this method, we compare the mel-scaled FFT, cepstrum, mel-cepstrum, and wavelet features of the TIMIT speech data. The result shows that, among these features, the mel-scaled FFT is the best feature for speech recognition based on the proposed measure.
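
The abstract does not give the exact measure, so the sketch below is only a generic illustration of scoring a feature by Shannon mutual information with the class label, I(F; C) = H(F) + H(C) - H(F, C), and of the remaining feature variation H(F | C) = H(F, C) - H(C). It assumes a discrete (quantized) feature; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (bits) of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(features, classes):
    """Class-related information I(F; C) = H(F) + H(C) - H(F, C), in bits."""
    joint = list(zip(features, classes))
    return entropy(features) + entropy(classes) - entropy(joint)


def class_irrelevant_variation(features, classes):
    """Feature variation not explained by the class: H(F | C) = H(F, C) - H(C)."""
    joint = list(zip(features, classes))
    return entropy(joint) - entropy(classes)


# Toy data: quantized feature values paired with made-up class labels.
classes = ["a", "a", "b", "b"]
informative = [0, 0, 1, 1]    # determines the class exactly
uninformative = [0, 1, 0, 1]  # statistically independent of the class
```

On this toy data the informative feature carries the full class entropy and has no leftover variation, while the uninformative one carries none, which matches the abstract's description of a good feature.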

A new approach technique on Speech-to-Speech Translation (신호의 복원된 위상 공간을 이용한 오디오 상황 인지)

  • Le, Thanh Hien;Lee, Sung-young;Lee, Young-Koo
    • Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.239-240 / 2009
  • We live in a flat world in which globalization fosters communication, travel, and trade among more than 150 countries and thousands of languages. To surmount the barriers among these languages, translation is required; Speech-to-Speech translation will automate the process. Thanks to recent advances in Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS), one can now use a system to translate speech in a source language into speech in a target language, and vice versa, in an affordable manner. In the three-phase process, the source speech is first transcribed into a (set of) text in the source language (ASR), and the source text is then translated into the target text (MT). Finally, the target speech is synthesized from the target text (TTS).
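
The three-phase process described above is a straightforward function composition. The sketch below shows only that structure: `make_s2st` and the three toy components are hypothetical placeholders, not real ASR, MT or TTS engines, and the byte/string types are an assumption made for illustration.

```python
from typing import Callable


def make_s2st(asr: Callable[[bytes], str],
              mt: Callable[[str], str],
              tts: Callable[[str], bytes]) -> Callable[[bytes], bytes]:
    """Compose source speech -> source text -> target text -> target speech."""
    def translate(source_speech: bytes) -> bytes:
        source_text = asr(source_speech)  # phase 1: ASR
        target_text = mt(source_text)     # phase 2: MT
        return tts(target_text)           # phase 3: TTS
    return translate


# Placeholder components standing in for real engines.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are the transcript


def toy_mt(text: str) -> str:
    return {"hello": "annyeong"}.get(text, text)  # one-entry toy dictionary


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the text bytes are audio


s2st = make_s2st(toy_asr, toy_mt, toy_tts)
```

Keeping each phase behind a plain callable means any one engine can be swapped without touching the other two.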

Performance Evaluation of Cochlear Implants Speech Processing Strategy Using Neural Spike Train Decoding (Neural Spike Train Decoding에 기반한 인공와우 어음처리방식 성능평가)

  • Kim, Doo-Hee;Kim, Jin-Ho;Kim, Kyung-Hwan
    • Journal of Biomedical Engineering Research / v.28 no.2 / pp.271-279 / 2007
  • We suggest a novel method for evaluating cochlear implant (CI) speech processing strategies based on neural spike train decoding. From the formant trajectories of input speech and the auditory nerve responses to the electrical pulse trains generated by a specific CI speech processing strategy, an optimal linear decoding filter was obtained and used to estimate the formant trajectory of incoming speech. The performance of a specific strategy is evaluated by comparing the true and estimated formant trajectories. We compared a newly developed strategy, which mimics the auditory periphery more closely using a nonlinear time-varying filter, with a conventional linear-filter-based strategy. The formant trajectories could be estimated more accurately with the nonlinear time-varying strategy. The superiority was more prominent when the background noise level was high and the spectral characteristic of the background noise was close to that of speech signals. This confirms the superiority observed with other evaluation methods, such as acoustic simulation and spectral analysis.
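
As a hedged illustration of the linear decoding idea above, the following sketch fits a causal FIR filter by ordinary least squares so that the filtered response approximates a target trajectory, then scores the estimate by RMSE. The paper's actual spike binning, filter length and any regularization are not given in the abstract; the function names and the toy binned spike counts are assumptions.

```python
import math


def fit_linear_decoder(response, target, taps):
    """Least-squares FIR filter h with target[t] ~ sum_k h[k] * response[t - k]."""
    rows = [[response[t - k] if t - k >= 0 else 0.0 for k in range(taps)]
            for t in range(len(target))]
    # Normal equations (R^T R) h = R^T s, stored as an augmented matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(taps)]
         + [sum(r[i] * s for r, s in zip(rows, target))]
         for i in range(taps)]
    # Gauss-Jordan elimination with partial pivoting (assumes R^T R is nonsingular).
    for col in range(taps):
        pivot = max(range(col, taps), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(taps):
            if i != col:
                f = a[i][col] / a[col][col]
                a[i] = [x - f * y for x, y in zip(a[i], a[col])]
    return [a[i][taps] / a[i][i] for i in range(taps)]


def decode(response, h):
    """Estimate the trajectory by filtering the response with h."""
    return [sum(h[k] * response[t - k] for k in range(len(h)) if t - k >= 0)
            for t in range(len(response))]


def rmse(true, estimated):
    """Root-mean-square error between true and estimated trajectories."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(true, estimated)) / len(true))


# Toy binned spike counts and a synthetic target generated by a known filter.
spikes = [0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 3.0, 1.0, 0.0, 0.0, 2.0, 1.0]
true_trajectory = decode(spikes, [0.5, 0.25])
h_hat = fit_linear_decoder(spikes, true_trajectory, taps=2)
error = rmse(true_trajectory, decode(spikes, h_hat))
```

A lower RMSE for one processing strategy than another would correspond to the comparison of true and estimated formant trajectories described in the abstract.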