• Title/Summary/Keyword: speech distortion


Improvement of the Linear Predictive Coding with Windowed Autocorrelation (윈도우가 적용된 자기상관에 의한 선형예측부호의 개선)

  • Lee, Chang-Young;Lee, Chai-Bong
    • The Journal of the Korea institute of electronic communication sciences / v.6 no.2 / pp.186-192 / 2011
  • In this paper, we propose a new procedure for improving linear predictive coding (LPC). To reduce the error power incurred by the coding, we interchange the order of the two steps of windowing the signal and linear prediction. This scheme corresponds to LPC extraction with windowed autocorrelation. The proposed method requires more computation time because it requires matrix inversion over more parameters than the conventional technique, to which the efficient Levinson-Durbin recursion applies with fewer parameters. Experimental tests over various speech phonemes showed, however, that our procedure yields about 5% less power distortion than the conventional technique. The proposed method is therefore preferable to the conventional technique as far as fidelity is concerned. In a separate speaker-dependent speech recognition test on 50 isolated words pronounced by 40 people, our approach also yielded better performance.
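
The conventional pipeline that this abstract uses as its baseline (window the frame, compute autocorrelation lags, solve with the Levinson-Durbin recursion) can be sketched as follows. This is a minimal illustration of the conventional technique only, not the authors' proposed windowed-autocorrelation method. The function names and the choice of a Hamming window are assumptions made for the example.

```python
import math


def hamming_window(n):
    """Return an n-point Hamming window (assumed window type)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1)) for t in range(n)]


def autocorrelation(frame, order):
    """Return autocorrelation lags r[0..order] of a frame."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + k] for t in range(n - k))
        for k in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations recursively.

    Returns (a, err), where the prediction is
    s_hat[t] = sum_k a[k] * s[t - 1 - k] and err is the residual power.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_conventional(frame, order):
    """Conventional LPC: window the signal first, then predict."""
    w = hamming_window(len(frame))
    windowed = [x * wt for x, wt in zip(frame, w)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

The recursion costs on the order of `order**2` operations per frame, which is the efficiency advantage the abstract attributes to the conventional technique over a general matrix inversion.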

Noisy Speech Recognition Based on Spectral Mapping Techniques (스펙트럼사상기법을 기초로 한 잡음음성인식)

  • Lee, Ki-Young
    • The Journal of the Acoustical Society of Korea / v.14 no.1E / pp.39-45 / 1995
  • This paper presents a noisy speech recognition method based on the spectral mapping techniques of speaker adaptation. In the presented method, spectral mapping training reduces the spectral distortion of noisy speech, and, for more accurate spectral mapping, the slope of the adjustment window is made adaptive to several word lengths. In recognition experiments, the recognition rate is higher than that of the conventional method using VQ and DTW without noise processing. Even when the SNR is 0 dB, the recognition rate is 10 times that of the conventional method. It is confirmed that the speaker adaptation technique using spectral mapping training can improve recognition performance for noisy speech.


Korean Digit Recognition Under Noise Environment Using Spectral Mapping Training (스펙트럼사상학습을 이용한 잡음환경에서의 한국어숫자음인식)

  • Lee, Ki-Young
    • The Journal of the Acoustical Society of Korea / v.13 no.3 / pp.25-32 / 1994
  • This paper presents a Korean digit recognition method for noisy environments using spectral mapping training based on a static supervised adaptation algorithm. In the presented method, mapping from the space of noisy speech spectra to the space of noise-free speech spectra reduces the spectral distortion of noisy speech, and the recognition rate is higher than that of the conventional method using VQ (vector quantization) and DTW (dynamic time warping) without noise processing. Even when the SNR is 0 dB, the recognition rate is 10 times that of the conventional method. It has been confirmed that spectral mapping training can improve recognition performance for speech in noisy environments.
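
DTW (dynamic time warping), part of the conventional baseline named in this abstract, aligns two feature sequences of different lengths. The following is a minimal sketch, assuming a Euclidean frame distance and the basic three-way step pattern. Both choices, and the function name, are assumptions, since the abstract does not specify them.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    a, b: lists of equal-dimension vectors (lists of floats).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

In an isolated-word recognizer of this kind, an unknown utterance would typically be assigned the label of the reference template with the smallest accumulated cost.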


On a Pitch Alteration Technique in the V/UV Spectrum for High Quality Speech Synthesis Technique (고음질 합성방식용 V/UV 스펙트럼상의 피치변경법에 관한 연구)

  • Jo, Wang-Rae;Bae, Myung-Jin;Kim, Dong-Sung
    • The Journal of the Acoustical Society of Korea / v.15 no.6 / pp.99-103 / 1996
  • Most waveform coding techniques attempt to reduce the redundancy of the speech signal while preserving the shape of the waveform. In speech synthesis, waveform coding methods are used in synthesis by rule to obtain high-quality speech. However, it is difficult to apply waveform coding to synthesis by rule because the parameters of waveform coding cannot be classified as either excitation or vocal tract parameters. The proposed method shows little spectral distortion, 2.7% or less, for pitch changes of 50%. It also achieves smooth connection of waveform magnitudes across frames by compensating the phase in the time domain.


Enhanced Spectral Hole Substitution for Improving Speech Quality in Low Bit-Rate Audio Coding

  • Lee, Chang-Heon;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.29 no.3E / pp.131-139 / 2010
  • This paper proposes a novel spectral hole substitution technique for low bit-rate audio coding. The spectral holes frequently occurring in relatively weak energy bands due to zero bit quantization result in severe quality degradation, especially for harmonic signals such as speech vowels. The enhanced aacPlus (EAAC) audio codec artificially adjusts the minimum signal-to-mask ratio (SMR) to reduce the number of spectral holes, but it still produces noisy sound. The proposed method selectively predicts the spectral shapes of hole bands using either intra-band correlation, i.e. harmonically related coefficients nearby or inter-band correlation, i.e. previous frames. For the bands that have low prediction gain, only the energy term is quantized and spectral shapes are replaced by pseudo random values in the decoding stage. To minimize perceptual distortion caused by spectral mismatching, the criterion of the just noticeable level difference (JNLD) and spectral similarity between original and predicted shapes are adopted for quantizing the energy term. Simulation results show that the proposed method implemented into the EAAC baseline coder significantly improves speech quality at low bit-rates while keeping equivalent quality for mixed and music contents.

Design of a Quantization Algorithm of the Speech Feature Parameters for the Distributed Speech Recognition (분산 음성 인식 시스템을 위한 특징 계수 양자화 방식 설계)

  • Lee Joonseok;Yoon Byungsik;Kang Sangwon
    • The Journal of the Acoustical Society of Korea / v.24 no.4 / pp.217-223 / 2005
  • In this paper, we propose predictive block-constrained trellis coded quantization (BC-TCQ) to quantize cepstral coefficients for distributed speech recognition. A first-order auto-regressive (AR) predictor is used to predict the cepstral coefficients, and BC-TCQ is used to quantize the prediction error signal effectively. The performance is compared with that of the split vector quantizers used in the ETSI standard, demonstrating reductions in cepstral distance and computational complexity.
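
The predictive structure described here (predict each coefficient from the previous reconstruction, then quantize only the prediction error) can be sketched as below. A plain uniform scalar quantizer stands in for BC-TCQ, which is considerably more involved, so this is not the authors' quantizer. The prediction coefficient `rho`, the step size `step`, and the function names are hypothetical values chosen for illustration.

```python
def predictive_quantize(track, rho=0.9, step=0.1):
    """Closed-loop first-order predictive quantization of one coefficient track.

    track: one cepstral coefficient over successive frames (list of floats).
    rho:   assumed AR(1) prediction coefficient (hypothetical value).
    step:  step of a uniform scalar quantizer, a stand-in for BC-TCQ.
    Returns (indices, reconstructed).
    """
    indices, recon = [], []
    prev = 0.0  # previous reconstructed value, shared with the decoder
    for x in track:
        pred = rho * prev                 # first-order AR prediction
        idx = round((x - pred) / step)    # quantize the prediction error
        x_hat = pred + idx * step         # reconstruction the decoder will see
        indices.append(idx)
        recon.append(x_hat)
        prev = x_hat
    return indices, recon


def predictive_dequantize(indices, rho=0.9, step=0.1):
    """Decoder: rebuild the track from the transmitted indices alone."""
    recon, prev = [], 0.0
    for idx in indices:
        prev = rho * prev + idx * step
        recon.append(prev)
    return recon
```

Predicting from the reconstructed value rather than the original keeps encoder and decoder in step, so quantization error does not accumulate across frames.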

Isolated-Word Speech Recognition in Telephone Environment Using Perceptual Auditory Characteristic (인지적 청각 특성을 이용한 고립 단어 전화 음성 인식)

  • Choi, Hyung-Ki;Park, Ki-Young;Kim, Chong-Kyo
    • Journal of the Institute of Electronics Engineers of Korea TE / v.39 no.2 / pp.60-65 / 2002
  • In this paper, we propose the GFCC (gammatone filter frequency cepstrum coefficient) parameter, which is based on auditory characteristics, to achieve a better speech recognition rate. Speech recognition experiments are performed on isolated words acquired over the telephone network. To compare the GFCC parameter with other parameters, recognition experiments are also carried out using the MFCC and LPCC parameters. In addition, each parameter is tested with and without CMS (cepstral mean subtraction), which compensates for channel distortion in the telephone network. The experimental results show that the recognition rate using the GFCC parameter is better than that of the other parameters.
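
CMS works because a fixed linear channel multiplies the spectrum, which becomes an additive constant vector in the cepstral domain. Subtracting the per-utterance mean removes that constant. The following is a minimal sketch, assuming the cepstral features are already computed as one vector per frame; the function name is hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean from every cepstral vector.

    frames: list of per-frame cepstral vectors (lists of equal length).
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral dimension over the whole utterance.
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]
```

Two copies of an utterance that differ only by a constant cepstral offset map to the same output, which is how CMS compensates for a stationary telephone channel.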

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Using Voice Activity Detector(VAD) (음성활동영역검색을 사용하는 유색잡음에 오염된 음성의 향상을 위한 일반화 부공간 접근)

  • Son, Kyung-Sik;Kim, Hyun-Tae
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.8 / pp.1769-1776 / 2013
  • In this paper, we propose a modified YL (Yi and Loizou) algorithm that uses a VAD (voice activity detector) to enhance speech corrupted by colored noise. The performance of the proposed algorithm is compared with that of the YL algorithm and the LS (Lee and Son, etc.) algorithm by computer simulation. The colored noises used in the experiments were car noise and multi-talker babble from the AURORA database, and the speech was taken from the TIMIT database. It is confirmed that the proposed algorithm outperforms the previous two approaches in terms of SNR (signal-to-noise ratio) and SSD (speech spectral distortion).

A Study on Development of a Hearing Impairment Simulator considering Frequency Selectivity and Asymmetrical Auditory Filter of the Hearing Impaired (난청인의 주파수 선택도와 비대칭적 청각 필터를 고려한 난청 시뮬레이터 개발에 관한 연구)

  • Joo, Sang-Ick;Kang, Hyun-Deok;Song, Young-Rok;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.4 / pp.831-840 / 2010
  • In this paper, we propose a hearing impairment simulator that considers the reduced frequency selectivity and asymmetrical auditory filter of the hearing impaired, and we verify through experiments that these two factors affect speech perception. The reduced frequency selectivity is implemented by spectral smearing using LPC (linear prediction coding). The shapes of the auditory filter are asymmetrical and differ with each center frequency. The auditory filter of a person with hearing loss differs from that of people with normal hearing, and speech passed through it has a different quality. The experiments comprised subjective and objective tests. The subjective experiments consisted of four kinds of tests: a pure tone test, an SRT (speech reception threshold) test, a WRS (word recognition score) test without spectral smearing, and a WRS test with spectral smearing. The experiments with the hearing impairment simulator were performed on 9 subjects with normal hearing. The amount of spectral smearing was controlled by the LPC order. The asymmetrical auditory filter of the proposed simulator was simulated, and tests were then performed to evaluate the filter's performance objectively. The objective experiment used PESQ (perceptual evaluation of speech quality) and LLR (log likelihood ratio) on speech passed through the auditory filter, so that the quality and distortion of the processed speech were evaluated objectively. When hearing loss was simulated, the PESQ and LLR values differed considerably depending on the asymmetrical auditory filter in the hearing impairment simulator.

A Comparative Study of Speaker Adaptation Methods for HMM-Based Speech Recognition (HMM 음성인식 시스템을 위한 화자적응 방법들의 성능비교)

  • Koo, Myoung-Wan;Un, Chong-Kwan;Lee, Hwang-Soo
    • The Journal of the Acoustical Society of Korea / v.10 no.3 / pp.37-43 / 1991
  • In this paper, we compare the performance of speaker adaptation methods, which consist of two stages of processing, for an HMM-based speech recognition system. We compare three kinds of VQ adaptation methods that may be used in the first stage to reduce the distortion error for a new speaker: label prototype adaptation, adaptation with a codebook from the adaptation speech itself, and adaptation with a mapped codebook. We then compare four kinds of HMM parameter adaptation methods that may be used in the second stage to transform HMM parameters for a new speaker: adaptation by the Viterbi algorithm, by the DTW algorithm, by the iterative alignment algorithm, and by the fuzzy histogram algorithm. The results show that adaptation based on the fuzzy histogram algorithm yields the highest accuracy in an HMM-based speech recognition system.
