• Title/Summary/Keyword: speech distortion

Search Result 227, Processing Time 0.028 seconds

A LSF Quantizer for the Wideband Speech Using the Predictive VQ-Pyramid VQ (예측 VQ-Pyramid VQ를 이용한 광대역 음성용 LSF 양자학기 설계)

  • 이강은;이인성;강상원
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.4
    • /
    • pp.333-339
    • /
    • 2004
  • This Paper proposes the vector quantizer-pyramid vector quantizer(VQ-PVQ) structure. Also both predictive structure and safety-net concept are combined into the VQ-PVQ to quantize the IPC parameter of wideband speech codec. The Performance is compared to the LPC vector quantizer used in the AMR-WB(ITU-T G.722.2). demonstrating reduction in both spectral distortion and encoding memory.

Adaptive Channel Normalization Based on Infomax Algorithm for Robust Speech Recognition

  • Jung, Ho-Young
    • ETRI Journal
    • /
    • v.29 no.3
    • /
    • pp.300-304
    • /
    • 2007
  • This paper proposes a new data-driven method for high-pass approaches, which suppresses slow-varying noise components. Conventional high-pass approaches are based on the idea of decorrelating the feature vector sequence, and are trying for adaptability to various conditions. The proposed method is based on temporal local decorrelation using the information-maximization theory for each utterance. This is performed on an utterance-by-utterance basis, which provides an adaptive channel normalization filter for each condition. The performance of the proposed method is evaluated by isolated-word recognition experiments with channel distortion. Experimental results show that the proposed method yields outstanding improvement for channel-distorted speech recognition.

  • PDF

A Study on a Improvement of the Speech Quality by Spectrum Analysis with Variable Window in CELP Vocoder (가변 윈도우 스펙트럼 분석을 이용한 CELP 부호화기의 음질 향상에 관한 연구)

  • 나덕수;민소연;배명진
    • Proceedings of the IEEK Conference
    • /
    • 2000.06d
    • /
    • pp.106-109
    • /
    • 2000
  • There have been proposed two types of low bit rate vocoder upto now : One is MBE type using the spectrum modeling and another is CELP type using the hybrid coding method. CELP type vocoder has mainly studied between them. Specially, much of intensity is concentrated in CELP vocoder due to the emergence of Internet Phone and PCS in a domestic. In order to improve the speech quality in CELP vocoder, in this paper, we proposed a new spectrum analysis algorithm with variable window, In CELP vocoder, the spectrum of the synthesised speech signal is distorted because the fixed size windows is used for spectrum analysis. So we have measured the spectral leakage and in order to minimize the spectral leakage have adjusted the window size. Applying this method G.723.1 ACELP, we can get SD(Spectral Distortion) reduction 0.084(dB), residual energy reduction 6.3% and MOS(Mean Opinion Score) improvement 0.1.

  • PDF

On A Pitch Alteration using the Waveform Symmetry with Time - Frequency Conversion (시간 - 주파수 변환에 의한 파형 대칭 피치변경법)

  • 박형빈
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06c
    • /
    • pp.147-150
    • /
    • 1998
  • In the case of speech synthesis, the waveform coding method with high quality is mainly used to the synthesis by analysis. Because the parameters of this coding method are not classified as both excitation and vocal tract parameters, it is difficult to apply the waveform coding method to the synthesis by rule. Thus, in order to apply the waveform coding method to the synthesis by rule, a pitch alteration is required for the prosody control. In the speech synthesis method by the conventional PSOLA technique, applying symmetric window function to asymmetric speech waveform, it occurs the unbalance phenomenon of energy according to the overlapped degree of pitch interval adjustment. In this paper to overcome the unbalance phenomenon of energy, we proposed a new method that can convert asymmetric waveform to symmetric one by time-frequency conversion. As a result, we can obtain an average spectrum distortion ratio with 6.38% according to the pitch alteration ratio.

  • PDF

A New Variable Bit Rate Scheme for Waveform Interpolative Coders (파형보간 코더에서 파라미터간 거리차를 이용한 가변비트율 기법)

  • Yang, Hee-Sik;Jeong, Sang-Bae;Hahn, Min-Soo
    • MALSORI
    • /
    • no.65
    • /
    • pp.81-91
    • /
    • 2008
  • In this paper, we propose a new variable bit-rate speech coder based on the waveform interpolation concept. After the coder extracted all parameters, the amounts of the distortions between the current and the predicted parameters which are estimated by extrapolation using past two parameters are measured for all parameters. A parameter would not be transmitted unless the distortion exceeds the preset threshold. At the decoder side, the non-transmitted parameter is reconstructed by extrapolation with past two parameters used to synthesize signals. In this way, we can reduce 26% of the total bit rate while retaining the speech quality degradation below 0.1 PESQ score.

  • PDF

A Study on the Relation Between the LSF's and Spectral Distribution of Speech Signals (Line Spectral Frequency와 음성신호의 주파수 분포에 관한 연구)

  • 이동수;김영화
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.25 no.4
    • /
    • pp.430-436
    • /
    • 1988
  • LSF(Line Spectral Frequency) derived from LPC has known as a very useful transmission parameter of speech signals, for it has a good linear interpolation characteristics and a low spectrum distortion at low bit rates coding. This paper presents that it is possible to extract directly the formant frequencies of speech signals from LSF parameter without application of FFT algorithm by comparing the distribution of LSF parameter with the frequency distribution of analysis filter. This paper suggests the advanced algorithm that results in improving the speed of convergence at analytic solution method. Also, for the flexibility of parameters, the process that transforms from LSF to LPC is presented.

  • PDF

Representation of MFCC Feature Based on Linlog Function for Robust Speech Recognition (강인한 음성 인식을 위한 선형 로그 함수 기반의 MFCC 특징 표현 연구)

  • Yun, Young-Sun
    • MALSORI
    • /
    • no.59
    • /
    • pp.13-25
    • /
    • 2006
  • In previous study, the linlog(linear log) RASTA(J-RASTA) approach based on PLP was proposed to deal with both the channel effect and the additive noise. The extraction of PLP required generally more steps and computation than the extraction of widely used MFCC. Thus, in this paper, we apply the linlog function to the MFCC for investigating the possibility of simple compensation method that removes both distortion. With the experimental results, the proposed method shows the similar tendency to the linlog RASTA-PLP_ When the J value is set to le-6, the best ERR(Error Reduction Rate) of 33% is obtained. For applying the linlog function to the feature extraction process, the J value plays a very important role in compensating the corruption. Thus, the study for the adaptive J or noise dependent J estimation is further required.

  • PDF

An Experimental-Phonetic Study on V-CV Utterances by Korean Apraxia of Speech Patients (한국어 말실행증 환자의 V-CV 구조 발화에 관한 실험음성학적 연구)

  • Kim, Yoon-Ji;Jang, Tae-Yeoub
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.265-269
    • /
    • 2007
  • This paper reports an compared acoustic analysis on speech produced by two Korean groups, normal and AOS, focusing on utterances of V-CV structures. Major concerns include: 1) types of errors (distortion/substitution) according to the place of articulation, 2) duration of each syllable, 3) VOTs of stop sounds, and 4) F1 and F2 of vowels. In terms of the differences in these phonetic characteristics between the two groups, we aim to clarify some characteristics of AOS and to provide fundamental criteria for diagnosing and evaluating the disease.

  • PDF

On a Pitch Alteration Method Compensated with the Spectrum for High Quality Speech Synthesis (스펙트럼 보상된 고음질 합성용 피치 변경법)

  • 문효정
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1995.06a
    • /
    • pp.123-126
    • /
    • 1995
  • The waveform coding are concerned with simply preserving the wave shape of speech signal through a redundancy reduction process. In the case of speech synthesis, the wave form coding with high quality are mainly used to the synthesis by analysis. However, because the parameters of this coding are not classified as either excitation and vocal tract parameters, it is difficult to applying the waveform coding to the synthesis by rule. In this paper, we proposed a new pitch alteration method that can change the pitch period in waveform coding by using scaling the time-axis and compensating the spectrum. This is a time-frequency domain method that is preserved in the phase components of the waveform and that has a little spectrum distortion with 2.5% and less for 50% pitch change.

  • PDF

Noise suppressor Using Psychoacoustic Model and Wavelet Packet Transform (심리음향 모델과 웨이블릿 패킷 변환을 이용한 잡음제거기)

  • Kim, Mi-Seon;Kim, Young-Ju;Lee, In-Sung
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.345-346
    • /
    • 2006
  • In this paper, we propose the noise suppressor with the psychoacoustic model and wavelet packet transform. The objective of the scheme is to enhance speech corrupted by colored or non-stationary noise. If corrupted noise is colored, subband approach would be more efficient than whole band one. To avoid serious residual noise and speech distortion, we must adjust the Wavelet Coefficient threshold. In this paper, the subband is designed matching with the critical band. And WCT is adapted by noise masking threshold(NMT) and segmental signal to noise ratio(seg_SNR). Consequently this work improve the PESQ-MOS about 0.23 in the case of coded speech.

  • PDF