• Title/Summary/Keyword: Speech quality estimation

Search Result 53, Processing Time 0.021 seconds

Speech Quality Measure in a Mobile Communication System Using PLP Cepstral Distance with CMS (심리 음향 켑스트럼 평균 차감법을 이용한 이동 전화망에서의 음질 평가)

  • Yun, J.J.;Park, S.W.;Park, Y.C.;Youn, D.H.;Cha, I.H.
    • Speech Sciences
    • /
    • v.6
    • /
    • pp.163-179
    • /
    • 1999
  • For the set up, management and repair of a mobile communication system, continuous estimation of speech quality is required. Speech quality measurement can be conducted by listener's judgement in a subjective test such as MOS (Mean Opinion Score) test. However, this method is laborious, expensive and time-consuming, it is advisable to predict subjective speech quality via objective measures. This paper presents a robust objective speech quality measure, PLP-CMS (Perceptual Linear Predictive-Cepstral Mean Subtraction), which can predict subjective speech quality in mobile communication systems. PLP-CMS has a high correlation with subjective quality owing to PLP (Perceptual Linear Predictive) analysis and shows a robust performance not being influenced by PSTN (Public Switched Telephone Network) channel effects due to CMS (Cepstral Mean Subtraction). To prove the performance of our proposed algorithm, we carried out subjective and objective quality estimation on speech samples which are variously distorted in a real mobile communication system. As a result, we demonstrated that PLP-CMS has a higher correlation with subjective quality than PSQM (Perceptual Speech Quality Measure) and PLP-CD (Perceptual Linear Predictive-Cepstral Distance).

  • PDF

Optimum MVF Estimation-Based Two-Band Excitation for HMM-Based Speech Synthesis

  • Han, Seung-Ho;Jeong, Sang-Bae;Hahn, Min-Soo
    • ETRI Journal
    • /
    • v.31 no.4
    • /
    • pp.457-459
    • /
    • 2009
  • The optimum maximum voiced frequency (MVF) estimation-based two-band excitation for hidden Markov model-based speech synthesis is presented. An analysis-by-synthesis scheme is adopted for the MVF estimation which leads to the minimum spectral distortion of synthesized speech. Experimental results show that the proposed method significantly improves synthetic speech quality.

Non-Intrusive Speech Quality Estimation of G.729 Codec using a Packet Loss Effect Model (G.729 코덱의 패킷 손실 영향 모델을 이용한 비 침입적 음질 예측 기법)

  • Lee, Min-Ki;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.2
    • /
    • pp.157-166
    • /
    • 2013
  • This paper proposes a non-intrusive speech quality estimation method considering the effects of packet loss to perceptual quality. Packet loss is a major reason of quality degradation in a packet based speech communications network, whose effects are different according to the input speech characteristics or the performance of the embedded packet loss concealment (PLC) algorithm. For the quality estimation system that involves packet loss effects, we first observe the packet loss of G.729 codec which is one of narrowband codec in VoIP system. In order to quantify the lost packet affects, we design a classification algorithm only using speech parameters of G.729 decoder. Then, the degradation values of each class are iteratively selected that maximizes the correlation with the degradation PESQ-LQ scores, and total quality degradation is modeled by the weighted sum. From analyzing the correlation measures, we obtained correlation values of 0.8950 for the intrusive model and 0.8911 for the non-intrusive method.

Robust speech quality enhancement method against background noise and packet loss at voice-over-IP receiver (배경잡음 및 패킷손실에 강인한 voice-over-IP 수신단 기반 음질향상 기법)

  • Kim, Gee Yeun;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.37 no.6
    • /
    • pp.512-517
    • /
    • 2018
  • Improving voice quality is a major concern in telecommunications. In this paper, we propose a robust speech quality enhancement against background noise and packet loss at VoIP (Voice-over-IP) receiver. The proposed method combines network jitter estimation based on hybrid Markov chain, adaptive playout scheduling using the estimated jitter, and speech enhancement based on restoration of amplitude and phase to enhance the quality of the speech signal arriving at the VoIP receiver over IP network. The experimental results show that the proposed method removes the background noise added to the speech signal before encoding at the sender side and provides the enhanced speech quality in an unstable network environment.

A Selection Method of Reliable Codevectors using Noise Estimation Algorithm (잡음 추정 알고리즘을 이용한 신뢰성 있는 코드벡터 조합의 선정 방법)

  • Jung, Seungmo;Kim, Moo Young
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.7
    • /
    • pp.119-124
    • /
    • 2015
  • Speech enhancement has been required as a preprocessor for a noise robust speech recognition system. Codebook-based Speech Enhancement (CBSE) is highly robust in nonstationary noise environments compared with conventional noise estimation algorithms. However, its performance is severely degraded for the codevector combinations that have lower correlation with the input signal since CBSE depends on the trained codebook information. To overcome this problem, only the reliable codevector combinations are selected to be used to remove the codevector combinations that have lower correlation with input signal. The proposed method produces the improved performance compared to the conventional CBSE in terms of Log-Spectral Distortion (LSD) and Perceptual Evaluation of Speech Quality (PESQ).

Minima Controlled Speech Presence Uncertainty Tracking Method for Speech Enhancement (음성 향상을 위한 최소값 제어 음성 존재 부정확성의 추적기법)

  • Lee, Woo-Jung;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.7
    • /
    • pp.668-673
    • /
    • 2009
  • In this paper, we propose the minima controlled speech presence uncertainty tracking method to improve a speech enhancement. In the conventional tracking speech presence uncertainty, we propose a method for estimating distinct values of the a priori speech absence probability for different frames and channels. This estimation is inherently based on a posteriori SNR and used in estimating the speech absence probability (SAP). In this paper, we propose a novel estimation of distinct values of the a priori speech absence probability, which is based on minima controlled speech presence uncertainty tracking method, for different frames and channels. Subsequently, estimation is applied to the calculation of speech absence probability for speech enhancement. Performance of the proposed enhancement algorithm is evaluated by ITU-T P. 862 perceptual evaluation of speech quality (PESQ) under various noise environments. We show that the proposed algorithm yields better results compared to the conventional tracking speech presence uncertainty.

Multi-Pulse Amplitude and Location Estimation by Maximum-Likelihood Estimation in MPE-LPC Speech Synthesis (MPE-LPC음성합성에서 Maximum- Likelihood Estimation에 의한 Multi-Pulse의 크기와 위치 추정)

  • 이기용;최홍섭;안수길
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.26 no.9
    • /
    • pp.1436-1443
    • /
    • 1989
  • In this paper, we propose a maximum-likelihood estimation(MLE) method to obtain the location and the amplitude of the pulses in MPE( multi-pulse excitation)-LPC speech synthesis using multi-pulses as excitation source. This MLE method computes the value maximizing the likelihood function with respect to unknown parameters(amplitude and position of the pulses) for the observed data sequence. Thus in the case of overlapped pulses, the method is equivalent to Ozawa's crosscorrelation method, resulting in equal amount of computation and sound quality with the cross-correlation method. We show by computer simulation: the multi-pulses obtained by MLE method are(1) pseudo-periodic in pitch in the case of voicde sound, (2) the pulses are random for unvoiced sound, (3) the pulses change from random to periodic in the interval where the original speech signal changes from unvoiced to voiced. Short time power specta of original speech and syunthesized speech obtained by using multi-pulses as excitation source are quite similar to each other at the formants.

  • PDF

Harmonic Peak Picking-based MVF Estimation for Improvement of HMM-based Speech Synthesis System Using TBE Model (TBE 모델을 사용하는 HMM 기반 음성합성기 성능 향상을 위한 하모닉 선택에 기반한 MVF 예측 방법)

  • Park, Jihoon;Hahn, Minsoo
    • Phonetics and Speech Sciences
    • /
    • v.4 no.4
    • /
    • pp.79-86
    • /
    • 2012
  • In the two-band excitation (TBE) model, maximum voiced frequency (MVF) is the most important feature of the excitation parameter because the synthetic speech quality depends on MVF. Thus, this paper proposes an enhanced MVF estimation scheme based on the peak picking method. In the proposed scheme, the local peak and the peak lobe are picked from the spectrum of a linear predictive residual signal. The normalized distance between neighboring peak lobes is calculated and utilized as a feature to estimate MVF. Experimental results of both objective and subjective tests show that the proposed scheme improves synthetic speech quality compared with that of the conventional one.

Speech Quality Estimation Algorithm using a Harmonic Modeling of Reverberant Signals (반향 음성 신호의 하모닉 모델링을 이용한 음질 예측 알고리즘)

  • Yang, Jae-Mo;Kang, Hong-Goo
    • Journal of Broadcast Engineering
    • /
    • v.18 no.6
    • /
    • pp.919-926
    • /
    • 2013
  • The acoustic signal from a distance sound source in an enclosed space often produces reverberant sound that varies depending on room impulse response. The estimation of the level of reverberation or the quality of the observed signal is important because it provides valuable information on the condition of system operating environment. It is also useful for designing a dereverberation system. This paper proposes a speech quality estimation method based on the harmonicity of received signal, a unique characteristic of voiced speech. At first, we show that the harmonic signal modeling to a reverberant signal is reasonable. Then, the ratio between the harmonically modeled signal and the estimated non-harmonic signal is used as a measure of standard room acoustical parameter, which is related to speech clarity. Experimental results show that the proposed method successfully estimates speech quality when the reverberation time varies from 0.2s to 1.0s. Finally, we confirm the superiority of the proposed method in both background noise and reverberant environments.

An Improved Speech Absence Probability Estimation based on Environmental Noise Classification (환경잡음분류 기반의 향상된 음성부재확률 추정)

  • Son, Young-Ho;Park, Yun-Sik;An, Hong-Sub;Lee, Sang-Min
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.7
    • /
    • pp.383-389
    • /
    • 2011
  • In this paper, we propose a improved speech absence probability estimation algorithm by applying environmental noise classification for speech enhancement. The previous speech absence probability required to seek a priori probability of speech absence was derived by applying microphone input signal and the noise signal based on the estimated value of a posteriori SNR threshold. In this paper, the proposed algorithm estimates the speech absence probability using noise classification algorithm which is based on Gaussian mixture model in order to apply the optimal parameter each noise types, unlike the conventional fixed threshold and smoothing parameter. Performance of the proposed enhancement algorithm is evaluated by ITU-T P.862 PESQ (perceptual evaluation of speech quality) and composite measure under various noise environments. It is verified that the proposed algorithm yields better results compared to the conventional speech absence probability estimation algorithm.