• Title/Summary/Keyword: Speech quality measure

Search Result 55, Processing Time 0.026 seconds

Speech Quality of a Sinusoidal Model Depending on the Number of Sinusoids

  • Seo, Jeong-Wook;Kim, Ki-Hong;Seok, Jong-Won;Bae, Keun-Sung
    • Speech Sciences
    • /
    • v.7 no.1
    • /
    • pp.17-29
    • /
    • 2000
  • The STC(Sinusoidal Transform Coding) is a vocoding technique that uses a sinusoidal speech model to obtain high- quality speech at low data rate. It models and synthesizes the speech signal with fundamental frequency and its harmonic elements in frequency domain. To reduce the data rate, it is necessary to represent the sinusoidal amplitudes and phases with as small number of peaks as possible while maintaining the speech quality. As a basic research to develop a low-rate speech coding algorithm using the sinusoidal model, in this paper, we investigate the speech quality depending on the number of sinusoids. By varying the number of spectral peaks from 5 to 40 speech signals are reconstructed, and then their qualities are evaluated using spectral envelope distortion measure and MOS(Mean Opinion Score). Two approaches are used to obtain the spectral peaks: one is a conventional STFT (Short-Time Fourier Transform), and the other is a multiresolutional analysis method.

  • PDF

Critical Banded Wavelet Packet-Based Spectral Subtractions for Speech Enhancement (음성신호개선을 위한 임계대역 웨이블렛 패킷 기반의 스펙트럼 차감법)

  • Chang, Sung-Wook;Yang, Sung-Il
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.4E
    • /
    • pp.125-133
    • /
    • 2004
  • In this paper, we propose a critical banded wavelet packet-based spectral subtraction for speech enhancement. Critical banded wavelet packet, which reflects the human auditory system, may lead to minimization of intelligibility loss and quality improvement of the enhanced speech in the spectral domain, when combined with an appropriate spectral subtraction gain function. The proposed method shows better performance than the conventional one in comparative assessments. We also show that, for effective evaluation of enhanced speech, it is essential to consider the characteristics of speech quality measures.

Speech Enhancement Based on Improved Minima Controlled Recursive Averaging Incorporating GSAP (전역 음성 부재 확률 기반의 향상된 최소값 제어 재귀평균기법을 이용한 음성 향상 기법)

  • Song, Ji-Hyun;Bang, Dong-Hyeouck;Lee, Sang-Min
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.1
    • /
    • pp.104-111
    • /
    • 2012
  • In this paper, we propose a novel method to improve the performance of the improved minima controlled recursive averaging (IMCRA). From an examination for various noise environment, it is shown that the IMCRA has a fundamental drawback for the noise power estimate at the offset region of continuity speech signals. Espectially, it is difficult to obtain the robust estimates of the noise power in non-stationary noisy environments that is rapidly changed the spectral characteristics such as babble noise. To overcome the drawback, we apply the global speech absence probability (GSAP) conditioned on both a priori SNR and a posteriori SNR to the speech detection algorithm of IMCRA. With the performance criteria of the ITU-T P.862 perceptual evaluation of speech quality (PESQ) and a composite measure test, we show that the proposed algorithm yields better results compared to the conventional IMCRA-based scheme under various noise environments. In particular, in the case of babble 5 dB, the proposed method produced a remarkable improvement compared to the IMCRA ( PESQ = 0.026, composite measure = 0.029 ).

A STUDY ON THE SPEECH SYNTHESIS-BY-RULE SYSTEM APPLIED MULTIBAND EXCITATION SIGNAL

  • Kyung, Younjeong;Kim, Geesoon;Lee, Hwangsoo;Lee, Yanghee
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06a
    • /
    • pp.1098-1103
    • /
    • 1994
  • In this paper, we design and implement the Korean speech synthesis by rule system. This system is applied the multiband excitation signal on voiced sounds. The multiband excitation signal is obtained by mixing impluse spectrum and which noise spectrum. We find that the quality of synthesized speech is improved using this application. Also, we classify the voiced sounds by cepstral euclidian distance measure for reducing overhead memory. The representative excitation signal of the same group's voiced sounds is used as excitation signal on synthesis. This method does not affect the quality of synthesized speech. As the result of experiment, this method eliminates the "buzziness" of synthesized speech and reduces the spectral distortion of synthesized speech.ed speech.

  • PDF

A Study on the Pitch Alteration Technique by Subband Scaling in Speech Signal (서브밴드 스케일링에 의한 음성신호의 피치변경법에 관한 연구)

  • Kim, Young-Kyu;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.137-147
    • /
    • 2003
  • Speech synthesis can classify by synthesis way, that is waveform coding, source coding and mixture coding. Specially, waveform coding is suitable for high quality synthesis. However, it is not desirable by synthesis techniques of syllable or phoneme unit because it do not separate and handles excitation and formant part. Therefore, there is a need for pitch alteration method applied in synthesis by the rule in waveform coding. This study propose about pitch alteration method that use spectrum scaling after do to flatten spectra by subband linear approximation to minimize spectrum distortion. This paper show evaluation whether show excellency of some measure compared with LPC, Cepstrum, lifter function and method that propose. estimation method seeks distribution of each flattened signal and measured degree of flattened spectra Signal flattened is normalized, So that highest point amounts to zero, and distribution of signal ,whose average is zero, is calculated. this show result that measure the spectrum distortion rate to estimate performance of method that propose. The average spectrum distortion rate was kept below the average 2.12%, so the method that propose is superiors than existent method.

  • PDF

An Improved Speech Absence Probability Estimation based on Environmental Noise Classification (환경잡음분류 기반의 향상된 음성부재확률 추정)

  • Son, Young-Ho;Park, Yun-Sik;An, Hong-Sub;Lee, Sang-Min
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.7
    • /
    • pp.383-389
    • /
    • 2011
  • In this paper, we propose a improved speech absence probability estimation algorithm by applying environmental noise classification for speech enhancement. The previous speech absence probability required to seek a priori probability of speech absence was derived by applying microphone input signal and the noise signal based on the estimated value of a posteriori SNR threshold. In this paper, the proposed algorithm estimates the speech absence probability using noise classification algorithm which is based on Gaussian mixture model in order to apply the optimal parameter each noise types, unlike the conventional fixed threshold and smoothing parameter. Performance of the proposed enhancement algorithm is evaluated by ITU-T P.862 PESQ (perceptual evaluation of speech quality) and composite measure under various noise environments. It is verified that the proposed algorithm yields better results compared to the conventional speech absence probability estimation algorithm.

A Study on Objective Quality Assessment of Synthesized Speech by Rule (규칙 합성음의 객관적 품질평가에 관한 연구)

  • 홍진우
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1991.06a
    • /
    • pp.67-72
    • /
    • 1991
  • This paper evaluates thequality of synthesized speech by rule using the LPC CD in the objective measure and then compares the result with the subjective analysis. By evaluating the quality of synthesized speech by rule objectively. We have tried to resolve the problems (Evaluation time or size expansion, variables within the analysis results) that arise when the evaluation is done subjectively. Also by comparing intelligibility-the index for the subjective quality evaluation of synthesized speech by rule-with evaluation results obtained using MOS and the objective evaluation. We have proved the validity of the objective analysis and thus provides a guide that would be useful when R&D and marketing of synthesis by rule method is done.

  • PDF

Speech Quality Measure for VoIP Using Wavelet Based Bark Coherence Function (웨이블렛 기반 바크 코히어런스 함수를 이용한 VoIP 음질평가)

  • 박상욱;박영철;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.4A
    • /
    • pp.310-315
    • /
    • 2002
  • The Bark Coherence Function (BCF) defies a coherence function within perceptual domain as a new cognition module, robust to linear distortions due to the analog interface of digital mobile system. Our previous experiments have shown the superiority of BCF over current measures. In this paper, a new BCF suitable for VoIP is developed. The unproved BCF is based on the wavelet series expansion that provides good frequency resolution while keeping good time locality. The proposed Wavelet based Bark Coherence function (WBCF) is robust to variable delay often observed in packet-based telephony such as Voice over Internet Protocol (VoIP). We also show that the refinement of time synchronization after signal decomposition can improve the performance of the WBCF. The regression analysis was performed with VoIP speech data. The correlation coefficients and the standard error of estimates computed using the WBCF showed noticeable improvement over the Perceptual Speech Quality Measure (PSQM) that is recommended by ITU-T.

PESQ-Based Selection of Efficient Partial Encryption Set for Compressed Speech

  • Yang, Hae-Yong;Lee, Kyung-Hoon;Lee, Sang-Han;Ko, Sung-Jea
    • ETRI Journal
    • /
    • v.31 no.4
    • /
    • pp.408-418
    • /
    • 2009
  • Adopting an encryption function in voice over Wi-Fi service incurs problems such as additional power consumption and degradation of communication quality. To overcome these problems, a partial encryption (PE) algorithm for compressed speech was recently introduced. However, from the security point of view, the partial encryption sets (PESs) of the conventional PE algorithm still have much room for improvement. This paper proposes a new selection method for finding a smaller PES while maintaining the security level of encrypted speech. The proposed PES selection method employs the perceptual evaluation of the speech quality (PESQ) algorithm to objectively measure the distortion of speech. The proposed method is applied to the ITU-T G.729 speech codec, and content protection capability is verified by a range of tests and a reconstruction attack. The experimental results show that encrypting only 20% of the compressed bitstream is sufficient to effectively hide the entire content of speech.

Phonation Type Index k (발성유형지수 k)

  • Park Hansang
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.77-80
    • /
    • 2002
  • This study proposes phonation type index k as a descriptor of the overall spectral tilt, which is free from the effects of fundamental frequency and vowel quality. The newly proposed phonation type index k presents a simple and single measure of the overall spectral tilt. Phonation type index k can be applied to speech technology. It can also be used in diagnosing patients voice qualities in speech pathology. The distribution of phonation type index k, which is speaker-dependent, may be useful in forensic phonetics and voice recognition as an indicator of speaker identity.

  • PDF