• Title/Summary/Keyword: Speech quality

Objective Speech Quality Measurement for Naturalness Assessment (자연성 평가를 위한 객관적 음질 평가 방법)

  • 장경아;이희원;송종회;배명진
    • Proceedings of the IEEK Conference / 2001.06a / pp.361-364 / 2001
  • Speech quality measurement is classified as subjective or objective according to its mathematical representation. Of the two, subjective measurement evaluates speech quality more accurately than objective measurement, but it takes much more time and cost to perform. An objective measure that can predict the result of a subjective evaluation compensates for this drawback. In this paper, we propose an objective speech quality measure that assesses naturalness approximately as a subjective measure would. We measured the naturalness of speech by estimating the speaking rate and the variation of pitch and energy.

Speech Quality Measure in a Mobile Communication System Using PLP Cepstral Distance with CMS (심리 음향 켑스트럼 평균 차감법을 이용한 이동 전화망에서의 음질 평가)

  • Yun, J.J.;Park, S.W.;Park, Y.C.;Youn, D.H.;Cha, I.H.
    • Speech Sciences / v.6 / pp.163-179 / 1999
  • The setup, management, and repair of a mobile communication system require continuous estimation of speech quality. Speech quality can be measured from listeners' judgements in a subjective test such as the MOS (Mean Opinion Score) test. However, because this method is laborious, expensive, and time-consuming, it is advisable to predict subjective speech quality with objective measures. This paper presents a robust objective speech quality measure, PLP-CMS (Perceptual Linear Predictive-Cepstral Mean Subtraction), which can predict subjective speech quality in mobile communication systems. PLP-CMS correlates highly with subjective quality owing to PLP (Perceptual Linear Predictive) analysis, and it is robust to PSTN (Public Switched Telephone Network) channel effects owing to CMS (Cepstral Mean Subtraction). To verify the performance of the proposed algorithm, we carried out subjective and objective quality estimation on speech samples distorted in various ways in a real mobile communication system. The results show that PLP-CMS has a higher correlation with subjective quality than PSQM (Perceptual Speech Quality Measure) and PLP-CD (Perceptual Linear Predictive-Cepstral Distance).

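The channel-robustness argument in the PLP-CMS abstract can be illustrated with a small sketch. It is not the authors' implementation: it skips PLP analysis entirely and starts from hypothetical cepstral frames, and the function names `cepstral_mean_subtraction` and `mean_cepstral_distance` are invented for this example. It relies on one standard fact: a fixed linear channel adds a constant offset to every cepstral frame, so subtracting the per-utterance mean cancels it.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over all frames of one utterance."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


def mean_cepstral_distance(ref, deg):
    """Average Euclidean distance between time-aligned cepstral frames."""
    total = 0.0
    for r, d in zip(ref, deg):
        total += sum((a - b) ** 2 for a, b in zip(r, d)) ** 0.5
    return total / len(ref)


# Hypothetical cepstra (3 frames x 2 coefficients), not data from the paper.
reference = [[1.0, 0.5], [0.8, 0.1], [1.2, -0.3]]
# Assumed channel effect: the same offset added to every frame.
channel_offset = [0.4, -0.2]
degraded = [[c + o for c, o in zip(f, channel_offset)] for f in reference]

distance_without_cms = mean_cepstral_distance(reference, degraded)
distance_with_cms = mean_cepstral_distance(
    cepstral_mean_subtraction(reference),
    cepstral_mean_subtraction(degraded),
)
print(distance_without_cms, distance_with_cms)
```

Without CMS the distance reflects only the channel offset, even though the speech itself is unchanged; with CMS it collapses to roughly zero, which is the behaviour the abstract attributes to the CMS stage.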

A Speech Enhancement Algorithm based on Human Psychoacoustic Property (심리음향 특성을 이용한 음성 향상 알고리즘)

  • Jeon, Yu-Yong;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.6 / pp.1120-1125 / 2010
  • In speech systems such as hearing aids and speech communication systems, speech quality is degraded by environmental noise. In this study, to enhance speech quality degraded by environmental noise, we propose an algorithm that reduces the noise and reinforces the speech. The minima controlled recursive averaging (MCRA) algorithm is used to estimate the noise spectrum, and a spectral weighting factor is used to reduce the noise. The partial masking effect, one of the properties of human hearing, is then introduced to reinforce the speech. We compared the waveform, spectrogram, Perceptual Evaluation of Speech Quality (PESQ) score, and segmental signal-to-noise ratio (segSNR) of the original speech, the noisy speech, the noise-reduced speech, and the speech enhanced by the proposed method. The speech enhanced by the proposed method is reinforced in the high-frequency region that was degraded by noise, and both PESQ and segSNR improve, which indicates that speech quality is enhanced.
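As an illustration of the segSNR figure mentioned in this abstract, the following is a minimal sketch of one common definition: the per-frame SNR in dB, clamped to a fixed range and averaged over frames. The frame length, the clamping limits of -10 dB and 35 dB, and the function name `segmental_snr` are assumptions made for this example; the abstract does not state which variant the authors used.

```python
import math


def segmental_snr(clean, processed, frame_len, lo_db=-10.0, hi_db=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [lo_db, hi_db]."""
    eps = 1e-12  # avoids log(0) and division by zero on silent or perfect frames
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        noise_energy = sum((x - y) ** 2 for x, y in zip(c, p))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        snrs.append(min(max(snr_db, lo_db), hi_db))
    if not snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(snrs) / len(snrs)


# Toy signals: the first frame has a small error, the second is wiped out.
clean = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
processed = [0.9, -0.9, 0.9, -0.9, 0.0, 0.0, 0.0, 0.0]
seg_snr_db = segmental_snr(clean, processed, frame_len=4)
print(seg_snr_db)
```

Averaging in the dB domain, frame by frame, keeps a few loud frames from dominating the score, which is why segSNR is often reported alongside perceptual measures such as PESQ.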

Speech Quality of a Sinusoidal Model Depending on the Number of Sinusoids

  • Seo, Jeong-Wook;Kim, Ki-Hong;Seok, Jong-Won;Bae, Keun-Sung
    • Speech Sciences / v.7 no.1 / pp.17-29 / 2000
  • STC (Sinusoidal Transform Coding) is a vocoding technique that uses a sinusoidal speech model to obtain high-quality speech at a low data rate. It models and synthesizes the speech signal using the fundamental frequency and its harmonics in the frequency domain. To reduce the data rate, the sinusoidal amplitudes and phases must be represented with as few peaks as possible while maintaining speech quality. As basic research toward a low-rate speech coding algorithm based on the sinusoidal model, this paper investigates how speech quality depends on the number of sinusoids. Speech signals are reconstructed while varying the number of spectral peaks from 5 to 40, and their quality is evaluated using a spectral envelope distortion measure and MOS (Mean Opinion Score). Two approaches are used to obtain the spectral peaks: a conventional STFT (Short-Time Fourier Transform) and a multiresolution analysis method.

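The trade-off studied in this abstract, fewer sinusoids against reconstruction quality, can be sketched on a single synthetic frame. This is a toy illustration only: it uses a plain DFT on one frame rather than the STFT or multiresolution analysis of the paper, the test signal is three hypothetical bin-centred cosines, and the helper names `dft`, `top_peaks`, and `resynthesize` are invented here.

```python
import cmath
import math


def dft(frame):
    """Naive DFT returning bins 0..N/2 of a real frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def top_peaks(spectrum, num_peaks):
    """Indices of the largest local maxima of the magnitude spectrum."""
    mags = [abs(c) for c in spectrum]
    peaks = [
        k for k in range(1, len(mags) - 1)
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]
    peaks.sort(key=lambda k: mags[k], reverse=True)
    return peaks[:num_peaks]


def resynthesize(spectrum, peaks, n):
    """Sum of cosines using the amplitude and phase at each selected peak."""
    out = []
    for t in range(n):
        sample = 0.0
        for k in peaks:
            amp = 2.0 * abs(spectrum[k]) / n
            phase = cmath.phase(spectrum[k])
            sample += amp * math.cos(2 * math.pi * k * t / n + phase)
        out.append(sample)
    return out


N = 64
# Hypothetical components: (bin index, amplitude, phase in radians).
components = [(4, 1.0, 0.0), (9, 0.5, 0.3), (15, 0.1, 1.0)]
frame = [
    sum(a * math.cos(2 * math.pi * k * t / N + ph) for k, a, ph in components)
    for t in range(N)
]
spectrum = dft(frame)


def squared_error(num_peaks):
    approx = resynthesize(spectrum, top_peaks(spectrum, num_peaks), N)
    return sum((x - y) ** 2 for x, y in zip(frame, approx))


errors = {k: squared_error(k) for k in (1, 2, 3)}
print(errors)
```

Dropping the weakest component costs little error while dropping the second strongest costs far more, which is the kind of quality-versus-peak-count curve the paper measures with spectral envelope distortion and MOS on real speech.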

Qualitative Classification of Voice Quality of Normal Speech and Derivation of its Correlation with Speech Features (정상 음성의 목소리 특성의 정성적 분류와 음성 특징과의 상관관계 도출)

  • Kim, Jungin;Kwon, Chulhong
    • Phonetics and Speech Sciences / v.6 no.1 / pp.71-76 / 2014
  • In this paper, the voice quality of normal speech is qualitatively classified into five components: breathy, creaky, rough, nasal, and thin/thick voice. To determine whether a subjective measure of voice correlates with an objective measure, each voice is perceptually rated on a 1/2/3 scale by speech processing specialists and acoustically analyzed with speech analysis tools such as Praat, MDVP, and VoiceSauce. The speech parameters include features related to the speech source and the vocal tract filter. The statistical analysis uses a two-independent-samples non-parametric test. The experimental results show a significant correlation between the speech feature parameters and the components of voice quality.

Correlation Analysis of PESQ and MOS Evaluation for HMM-based Synthetic Korean Speech (HMM 기반의 한국어 합성음에 대한 PESQ 및 MOS 평가의 상관도 분석)

  • Lin, Cang-Song;Bae, Keun-Sung
    • Phonetics and Speech Sciences / v.2 no.1 / pp.71-75 / 2010
  • PESQ is an objective speech quality measure that is known to correlate highly with subjective speech quality measures such as MOS. To examine whether it could serve as an objective quality measure for synthetic speech, we carried out subjective evaluation tests with MOS and DMOS and an objective evaluation test with PESQ on HMM-based Korean synthetic speech signals, and analyzed the correlation between them. The experimental results show that PESQ has a correlation of 0.87 with MOS and 0.92 with DMOS. This suggests that PESQ holds much promise for evaluating the quality of synthetic Korean speech.

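The correlation figures reported in this abstract are presumably Pearson correlation coefficients between per-sample objective and subjective scores; the abstract does not name the coefficient, so that is an assumption. The sketch below shows how such a coefficient is computed. The score lists are hypothetical values made up for the example, not data from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical per-sample scores (not from the paper).
pesq_scores = [1.8, 2.3, 2.9, 3.4, 3.9]
mos_scores = [2.0, 2.4, 3.1, 3.3, 4.2]
r = pearson_r(pesq_scores, mos_scores)
print(round(r, 3))
```

A value close to 1 means the objective score ranks and spaces the samples much as the listeners did, which is the sense in which the abstract calls PESQ promising.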

Voice Quality Criteria for Heterogenous Network Communication Under Mobile-VoIP Environments

  • Choi, Jae-Hun;Seol, Soon-Uk;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.28 no.3E / pp.99-108 / 2009
  • In this paper, we suggest criteria for the objective measurement of speech quality in mobile VoIP (Voice over Internet Protocol) services over wireless mobile internet such as mobile WiMAX networks. In this setting, voice communication extends across other networks: mobile VoIP users in the packet-based mobile internet network call PSTN and mobile network users. However, there have been no relevant quality indexes or quality standards for evaluating the speech quality of mobile VoIP. In addition, many factors influence speech quality in a packet network. In particular, if speech degraded by packet loss is transferred to users of another network through handover, voice communication quality deteriorates significantly because of the transformation of speech codecs. We adopt the Gilbert-Elliott channel model to characterize the packet network and assess voice quality with the objective speech quality method of ITU-T P.862.1 MOS-LQO for various call scenarios from mobile VoIP users to PSTN and mobile network users under various packet loss rates in the transmission channel. Our simulation results show that the transformation of speech codecs degrades speech quality across the different transmission channel environments when mobile VoIP users call PSTN and mobile network users.
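The Gilbert-Elliott channel named in this abstract is a two-state Markov model of bursty packet loss. The sketch below is one minimal way to simulate it and is not taken from the paper: the transition probabilities, per-state loss probabilities, seed, and the names `gilbert_elliott_losses` and `stationary_loss_rate` are all assumptions chosen for illustration.

```python
import random


def gilbert_elliott_losses(n_packets, p_gb, p_bg,
                           loss_good=0.0, loss_bad=1.0, seed=0):
    """Simulate per-packet loss flags from a two-state (good/bad) Markov channel.

    p_gb: probability of moving from the good state to the bad state.
    p_bg: probability of moving from the bad state back to the good state.
    """
    rng = random.Random(seed)
    in_bad_state = False  # assumed to start in the good state
    lost = []
    for _ in range(n_packets):
        loss_prob = loss_bad if in_bad_state else loss_good
        lost.append(rng.random() < loss_prob)
        # State transition for the next packet.
        if in_bad_state:
            if rng.random() < p_bg:
                in_bad_state = False
        elif rng.random() < p_gb:
            in_bad_state = True
    return lost


def stationary_loss_rate(p_gb, p_bg, loss_good=0.0, loss_bad=1.0):
    """Long-run packet loss rate implied by the channel parameters."""
    prob_bad = p_gb / (p_gb + p_bg)
    return (1.0 - prob_bad) * loss_good + prob_bad * loss_bad


# Hypothetical parameters giving a long-run loss rate of about 10%.
trace = gilbert_elliott_losses(20000, p_gb=0.05, p_bg=0.45)
print(sum(trace) / len(trace), stationary_loss_rate(0.05, 0.45))
```

Unlike independent random loss, this model produces runs of consecutive lost packets whose average length is controlled by `p_bg`, which matters for speech codecs because burst losses are harder to conceal than isolated ones.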

The effect of voice quality on speech intelligibility in children with spastic cerebral palsy (경직형 뇌성마비 아동의 음질이 말명료도에 미치는 영향)

  • Jeong, Pil Yeon;Sim, Hyun Sub
    • Phonetics and Speech Sciences / v.9 no.4 / pp.129-136 / 2017
  • This study investigates the effect of voice quality on speech intelligibility and the relationship between voice quality and intelligibility in children with spastic CP. We recruited 36 children with spastic CP (mean age 10.43 years, 17 girls, 19 boys, spastic type 34, mixed 2) from a special school and a rehabilitation hospital. Voice samples for the perceptual analysis of voice quality were extracted from a sustained vowel /a/ and were rated on the GRBAS scales by two experienced speech language pathologists. Ten adult subjects with no hearing problems evaluated speech intelligibility for the 37 words listed in the Assessment of Phonology and Articulation for Children on a 7-point interval scale. The children with spastic CP were divided into three groups according to their rated G scores on the GRBAS scales (G1(n)=10, G2(n)=13, G3(n)=13). ANCOVA and Pearson correlation analyses showed a significant difference in speech intelligibility among the three groups. There was also a significant correlation between speech intelligibility and the G scale (grade), A scale (asthenia), and B scale (breathiness) scores. These findings suggest that the poor speech intelligibility of children with spastic CP might be related to asthenia and breathiness. Speech therapy that aims to improve the speech intelligibility of children with spastic CP should therefore increase vocal intensity and improve vocal function.

Speech Recognition Accuracy Prediction Using Speech Quality Measure (음성 특성 지표를 이용한 음성 인식 성능 예측)

  • Ji, Seung-eun;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.3 / pp.471-476 / 2016
  • This paper presents our study on speech recognition performance prediction. Our initial study showed that a combination of speech quality measures correlates better with Word Error Rate (WER) than each speech measure alone. In this paper, we demonstrate that a new combination of various types of speech quality measures improves the correlation with WER more significantly than the combination used in our initial study. SNR, PESQ, acoustic model score, and MFCC distance are used as the speech quality measures. This paper also presents our speech database verification system for speech recognition, which employs these speech measures. We develop a WER prediction system that uses a Gaussian mixture model with the speech quality measures as a feature vector. The experimental results show that the proposed system is highly effective at predicting WER in low-SNR conditions with speech babble and car noise.

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments (네트워크 환경에서 서버용 음성 인식을 위한 MFCC 기반 음성 부호화기 설계)

  • Lee, Gil-Ho;Yoon, Jae-Sam;Oh, Yoo-Rhee;Kim, Hong-Kook
    • MALSORI / no.54 / pp.27-43 / 2005
  • Existing standard speech coders provide high-quality speech communication, but they degrade the performance of speech recognition systems that use the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for speech recognition performance. For example, mel-frequency cepstral coefficients (MFCCs) are generally known to provide better speech recognition performance than linear prediction coefficients (LPCs), which are a typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCC is developing an efficient MFCC quantization scheme at a low bit rate. First, we explore the interframe correlation of MFCCs, which leads to predictive quantization of MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed speech coder has speech quality comparable to that of 8 kbps G.729, while speech recognition performance with the proposed coder is better than with G.729.
