• Title/Summary/Keyword: speech quality evaluation

PESQ-Based Selection of Efficient Partial Encryption Set for Compressed Speech

  • Yang, Hae-Yong;Lee, Kyung-Hoon;Lee, Sang-Han;Ko, Sung-Jea
    • ETRI Journal / v.31 no.4 / pp.408-418 / 2009
  • Adopting an encryption function in voice over Wi-Fi service incurs problems such as additional power consumption and degradation of communication quality. To overcome these problems, a partial encryption (PE) algorithm for compressed speech was recently introduced. However, from a security point of view, the partial encryption sets (PESs) of the conventional PE algorithm still leave much room for improvement. This paper proposes a new selection method for finding a smaller PES while maintaining the security level of the encrypted speech. The proposed PES selection method employs the perceptual evaluation of speech quality (PESQ) algorithm to objectively measure the distortion of speech. The proposed method is applied to the ITU-T G.729 speech codec, and its content protection capability is verified by a range of tests and a reconstruction attack. The experimental results show that encrypting only 20% of the compressed bitstream is sufficient to effectively hide the entire content of the speech.
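A minimal sketch of the partial-encryption idea, assuming a G.729 frame of 80 bits per 10 ms and a PES of 16 bit positions (20% of the frame). The positions in `HYPOTHETICAL_PES`, the helper names, and the SHA-256-based toy keystream are illustrative assumptions; they are not the set selected in the paper, nor a vetted cipher.

```python
import hashlib

FRAME_BITS = 80  # one G.729 frame carries 80 bits (10 ms of speech)

# Hypothetical PES: 16 of 80 positions (20%). The paper chooses its set
# using PESQ scores; these indices are placeholders for illustration.
HYPOTHETICAL_PES = list(range(16))


def keystream_bits(key, frame_index, n):
    """Toy keystream from SHA-256(key || frame index). Illustration only."""
    digest = hashlib.sha256(key + frame_index.to_bytes(8, "big")).digest()
    bits = [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]
    return bits[:n]  # SHA-256 yields 256 bits, so n must not exceed 256


def partial_encrypt(frame, pes, key, frame_index):
    """XOR only the bits listed in `pes`; all other bits pass through.

    XOR is its own inverse, so calling this again with the same
    arguments decrypts the frame.
    """
    stream = keystream_bits(key, frame_index, len(pes))
    out = list(frame)
    for bit, pos in zip(stream, pes):
        out[pos] ^= bit
    return out


frame = [i % 2 for i in range(FRAME_BITS)]  # dummy frame content
encrypted = partial_encrypt(frame, HYPOTHETICAL_PES, b"demo-key", 0)
decrypted = partial_encrypt(encrypted, HYPOTHETICAL_PES, b"demo-key", 0)
print(decrypted == frame)
```

Because only the PES positions are touched, the cipher processes one fifth of each frame, which is where the power saving described in the abstract would come from.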

Change in acoustic characteristics of voice quality and speech fluency with aging (노화에 따른 음질과 구어 유창성의 음향학적 특성 변화)

  • Hee-June Park;Jin Park
    • Phonetics and Speech Sciences / v.15 no.4 / pp.45-51 / 2023
  • Voice issues such as voice weakness that arise with age can have social and emotional impacts, potentially leading to feelings of isolation and depression. This study aimed to investigate the changes in acoustic characteristics resulting from aging, focusing on voice quality and spoken fluency. To this end, tasks involving sustained vowel phonation and paragraph reading were recorded for 20 elderly and 20 young participants. Voice-quality-related variables, including F0, jitter, shimmer, and Cepstral Peak Prominence (CPP) values, were analyzed along with speech-fluency-related variables, such as average syllable duration (ASD), articulation rate (AR), and speech rate (SR). The results showed that, in the voice-quality-related measurements, F0 was higher for the elderly and voice quality was diminished, as indicated by increased jitter and shimmer and lower CPP levels. Speech fluency analysis also demonstrated that the elderly spoke more slowly, as indicated by all three measures (ASD, AR, and SR). Correlation analysis between voice quality and speech fluency showed a significant relationship between shimmer and CPP values and between ASD and SR values. This suggests that changes in spoken fluency can be identified early by measuring the variations in voice quality. This study further highlights the reciprocal relationship between voice quality and spoken fluency, emphasizing that deterioration in one can affect the other.
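For readers unfamiliar with the perturbation measures named above, the following sketch shows one common definition of local jitter and local shimmer: the mean absolute difference between consecutive cycles, divided by the mean value, in percent. The abstract does not state which variant the study used, and the period and amplitude values below are made up for illustration.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference relative to the mean, in percent.

    Applied to glottal periods this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical measurements from a sustained vowel (not data from the study).
periods_ms = [5.0, 5.1, 4.9, 5.0]      # consecutive glottal periods
amplitudes = [0.80, 0.78, 0.82, 0.80]  # peak amplitude of each cycle

jitter_percent = local_perturbation(periods_ms)
shimmer_percent = local_perturbation(amplitudes)
print(f"jitter: {jitter_percent:.2f}%  shimmer: {shimmer_percent:.2f}%")
```

Higher values of either measure mean less regular vocal fold vibration, which matches the direction of the aging effect reported in the abstract.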

Evaluation Performance of Speech Coder in Speech Signal Processing

  • Lee, Kwang-Seok
    • Journal of information and communication convergence engineering / v.5 no.2 / pp.177-180 / 2007
  • We compared the CS-ACELP and QCELP speech coders in a CDMA cellular system under channel errors and evaluated their performance from measured values. We also specified an effective coding scheme to overcome these errors. The CS-ACELP speech coder, which uses an LSP vector quantizer, shows transparent speech quality: at a BER of 0.10%, the spectral distortion (SD) is 0.92 dB and 2.9% of frames are outliers exceeding 2 dB. The CS-ACELP speech coder, which uses an MA predictor, achieves better SVR and SEGSNR results than the QCELP speech coder (IS-96), which adopts a DPCM-type predictor, when bit errors occur at BERs from 0.01% to 0.50%.
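As a rough illustration of the SEGSNR measure mentioned above, segmental SNR is commonly computed as the average of per-frame SNRs in dB between a reference signal and its decoded version. The frame length, the handling of silent or error-free frames, and the sample values below are assumptions of this sketch, not details taken from the paper.

```python
import math


def segmental_snr(reference, decoded, frame_len):
    """Average per-frame SNR in dB over full frames.

    Frames with zero signal energy or zero error energy are skipped,
    because their SNR in dB is undefined or infinite.
    """
    snrs = []
    n_frames = min(len(reference), len(decoded)) // frame_len
    for f in range(n_frames):
        ref = reference[f * frame_len:(f + 1) * frame_len]
        dec = decoded[f * frame_len:(f + 1) * frame_len]
        signal_energy = sum(x * x for x in ref)
        error_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal_energy > 0 and error_energy > 0:
            snrs.append(10.0 * math.log10(signal_energy / error_energy))
    if not snrs:
        raise ValueError("no frame with measurable SNR")
    return sum(snrs) / len(snrs)


# Toy signals: the decoded copy has a 10% amplitude error in every sample,
# so each frame has an energy ratio of about 100, i.e. about 20 dB.
reference = [1.0] * 8
decoded = [0.9] * 8
print(round(segmental_snr(reference, decoded, frame_len=4), 2))
```

Practical implementations often also clamp each frame's SNR to a fixed range before averaging; that step is left out here to keep the sketch short.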

Two-Microphone Generalized Sidelobe Canceller with Post-Filter Based Speech Enhancement in Composite Noise

  • Park, Jinsoo;Kim, Wooil;Han, David K.;Ko, Hanseok
    • ETRI Journal / v.38 no.2 / pp.366-375 / 2016
  • This paper describes an algorithm to suppress composite noise in a two-microphone speech enhancement system for robust hands-free speech communication. The proposed algorithm has four stages. The first stage estimates the power spectral density of the residual stationary noise, which is based on the detection of nonstationary signal-dominant time-frequency bins (TFBs) at the generalized sidelobe canceller output. Second, speech-dominant TFBs are identified among the previously detected nonstationary signal-dominant TFBs, and power spectral densities of speech and residual nonstationary noise are estimated. In the final stage, the bin-wise output signal-to-noise ratio is obtained with these power estimates and a Wiener post-filter is constructed to attenuate the residual noise. Compared to the conventional beamforming and post-filter algorithms, the proposed speech enhancement algorithm shows significant performance improvement in terms of perceptual evaluation of speech quality.
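The final stage described above can be sketched with the textbook Wiener gain, which in each time-frequency bin equals SNR / (1 + SNR), or equivalently S / (S + N) for speech power S and noise power N. This is only the generic formula; the paper's estimators for the speech and residual-noise power spectral densities are not reproduced, and `noise_psd` is assumed to hold the total residual noise power per bin.

```python
def wiener_gains(speech_psd, noise_psd, floor=0.0):
    """Bin-wise Wiener gain S / (S + N), limited from below by `floor`.

    `floor` is an optional minimum gain, often used in practice to
    limit musical-noise artifacts; it is an assumption of this sketch.
    """
    gains = []
    for s, n in zip(speech_psd, noise_psd):
        total = s + n
        gain = s / total if total > 0 else 0.0
        gains.append(max(gain, floor))
    return gains


# Hypothetical power estimates for three time-frequency bins.
speech_psd = [1.0, 3.0, 0.0]
noise_psd = [1.0, 1.0, 2.0]
print(wiener_gains(speech_psd, noise_psd))  # [0.5, 0.75, 0.0]
```

Bins dominated by speech keep a gain near 1, while bins dominated by residual noise are attenuated toward the floor.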

A Study of Decision Tree Modeling for Predicting the Prosody of Corpus-based Korean Text-To-Speech Synthesis (한국어 음성합성기의 운율 예측을 위한 의사결정트리 모델에 관한 연구)

  • Kang, Sun-Mee;Kwon, Oh-Il
    • Speech Sciences / v.14 no.2 / pp.91-103 / 2007
  • The purpose of this paper is to develop a model that predicts the prosody of Korean text-to-speech synthesis using the CART and SKES algorithms. CART tends to favor prediction variables with many instances. Therefore, an F-test-based partition method was applied to CART after the number of instances had been reduced by grouping phonemes. Furthermore, the quality of the text-to-speech synthesis was evaluated after applying the SKES algorithm to data of the same size. For the evaluation, MOS tests were conducted with 30 men and women in their twenties. The results showed that applying the SKES algorithm made the synthesized speech clearer and more natural.

Voice Quality of Dysarthric Speakers in Connected Speech (연결발화에서 마비말화자의 음질 특성)

  • Seo, Inhyo;Seong, Cheoljae
    • Phonetics and Speech Sciences / v.5 no.4 / pp.33-41 / 2013
  • This study investigated the perceptual and cepstral/spectral characteristics of phonation and their relationships in dysarthria in connected speech. Twenty-two participants were divided into two groups; the eleven dysarthric speakers were paired with age- and gender-matched healthy control participants. A perceptual evaluation was performed by three speech pathologists using the GRBAS scale to measure the cepstral/spectral characteristics of phonation between the two groups' connected speech. Correlations showed that dysarthric speakers scored significantly worse (with higher ratings) in G (overall dysphonia grade), B (breathiness), and S (strain), while the smoothed cepstral peak prominence (CPPs) was significantly lower. The CPPs were significantly correlated with the perceptual ratings, including G, B, and S. The utility of CPPs is supported by its strong relationship with perceptually rated dysphonia severity in dysarthric speakers. The receiver operating characteristic (ROC) analysis showed that a threshold of 5.08 dB for the CPPs achieved good classification of dysarthria, with 63.6% sensitivity and perfect specificity (100%). These results indicate that the CPPs reliably distinguished between healthy controls and dysarthric speakers. However, the CPP frequency (CPP F0) and low-high spectral ratio (L/H ratio) were not significantly different between the two groups.
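The reported ROC figures can be checked with a short calculation. With eleven dysarthric speakers and eleven controls, 63.6% sensitivity and 100% specificity are consistent with 7 of 11 dysarthric speakers falling below the 5.08 dB threshold and no control doing so. These counts are inferred from the group sizes and percentages; the abstract does not list them.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of dysarthric speakers detected
    specificity = tn / (tn + fp)  # share of controls correctly rejected
    return sensitivity, specificity


# Inferred counts: 7 of 11 dysarthric speakers flagged, 0 of 11 controls.
sens, spec = sensitivity_specificity(tp=7, fn=4, tn=11, fp=0)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
# sensitivity = 63.6%, specificity = 100.0%
```

Read this way, the threshold never mislabels a healthy speaker but misses 4 of the 11 dysarthric speakers.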

A Packet Loss Concealment Algorithm Robust to Burst Packet Losses for G.729 (연속적인 프레임 손실에 강인한 G.729 프레임 손실 은닉 알고리즘)

  • Cho, Choong-Sang;Lee, Young-Han;Kim, Hong-Kook
    • Proceedings of the KSPS conference / 2007.05a / pp.307-310 / 2007
  • In this paper, a packet loss concealment (PLC) algorithm for CELP-type speech coders is proposed to improve the quality of decoded speech under burst packet loss conditions. The proposed algorithm is based on the recovery of voiced excitation using an estimate of the voicing probability and the generation of random excitation by permuting the previously decoded excitation. The voicing probability is estimated from the correlation using the previous correctly decoded excitation and pitch. The proposed algorithm is implemented as a PLC algorithm for G.729, and its performance is compared with that of the PLC employed in G.729 by means of perceptual evaluation of speech quality (PESQ) and an A-B preference test under random and burst packet losses at rates of 3% and 5%. It is shown that the proposed algorithm provides better speech quality than the PLC of G.729, especially under burst packet losses.

A Nonlinear Regression Analysis Method for Frame Erasure Concealment in VoIP Networks (VoIP 망에서의 프레임손실은닉을 위한 비선형 회귀분석 기법)

  • Choi, Seung-Ho;Sung, Ho-Sang
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.9 no.5 / pp.129-132 / 2009
  • Frame erasure is one of the most difficult problems in voice over IP (VoIP) networks and is a major source of speech quality degradation. In this paper, a frame erasure concealment algorithm based on nonlinear regression analysis is presented to minimize speech quality deterioration in code-excited linear prediction (CELP) based coders. We applied the proposed scheme to the ITU-T G.729 standard and obtained improved perceptual evaluation of speech quality (PESQ) scores compared to the conventional methods.

Multiple Average Ratings of Auditory Perceptual Analysis for Dysphonia

  • Choi, Seong-Hee;Choi, Hong-Shik
    • Phonetics and Speech Sciences / v.1 no.4 / pp.165-170 / 2009
  • This study compared a single rating with average ratings from multiple presentations of the same stimulus for measuring the voice quality of dysphonia using a 7-point equal-appearing interval (EAI) rating scale. Overall severity of voice quality for 46 /a/ vowel stimuli (23 stimuli from dysphonia, 23 stimuli from control) was rated by 3 experienced speech-language pathologists (mean 19 years; range = 7 to 40 years). For the average ratings, each stimulus was rated five times in random order, and averages were computed over two to five ratings. Although inter-rater reliability was higher for average ratings than for a single rating, there were no significant differences in rating scores between single and multiple average ratings judged by experienced listeners, suggesting that auditory perceptual ratings by well-trained listeners agree relatively well across repeated judgments of the same stimulus. Larger variations in perceptual ratings were observed for moderate voices than for mild or severe voices, even in the average ratings.

New filter design to replace the post and perceptual weighting filter of transcoder and performance evaluation (상호부호화기의 후처리 필터와 인지가중 필터를 대신하는 새로운 필터 설계 및 성능 평가)

  • 최진규;윤성완;강홍구;윤대희
    • Proceedings of the IEEK Conference / 2003.07e / pp.2232-2235 / 2003
  • In speech communication systems where two different speech codecs interoperate, a transcoding algorithm is a good approach because of its low complexity and improved synthesized speech quality. This paper proposes an efficient method to further improve the performance of transcoding algorithms as well as to reduce their complexity. In conventional transcoding algorithms, a post-filter and a perceptual weighting filter must be operated sequentially because both decoding and encoding processes are needed. This results in redundant processing in terms of complexity and perceptual quality. Using the fact that their filter structures are similar, we replaced the two filters with one. The proposed algorithm requires 72.8% lower complexity than the conventional transcoding algorithm when only the complexity of the filtering processes is compared. The results of both objective and subjective tests verify that the proposed algorithm has slightly better quality than the conventional one.