• Title/Summary/Keyword: Speech quality measure


Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features (음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류)

  • Yeo, Eun Jung;Kim, Sunhee;Chung, Minhwa
    • Phonetics and Speech Sciences / v.13 no.2 / pp.57-66 / 2021
  • This study addresses automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure affected by features from multiple speech dimensions, yet most previous studies use features from a single dimension. To capture the characteristics of the speech disorder effectively, we extracted features from three speech dimensions: voice quality, prosody, and pronunciation. Voice quality features consist of jitter, shimmer, Harmonic to Noise Ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody features include speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean/std/min/max/median/25th percentile/75th percentile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation features comprise Percentage of Correct Phonemes (Percentage of Correct Consonants/Vowels/Total phonemes) and degree of vowel distortion (Vowel Space Area, Formant Centralization Ratio, Vowel Articulation Index, F2-Ratio). Experiments were conducted with various feature combinations. Using features from all three speech dimensions gives the best result, an F1-score of 80.15, compared with using features from only one or two dimensions. The result implies that voice quality, prosody, and pronunciation features should all be considered in automatic severity classification of dysarthria.
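
The rhythm features named in the abstract above (%V, rPVI, nPVI) have simple closed forms, so a short sketch may help. The code below uses the commonly cited definitions of these metrics; the function names and the sample durations are illustrative assumptions, not the authors' feature extraction code.

```python
def percent_v(vowel_durations, total_duration):
    """%V: share of the utterance occupied by vocalic intervals, in percent."""
    return 100.0 * sum(vowel_durations) / total_duration


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive interval durations."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def npvi(durations):
    """Normalized PVI: each difference is divided by the mean of its pair, then scaled by 100."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical vocalic interval durations in seconds (illustrative, not from the study).
vowels = [0.08, 0.12, 0.10, 0.20]
print(percent_v(vowels, total_duration=1.0), rpvi(vowels), npvi(vowels))
```

Because nPVI normalizes each pair by its local mean, it is less sensitive to overall speaking rate than rPVI, which is one reason both are often reported together.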

AN ALGORITHM TO REDUCE THE PITCH SEARCHING TIME USING MODIFIED DELTA SEARCH IN CELP VOCODER (개선된 델타검색기법을 이용한 피치검색시간의 단축)

  • 이주헌
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.214-217 / 1994
  • The major drawback of Code Excited Linear Prediction (CELP) vocoders is their large computational requirement. This paper proposes a simple method that reduces the pitch searching time in the pitch filter with almost no degradation in quality. Based on the observed regularity of the correlation function of speech, only a limited number of pitch lags are considered as candidates for the optimum pitch. This is done by skipping the negative envelope side of the correlation function and limiting the maximum number of lags considered in a preliminary step. Doing so reduces the pitch searching time by more than 51% with negligible quality degradation. In addition, combining this method with the conventional delta search technique reduces the computation time by more than 60%, without seriously lowering speech quality in terms of segmental SNR, compared with the conventional full search method.
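
The ideas in the abstract above (full search, delta search, and skipping lags with negative correlation) can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it scores lags with a plain normalized correlation instead of a closed-loop CELP error criterion, and the lag range, delta, and test signal are assumed values.

```python
import math


def normalized_correlation(x, lag):
    """Normalized correlation between the frame and its copy delayed by `lag` samples."""
    a, b = x[lag:], x[:len(x) - lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def positive_lags(x, lags):
    """Keep only lags whose correlation is positive (the negative side is skipped)."""
    return [lag for lag in lags if normalized_correlation(x, lag) > 0]


def delta_lags(previous_lag, delta, min_lag, max_lag):
    """Conventional delta search: lags within +/- delta of the previous subframe's lag."""
    lo = max(min_lag, previous_lag - delta)
    hi = min(max_lag, previous_lag + delta)
    return list(range(lo, hi + 1))


def best_lag(x, lags):
    """Return the candidate lag with the highest normalized correlation."""
    return max(lags, key=lambda lag: normalized_correlation(x, lag))


# Synthetic frame: a sinusoid with a 40-sample period (illustrative input only).
frame = [math.sin(2 * math.pi * n / 40) for n in range(240)]
full = list(range(20, 148))           # hypothetical full lag range
narrow = delta_lags(42, 4, 20, 147)   # candidates around an assumed previous lag of 42
print(len(full), len(narrow), len(positive_lags(frame, full)), best_lag(frame, narrow))
```

In a real codec the saving comes from evaluating the expensive per-lag criterion on fewer candidates; here the candidate counts printed at the end stand in for that cost.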


VoIP Planning and Evaluation through the Analysis of Speech Transmission Quality Based on the E-Model (E-모델 기반 통화 품질 분석을 통한 VoIP Planning 및 평가)

  • Bae Seong Yong;Kim Kwang Hoon
    • Journal of Internet Computing and Services / v.5 no.6 / pp.31-43 / 2004
  • Voice over Internet Protocol (VoIP) is currently a popular research topic as a real-time voice packet transmission method. However, the current Internet environment does not guarantee voice quality with respect to delay, jitter, and loss. Until now, many voice-based evaluation algorithms have been used to measure the speech quality of VoIP systems. These algorithms have two defects: their results vary with the voice samples used, and some of them cannot take into account the network environment of the speech transmission path. The E-model can be used to solve these problems. In this paper, we introduce VoIP planning guidelines through various analyses of the E-model, which systematically models impairments of network quality as well as of VoIP equipment quality. We also present an evaluation method and the results for speech transmission quality.
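
For readers unfamiliar with the E-model mentioned above, the sketch below shows its basic arithmetic as defined in ITU-T G.107: impairments are subtracted from a base rating to give the R factor, which is then mapped to an estimated MOS. The function names and the numeric inputs are illustrative assumptions and are not taken from the paper.

```python
def r_factor(ro, i_s, i_d, ie_eff, a=0.0):
    """Transmission rating factor R = Ro - Is - Id - Ie,eff + A."""
    return ro - i_s - i_d - ie_eff + a


def r_to_mos(r):
    """Map an R factor to an estimated MOS with the G.107 conversion formula."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
    # The cubic dips slightly below 1 for very small R, so clamp to the MOS floor.
    return max(1.0, mos)


# Hypothetical planning scenario: all impairment values are illustrative.
r = r_factor(ro=94.8, i_s=1.4, i_d=5.0, ie_eff=11.0)
print(round(r, 1), round(r_to_mos(r), 2))
```

Because delay (Id) and codec or packet-loss effects (Ie,eff) enter as separate additive terms, a planner can budget them independently, which is what makes the model useful for network planning rather than only for after-the-fact measurement.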


Performance Comparison for Objective Measures of Speech Quality Evaluation in PCS Wireless Telephone Network (PCS 이동전화망에서의 객관적인 음질평가척도별 성능비교)

  • Kim Nag-Cheol;Kim Kwang-Soo;Jung Ho-Youl;Chung Hyun-Yeol
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.48-51 / 1999
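  • As a preliminary study toward developing an objective speech quality measure for PCS mobile telephony, this study applies the existing CD (Cepstral Distance), MSD (Mel Spectral Distance), BSD (Bark Spectral Distance), and PSQM (Perceptual Speech Quality Measure) measures and compares their performance. The measures were applied to PCS speech data collected in a real environment, and the correlation between their outputs and the MOS results obtained from listeners' ratings was examined. In the experiments, BSD and PSQM showed correlations of 0.81 and 0.84, respectively, outperforming CD and MSD.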
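
The comparison described above rests on correlating each objective measure with listener MOS. A minimal sketch of that step, assuming the Pearson correlation coefficient is used (the abstract does not name the coefficient) and using made-up scores, could look like this:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-condition scores (illustrative, not data from the study).
mos = [4.1, 3.6, 3.0, 2.4, 1.9]
distance = [0.8, 1.3, 1.9, 2.2, 3.0]   # an objective distortion measure
r = pearson(distance, mos)
print(round(abs(r), 2))
```

Distance-type measures fall as quality rises, so their raw correlation with MOS is negative; reported figures are usually magnitudes.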


Performance Comparison of Objective Measures for Speech Quality for Evaluation in CDMA Mobile Telephone (CDMA 이동전화 통화품질평가를 위한 객관적 음질평가척도별 성능 비교)

  • 이준희;김광수;윤정오
    • Proceedings of the Korea Society for Industrial Systems Conference / 2001.05a / pp.256-260 / 2001
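  • As a preliminary study toward developing an objective speech quality measure for telephone speech distorted by a digital mobile telephone (CDMA) channel, this paper evaluates the performance of existing objective measures: CD (Cepstral Distance), MSD (Mel Spectral Distance), BSD (Bark Spectral Distance), Modified BSD (MBSD), and PSQM (Perceptual Speech Quality Measure). The measures were applied to PCS speech data collected in a real mobile telephone environment, and their outputs were compared for correlation with MU, a subjective speech quality evaluation method. In the experiments, BSD, MBSD, and PSQM showed correlations of 0.80, 0.85, and 0.84, respectively, performing relatively better than CD and MSD.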
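
Of the measures listed above, the cepstral distance is the simplest to write down. The sketch below uses one common dB-scaled definition over cepstral coefficients with c0 excluded; the paper's exact variant, cepstral order, and framing are not given in the abstract, so those details are assumptions.

```python
import math


def cepstral_distance_db(c_ref, c_test):
    """Cepstral distance in dB between two cepstral vectors (c0 excluded by the caller)."""
    if len(c_ref) != len(c_test):
        raise ValueError("cepstral vectors must have the same order")
    squared = sum((a - b) ** 2 for a, b in zip(c_ref, c_test))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared)


def mean_cepstral_distance_db(ref_frames, test_frames):
    """Average the per-frame distance over time-aligned frames."""
    if len(ref_frames) != len(test_frames) or not ref_frames:
        raise ValueError("need the same non-zero number of frames")
    total = sum(cepstral_distance_db(r, t) for r, t in zip(ref_frames, test_frames))
    return total / len(ref_frames)


# Hypothetical two-frame example with order-3 cepstra (illustrative values).
reference = [[0.50, -0.20, 0.10], [0.40, -0.10, 0.05]]
degraded = [[0.45, -0.25, 0.12], [0.30, -0.05, 0.00]]
print(mean_cepstral_distance_db(reference, degraded))
```

The spectral-distance measures (MSD, BSD, MBSD) follow the same pattern of a per-frame distance averaged over time, but they compute the distance on a perceptually warped spectrum rather than on cepstra.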


The Relationship between Acoustic Characteristics and Voice Handicap Index in Esophageal Speakers (식도발성 환자의 음향학적 특성과 음성장애지수의 상관성)

  • Jang, Hyo-Ryung;Shim, Hee-Jeong;Shin, Hee-Baek;Ko, Do-Heung;Kim, Hyun-Ki
    • Phonetics and Speech Sciences / v.6 no.2 / pp.115-121 / 2014
  • This paper investigates the relationship between acoustic characteristics and the Voice Handicap Index (VHI) in 29 male esophageal speakers. Acoustic characteristics were measured from a sustained vowel /a/ produced three times. A stable 2-second portion of the vocalization was analyzed with the MDVP program. Specifically, relationships between four VHI scores (total, functional, physical, and emotional) and three acoustic characteristics (jitter, shimmer, and NHR) were investigated using the Pearson correlation coefficient. As a result, we found no relationship between NHR and VHI scores. However, both jitter and shimmer had statistically significant correlations with all four VHI scores. This research will contribute to establishing a baseline for speech characteristics in the voice rehabilitation of esophageal speakers. Further research could examine an overall quality-of-life survey, which is widely used as a subjective measure of voice for esophageal speakers.
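
Jitter and shimmer, two of the acoustic characteristics above, are cycle-to-cycle perturbation measures. The sketch below uses the common relative ("local") form: the mean absolute difference between consecutive cycles divided by the overall mean. MDVP reports several related parameters whose exact definitions may differ, so treat this as an approximation with assumed function names and made-up inputs.

```python
def relative_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def jitter_percent(periods):
    """Local jitter: perturbation of consecutive pitch periods (seconds)."""
    return relative_perturbation(periods)


def shimmer_percent(peak_amplitudes):
    """Local shimmer: perturbation of consecutive cycle peak amplitudes."""
    return relative_perturbation(peak_amplitudes)


# Hypothetical cycle measurements from a sustained vowel (illustrative values).
periods = [0.0100, 0.0104, 0.0098, 0.0103, 0.0099]
amplitudes = [0.80, 0.74, 0.83, 0.77, 0.81]
print(jitter_percent(periods), shimmer_percent(amplitudes))
```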

Improvement of VAD Performance for the Reduction of the Bit Rate Under the Noise Environment in the G.723.1 (잡음 환경에서의 전송률 감소를 위한 G.723.1 음성활동 검출기 성능 개선에 관한 연구)

  • 김정진;장경아;배명진
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.42-47 / 2001
  • This paper improves the performance of the VAD (Voice Activity Detector) in the G.723.1 Annex A 6.3 kbps/5.3 kbps dual-rate speech coder, which was developed for Internet phone and videoconferencing. The VAD decision is based on a three-level energy threshold. We evaluate processing time, speech quality, and bit rate. The processing time is reduced owing to the accuracy of the VAD decision in silence periods. In a subjective quality test, there is almost no difference compared with G.723.1. To measure the bit rate, we count the active speech frames (VAD=1); the bit rate decreases further as more silence periods occur.
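
The bit-rate measurement described above (counting frames with VAD=1) can be illustrated with a small sketch. The abstract does not specify the three-level threshold scheme, so the code below substitutes a single fixed energy threshold, and it assumes silent frames cost no bits; both are simplifications, and the threshold value is hypothetical.

```python
def frame_energies(samples, frame_len=240):
    """Mean energy of each full frame (240 samples is 30 ms at 8 kHz)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def vad_flags(energies, threshold):
    """Single-threshold decision: 1 for active speech, 0 for silence (simplified)."""
    return [1 if e > threshold else 0 for e in energies]


def average_bitrate(flags, active_bps=6300.0, silence_bps=0.0):
    """Average bit rate when only active frames are coded at the full rate."""
    if not flags:
        raise ValueError("no frames to evaluate")
    active = sum(flags)
    return (active * active_bps + (len(flags) - active) * silence_bps) / len(flags)


# Synthetic input: one loud frame followed by one silent frame (illustrative).
samples = [0.5] * 240 + [0.0] * 240
flags = vad_flags(frame_energies(samples), threshold=0.01)
print(flags, average_bitrate(flags))
```

A more accurate VAD that marks more true silence as inactive lowers the average directly, which matches the abstract's observation that the saving grows with the amount of silence.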


Improvement of Overlapped Codebook Search in QCELP (QCELP에서 중첩된 코드북 검색의 개선)

  • 박광철;한승진;이정현
    • The KIPS Transactions:PartC / v.8C no.1 / pp.105-112 / 2001
  • In this paper, we present an advanced QCELP codebook search that improves speech quality, making the QCELP vocoder usable in noise-robust systems. While conventional QCELP usually searches the stochastic codebook once, we tried searching two to five times and found that searching twice is the most suitable for improving speech quality. Consequently, the advanced QCELP vocoder represents the excitation signal in more detail through two rounds of precise quantization and thereby improves speech quality. In our experiment, we use speech collected in various environments (such as a lecture room, house, street, and laboratory), without regard to noise, as input data and measure speech quality using SNR and segSNR. The experiment shows that the advanced QCELP improves SNR and segSNR by 38.35% and 65.51%, respectively, compared with conventional QCELP.
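
The two quality measures used above differ only in where the averaging happens: SNR is computed once over the whole signal, while segSNR averages per-frame SNR values in dB. The sketch below uses the standard definitions; the frame length, the choice to skip frames with undefined SNR, and the test signal are assumptions, and practical implementations often also clamp per-frame values to a fixed dB range.

```python
import math


def snr_db(reference, decoded):
    """Global SNR in dB between a reference signal and its decoded version."""
    signal = sum(s * s for s in reference)
    noise = sum((s - d) ** 2 for s, d in zip(reference, decoded))
    if signal == 0 or noise == 0:
        raise ValueError("SNR is undefined for zero signal or zero error energy")
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(reference, decoded, frame_len=160):
    """Mean of per-frame SNR values; frames with undefined SNR are skipped."""
    values = []
    for i in range(0, len(reference) - frame_len + 1, frame_len):
        try:
            values.append(snr_db(reference[i:i + frame_len], decoded[i:i + frame_len]))
        except ValueError:
            continue
    if not values:
        raise ValueError("no frame with a defined SNR")
    return sum(values) / len(values)


# Illustrative signal: a quiet frame and a loud frame with the same absolute error.
reference = [1.0, 1.0, 10.0, 10.0]
decoded = [0.9, 0.9, 9.9, 9.9]
print(snr_db(reference, decoded), segmental_snr_db(reference, decoded, frame_len=2))
```

In this example the loud frame dominates the global SNR, while segSNR weights both frames equally, which is why segSNR is usually considered closer to perceived quality for speech with large level variation.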


Improvement of Speech Intelligibility in Noisy Environments (잡음 환경에서의 음성 명료도 향상 기술)

  • Yoon, Jae-Yul;Kim, Jung-Hoe;Oh, Eun-Mi;Park, Ho-Chong
    • The Journal of the Acoustical Society of Korea / v.28 no.1 / pp.70-76 / 2009
  • In speech communication in noisy environments, speech intelligibility is seriously degraded by the masking effect of ambient noise. This paper proposes a new method to improve speech intelligibility in noisy environments. Based on the perception theory that the temporal envelope plays a major role in determining intelligibility, the proposed method uses a novel operation that enhances the fluctuation of the band-wise temporal envelope and also includes pitch enhancement to improve speech naturalness. In addition, a new subjective evaluation scheme employing binaural listening is proposed to measure performance more reliably. The subjective performance measured with the proposed scheme shows that the proposed method improves both intelligibility and naturalness in various environments, and that a function parameter can control the trade-off between intelligibility and naturalness.

A preliminary study of sound quality evaluation of cochlear implant users (인공와우 사용자의 심리음향적 음질평가 예비연구)

  • Bahng, Junghwa;Oh, Soo Hee
    • The Journal of the Acoustical Society of Korea / v.41 no.1 / pp.45-51 / 2022
  • Sound quality evaluation is a psychoacoustic method for measuring subjective judgements of sound color. As a preliminary study, this work investigates the sound quality benefits of bimodal users by comparing sound quality scores between the bimodal hearing condition and the unilateral cochlear implant (CI) condition. Thirteen bimodal users and seven unilateral CI users participated in this study. Audiologists performed pure tone and speech audiometry and measured functional gain and real-ear insertion gain. Subjective assessment of sound quality followed, using four sounds: a violin sound, male and female voices, and refrigerator noise. Participants judged the sound quality on six sound quality indices. On average, bimodal users rated sound quality 0.8 points higher in the bimodal condition than in the unilateral CI condition. Group comparison between bimodal and unilateral CI users showed no differences. A follow-up study of sound quality tools and methods should be considered to evaluate the subjective bimodal benefits of cochlear implant users.