• Title/Summary/Keyword: Speech quality measure

Voice Quality of Dysarthric Speakers in Connected Speech (연결발화에서 마비말화자의 음질 특성)

  • Seo, Inhyo;Seong, Cheoljae
    • Phonetics and Speech Sciences
    • /
    • v.5 no.4
    • /
    • pp.33-41
    • /
    • 2013
  • This study investigated the perceptual and cepstral/spectral characteristics of phonation, and the relationships between them, in the connected speech of speakers with dysarthria. Twenty-two participants were divided into two groups: eleven dysarthric speakers and eleven age- and gender-matched healthy controls. Three speech pathologists performed a perceptual evaluation using the GRBAS scale, and cepstral/spectral measures of phonation were obtained from both groups' connected speech. Dysarthric speakers were rated significantly worse (higher severity ratings) in G (overall dysphonia grade), B (breathiness), and S (strain), and their smoothed cepstral peak prominence (CPPs) was significantly lower. The CPPs was significantly correlated with the perceptual ratings G, B, and S. This strong relationship with perceptually rated dysphonia severity supports the utility of CPPs in dysarthric speakers. Receiver operating characteristic (ROC) analysis showed that a CPPs threshold of 5.08 dB achieved good classification of dysarthria, with 63.6% sensitivity and perfect specificity (100%). These results indicate that CPPs reliably distinguished healthy controls from dysarthric speakers. However, the CPP frequency (CPP F0) and the low-high spectral ratio (L/H ratio) did not differ significantly between the two groups.
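
As a rough illustration of how sensitivity and specificity follow from a single cutoff such as the 5.08 dB reported above, the sketch below classifies a speaker as dysarthric when the CPPs falls below the threshold. The direction of the rule is an assumption inferred from the lower CPPs in the dysarthric group, the function name is hypothetical, and the score lists are made-up values, not data from the study.

```python
def sensitivity_specificity(patient_scores, control_scores, threshold):
    """Treat a score below `threshold` as a positive (dysarthric) decision.

    Returns (sensitivity, specificity) as fractions in [0, 1].
    """
    true_pos = sum(1 for s in patient_scores if s < threshold)
    true_neg = sum(1 for s in control_scores if s >= threshold)
    return true_pos / len(patient_scores), true_neg / len(control_scores)


# Made-up CPPs values in dB, for illustration only (not data from the study).
patients = [3.9, 4.2, 4.6, 4.8, 4.9, 5.0, 5.05, 5.3, 5.6, 6.0, 6.4]
controls = [5.2, 5.5, 5.7, 5.9, 6.1, 6.3, 6.5, 6.6, 6.8, 7.0, 7.2]

sens, spec = sensitivity_specificity(patients, controls, threshold=5.08)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With eleven speakers per group, a sensitivity of 63.6% corresponds to 7 of 11 dysarthric speakers falling below the cutoff, and 100% specificity means no control does.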

A New Objective Speech Quality Measure Over Mobile Communication Using Bark Coherence Function (바크 코히어런스 함수를 이용한 이동 전화 음질 평가)

  • 박상옥;류승균;박영철;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.26 no.4B
    • /
    • pp.437-446
    • /
    • 2001
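  • Speech quality can be assessed subjectively or objectively. Subjective assessment reflects the quality actually perceived, because listeners hear and rate the speech directly; however, because many listeners must take part, it is costly and time-consuming. Objective assessment compares the similarity between the original and the distorted speech by mathematical computation; it is fast and inexpensive, but its results can differ from the quality actually perceived. This paper proposes the BCF (Bark Coherence Function) as an objective speech quality measure. The BCF defines the coherence function in the psychoacoustic domain and, compared with existing objective measures, correlates more highly with subjective quality and requires less computation. Regression analysis on speech data from a CDMA mobile phone system showed that the BCF correlates more highly with subjective quality than the ITU-T standard PSQM (Perceptual Speech Quality Measure) and MNB (Measuring Normalizing Block).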

Objective Measure for Estimating Subjective Voice Quality in Wireless Communication (CDMA 이동통신 시스템에서의 주관적 음질을 추정하기 위한 객관적 척도)

  • 백금란
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06e
    • /
    • pp.297-302
    • /
    • 1998
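  • This paper studies an objective measure for estimating the subjective quality of speech that has been degraded in various ways by passing through a CDMA (Code Division Multiple Access) channel. Specifically, the MOS (Mean Opinion Score) test, the most widely used subjective quality assessment method, was conducted on speech signals that had passed through a CDMA channel, and objective-measure algorithms capable of estimating the MOS results were simulated. As a result, a well-performing objective speech quality assessment method was obtained by modifying PSQM (Perceptual Speech Quality Measure) to suit the CDMA channel environment.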

Enhanced Adjustment Strategy of Masking Threshold for Speech Signals in Low Bit-Rate Audio Coding (저전송률 오디오 부호화에서 음성 신호의 성능 개선을 위한 마스킹 임계값 적응기법 향상)

  • Lee, Chang-Heon;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.1
    • /
    • pp.62-68
    • /
    • 2010
  • This paper proposes a new masking threshold adjustment strategy to improve the performance of low bit-rate audio coding for speech signals. After the formant regions are determined, the masking threshold is adjusted using the ratio of each sub-band's energy to the average energy of each formant. More quantization noise is allowed in bands with relatively large energy, while less distortion is allowed in spectral valley regions by allocating more bits there, which reflects the concept of perceptual weighting widely used in speech coding. Results from an objective speech quality measure verify that the proposed method improves quality for speech input signals compared with the conventional method.

Voice quality transform using jitter synthesis (Jitter 합성에 의한 음질변환에 관한 연구)

  • Jo, Cheolwoo
    • Phonetics and Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.121-125
    • /
    • 2018
  • This paper describes procedures for changing and measuring voice quality in terms of jitter. A jitter synthesis method was applied to the TD-PSOLA analysis system of the Praat software. The jitter component is synthesized based on a Gaussian random noise model, and the TD-PSOLA resynthesis process is used to synthesize the modified voice with artificial jitter. Various vocal jitter parameters are used to measure the change in quality caused by the artificial, systematic change in jitter. Synthetic vowels, natural vowels, and short sentences are used to check the change in voice quality through the synthesizer model. The results show that the suggested method is useful for voice quality control in a limited way and can be used to alter the jitter component of a voice.
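
The idea of perturbing pitch periods with Gaussian noise and then measuring the resulting jitter can be sketched as follows. This is only a minimal illustration under stated assumptions: it operates on a list of pitch periods rather than on audio, it does not perform TD-PSOLA resynthesis or call Praat, and the function names and the `rel_std` parameter are hypothetical. The measure shown is the common "local jitter" definition (mean absolute difference between consecutive periods divided by the mean period).

```python
import random


def add_gaussian_jitter(periods, rel_std, seed=0):
    """Scale each pitch period by (1 + Gaussian noise with std `rel_std`)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [p * (1.0 + rng.gauss(0.0, rel_std)) for p in periods]


def local_jitter_percent(periods):
    """Mean absolute consecutive-period difference over mean period, in %."""
    diffs = [abs(b - a) for a, b in zip(periods[:-1], periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period


# 200 identical periods of 8 ms (125 Hz): a perfectly periodic voice.
clean = [0.008] * 200
jittered = add_gaussian_jitter(clean, rel_std=0.01)

print(local_jitter_percent(clean))     # no perturbation, so no jitter
print(local_jitter_percent(jittered))  # grows with rel_std
```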

A Study on the Optimal Mahalanobis Distance for Speech Recognition

  • Lee, Chang-Young
    • Speech Sciences
    • /
    • v.13 no.4
    • /
    • pp.177-186
    • /
    • 2006
  • To enhance the quality of feature vector classification and thereby reduce the recognition error rate of speaker-independent speech recognition, we employ the Mahalanobis distance as the similarity measure between feature vectors. The metric matrix of the Mahalanobis distance is assumed to be diagonal in order to reduce memory and computation cost. We propose that the diagonal elements be given in terms of the variations of the feature vector components. Geometrically, this prescription tends to redistribute the data set into the shape of a hypersphere in the feature vector space. The idea is applied to speech recognition using a hidden Markov model with fuzzy vector quantization. The results show that recognition is improved by an appropriate choice of the relevant adjustable parameter. The Viterbi score difference between the two top candidates in the recognition test shows general behavior in accord with that of the recognition error rate.
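
A diagonal-metric Mahalanobis distance of the kind described above might look like the sketch below. It assumes that "variations" means per-component variances, and it introduces a hypothetical exponent `alpha` as a stand-in for the abstract's adjustable parameter; the paper's actual parameterization is not given in the abstract. With `alpha=1` this is the standard diagonal Mahalanobis distance, and with `alpha=0` it reduces to the Euclidean distance.

```python
import math


def component_variances(vectors):
    """Population variance of each feature component across `vectors`."""
    n = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return [sum((v[i] - means[i]) ** 2 for v in vectors) / n
            for i in range(dim)]


def diagonal_mahalanobis(x, y, variances, alpha=1.0):
    """Distance with a diagonal metric whose weights are variance ** -alpha.

    Assumes every variance is nonzero. `alpha` is a hypothetical knob.
    """
    return math.sqrt(sum((a - b) ** 2 / var ** alpha
                         for a, b, var in zip(x, y, variances)))


# Toy feature vectors whose second component has a much larger spread.
data = [[0.0, 0.0], [2.0, 20.0], [4.0, 40.0]]
variances = component_variances(data)

print(diagonal_mahalanobis(data[0], data[2], variances, alpha=1.0))
print(diagonal_mahalanobis(data[0], data[2], variances, alpha=0.0))
```

Dividing by the variance rescales each axis so that both components contribute equally, which is the sense in which the data are reshaped toward a hypersphere.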

Conversational Quality Measurement System for Mobile VoIP Speech Communication (모바일 VoIP 음성통신을 위한 대화음질 측정 시스템)

  • Cho, Jae-Man;Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.10 no.4
    • /
    • pp.71-77
    • /
    • 2011
  • In this paper, we propose a conversational quality measurement (CQM) system that provides objective QoS for high-quality mobile VoIP voice telecommunication. To measure conversational quality, the VoIP telecommunication system is implemented on two smartphones connected over VoIP. The system consists of echo cancellation, noise reduction, speech encoding/decoding, packet generation with RTP (Real-time Transport Protocol), jitter buffer control, and POS (Play-out Schedule) with LC (Loss Concealment). The CQM system is connected to the microphone and speaker of each smartphone. The voice signal of each speaker is recorded and used to measure CE (Conversational Efficiency), CS (Conversational Symmetry), PESQ (Perceptual Evaluation of Speech Quality), and the CE-CS-PESQ correlation. We validate the CQM system by measuring CE, CS, and PESQ under various SNR, delay, and loss conditions arising from the IP network environment.

A 4 kbps PSI-VSELP Speech Coding Algorithm (4 kbps PSI-VSELP 음성 부호화 알고리듬)

  • Choi, Yong-Soo;Kang, Hong-Goo;Park, Sang-Wook;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.6
    • /
    • pp.59-65
    • /
    • 1996
  • This paper proposes a 4 kbps PSI-VSELP (Pitch Synchronous Innovation-Vector Sum Excited Linear Prediction) speech coder that produces speech equivalent to that of the conventional 4.8 kbps VSELP. Because 'half-rate' is defined differently from country to country, there may be a need to reduce the bit rate of the conventional half-rate coder. To minimize the degradation of speech quality caused by bit-rate reduction, it is desirable to perform bit allocation based on careful consideration of the effect of the various transmission parameters. This paper adopts this analytical approach for bit allocation at 4 kbps. To improve the quality of the VSELP coder at 4 kbps, the basis vectors, which play the most important role in performance, are optimized by an iterative closed-loop training process, and the PSI technique is employed in the VSELP coder. To demonstrate the performance of the proposed speech coder, we performed experiments under noiseless and error-free conditions. In the experimental results, the proposed 4 kbps PSI-VSELP coder scored lower on the objective measure but higher on the subjective measure than the conventional 4.8 kbps VSELP.

Speech Quality Estimation Algorithm using a Harmonic Modeling of Reverberant Signals (반향 음성 신호의 하모닉 모델링을 이용한 음질 예측 알고리즘)

  • Yang, Jae-Mo;Kang, Hong-Goo
    • Journal of Broadcast Engineering
    • /
    • v.18 no.6
    • /
    • pp.919-926
    • /
    • 2013
  • The acoustic signal from a distant sound source in an enclosed space often produces reverberant sound that varies with the room impulse response. Estimating the level of reverberation, or the quality of the observed signal, is important because it provides valuable information about the system's operating environment. It is also useful for designing a dereverberation system. This paper proposes a speech quality estimation method based on the harmonicity of the received signal, a unique characteristic of voiced speech. First, we show that harmonic signal modeling of a reverberant signal is reasonable. Then, the ratio between the harmonically modeled signal and the estimated non-harmonic signal is used as a measure of a standard room acoustical parameter related to speech clarity. Experimental results show that the proposed method successfully estimates speech quality when the reverberation time varies from 0.2 s to 1.0 s. Finally, we confirm the superiority of the proposed method in both background noise and reverberant environments.

Quality Assessment of Telephone Speech with ATM Circuit Emulation Services (ATM 망을 통한 Circuit Emulation 서비스에서 전화음성의 품질평가)

  • Cho, Young-Soon;Seo, Jeong-Wook;Bae, Keun-Sung
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.35S no.6
    • /
    • pp.156-163
    • /
    • 1998
  • The ATM network provides ATM CES (Circuit Emulation Services) with AAL1 for CBR (constant bit rate) services such as telephone speech. In this study, the quality of telephone speech carried by CES over ATM was assessed and discussed. For this purpose, interoperability between the ATM network and a structured/unstructured DS1 link was modeled for simulation. For the quality assessment of telephone speech, SNR and MOS were used as the objective and subjective measures, respectively. Experimental results show that a MOS of 4 and an SNR of 30 dB could be obtained for speech signals at a CLR of $10^{-3}$ or below.
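
For readers unfamiliar with SNR as an objective measure, the sketch below computes it in its common waveform form, $10\log_{10}\left(\sum s^2 / \sum (s-\hat{s})^2\right)$, between a reference signal and a degraded copy. The abstract does not say which SNR variant the study used, so this is an assumption; the function name and sample values are hypothetical.

```python
import math


def snr_db(reference, degraded):
    """Waveform SNR in dB between equal-length sample sequences."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    if noise_power == 0:
        return math.inf  # identical signals: no measurable noise
    return 10.0 * math.log10(signal_power / noise_power)


# Toy samples: every degraded sample is off by 0.1 from the reference.
reference = [1.0, -1.0, 1.0, -1.0]
degraded = [1.1, -0.9, 1.1, -0.9]

print(f"{snr_db(reference, degraded):.2f} dB")
```

In this toy case the error amplitude is one tenth of the signal amplitude, so the power ratio is 100 and the SNR is about 20 dB.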
