• Title/Abstract/Keywords: Speech quality

Search results: 803 items, processing time 0.027 s

Objective Speech Quality Measurement for Naturalness Assessment

  • 장경아;이희원;송종회;배명진
    • Institute of Electronics Engineers of Korea: Conference Proceedings / Proceedings of the Institute of Electronics Engineers of Korea 2001 Summer Conference (1) / pp.361-364 / 2001
  • Speech quality measurement is divided into subjective and objective measurement according to its mathematical representation. Of the two, subjective measurement evaluates speech quality more accurately than objective measurement, but it takes much more time and cost to perform. An objective measure that can predict the result of a subjective evaluation compensates for this drawback. In this paper, we propose an objective speech quality measure that assesses naturalness in a way that approximates subjective measurement. We measured the naturalness of speech by estimating the speaking rate and the variation of pitch and energy.


Speech Quality Measure in a Mobile Communication System Using PLP Cepstral Distance with CMS

  • 윤종진;박상욱;박영철;윤대희;차일환
    • Speech Sciences / Vol. 6 / pp.163-179 / 1999
  • Setting up, managing, and repairing a mobile communication system requires continuous estimation of speech quality. Speech quality can be measured by listeners' judgement in a subjective test such as the MOS (Mean Opinion Score) test. Because this method is laborious, expensive, and time-consuming, it is advisable to predict subjective speech quality via objective measures. This paper presents a robust objective speech quality measure, PLP-CMS (Perceptual Linear Predictive-Cepstral Mean Subtraction), which can predict subjective speech quality in mobile communication systems. PLP-CMS correlates highly with subjective quality owing to PLP (Perceptual Linear Predictive) analysis, and it is robust to PSTN (Public Switched Telephone Network) channel effects owing to CMS (Cepstral Mean Subtraction). To verify the performance of the proposed algorithm, we carried out subjective and objective quality estimation on speech samples distorted in various ways in a real mobile communication system. The results show that PLP-CMS has a higher correlation with subjective quality than PSQM (Perceptual Speech Quality Measure) and PLP-CD (Perceptual Linear Predictive-Cepstral Distance).

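The CMS step named in the abstract above can be illustrated in a few lines. A fixed linear channel becomes an additive constant in the cepstral domain, so subtracting the per-coefficient mean over all frames removes it. This is a minimal sketch of generic cepstral mean subtraction only, not the paper's PLP-CMS measure; the function name, frame values, and channel offset are hypothetical.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean (taken over all frames) from each frame."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Hypothetical cepstral frames: 3 frames, 2 coefficients each.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

# Assumption: a time-invariant channel adds the same offset to every frame.
channel = [0.5, -1.5]
filtered = [[c + h for c, h in zip(frame, channel)] for frame in clean]

# After CMS, the clean and channel-filtered features coincide.
print(cepstral_mean_subtraction(clean))
print(cepstral_mean_subtraction(filtered))
```

Both calls print the same mean-normalized frames, which is why a distance computed after CMS is insensitive to a constant channel response.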

A Speech Enhancement Algorithm based on Human Psychoacoustic Property

  • 전유용;이상민
    • The Transactions of the Korean Institute of Electrical Engineers / Vol. 59, No. 6 / pp.1120-1125 / 2010
  • In speech systems such as hearing aids and speech communication, speech quality is degraded by environmental noise. In this study, to enhance speech quality degraded by environmental noise, we propose an algorithm that reduces the noise and reinforces the speech. The minima controlled recursive averaging (MCRA) algorithm is used to estimate the noise spectrum, and a spectral weighting factor is used to reduce the noise. The partial masking effect, one of the properties of human hearing, is introduced to reinforce the speech. We then compared the waveform, spectrogram, Perceptual Evaluation of Speech Quality (PESQ) score, and segmental signal-to-noise ratio (segSNR) of the original speech, the noisy speech, the noise-reduced speech, and the speech enhanced by the proposed method. The speech enhanced by the proposed method is reinforced in the high-frequency region, which is degraded by noise, and both PESQ and segSNR are improved. This means that the speech quality is enhanced.
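For readers unfamiliar with segSNR, the sketch below shows one common way to compute it: the SNR in dB is evaluated per frame and then averaged, with each frame value clamped to a fixed range. The frame length and the clamping range of -10 to 35 dB are assumptions chosen for illustration (the abstract does not state the settings used), and the function name is hypothetical.

```python
import math
from typing import Sequence


def segmental_snr(
    clean: Sequence[float],
    processed: Sequence[float],
    frame_len: int = 4,
    floor_db: float = -10.0,  # assumed lower clamp
    ceil_db: float = 35.0,    # assumed upper clamp
) -> float:
    """Average the per-frame SNR (in dB) between a clean and a processed signal."""
    n_frames = min(len(clean), len(processed)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    eps = 1e-12  # avoids division by zero and log of zero
    total = 0.0
    for m in range(n_frames):
        start = m * frame_len
        ref = clean[start:start + frame_len]
        out = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        error_energy = sum((x - y) ** 2 for x, y in zip(ref, out))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        total += min(max(snr_db, floor_db), ceil_db)
    return total / n_frames


# Toy signals: a constant error of 0.1 on a unit-amplitude signal gives about 20 dB.
clean_signal = [1.0] * 8
noisy_signal = [1.1] * 8
print(round(segmental_snr(clean_signal, noisy_signal), 3))
```

Clamping keeps silent or perfectly reconstructed frames from dominating the average, which is the usual motivation for preferring segSNR over a single global SNR.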

Speech Quality of a Sinusoidal Model Depending on the Number of Sinusoids

  • Seo, Jeong-Wook;Kim, Ki-Hong;Seok, Jong-Won;Bae, Keun-Sung
    • Speech Sciences / Vol. 7, No. 1 / pp.17-29 / 2000
  • STC (Sinusoidal Transform Coding) is a vocoding technique that uses a sinusoidal speech model to obtain high-quality speech at a low data rate. It models and synthesizes the speech signal with the fundamental frequency and its harmonics in the frequency domain. To reduce the data rate, the sinusoidal amplitudes and phases must be represented with as few peaks as possible while maintaining speech quality. As basic research toward a low-rate speech coding algorithm using the sinusoidal model, this paper investigates how speech quality depends on the number of sinusoids. Speech signals are reconstructed while varying the number of spectral peaks from 5 to 40, and their quality is evaluated using a spectral envelope distortion measure and MOS (Mean Opinion Score). Two approaches are used to obtain the spectral peaks: a conventional STFT (Short-Time Fourier Transform) and a multiresolution analysis method.


Qualitative Classification of Voice Quality of Normal Speech and Derivation of its Correlation with Speech Features

  • 김정민;권철홍
    • Phonetics and Speech Sciences / Vol. 6, No. 1 / pp.71-76 / 2014
  • In this paper, the voice quality of normal speech is qualitatively classified into five components: breathy, creaky, rough, nasal, and thin/thick voice. To determine whether a correlation exists between subjective and objective measures of voice, each voice is perceptually evaluated on a 1/2/3 scale by speech processing specialists and acoustically analyzed using speech analysis tools such as Praat, MDVP, and VoiceSauce. The speech parameters include features related to the speech source and the vocal tract filter. The statistical analysis uses a two-independent-samples non-parametric test. Experimental results show a significant correlation between the speech feature parameters and the components of voice quality.

Correlation Analysis of PESQ and MOS Evaluation for HMM-based Synthetic Korean Speech

  • 임창송;배건성
    • Phonetics and Speech Sciences / Vol. 2, No. 1 / pp.71-75 / 2010
  • PESQ is an objective speech quality measure known to have a high correlation with subjective speech quality measures such as MOS. To examine whether it could be useful as an objective quality measure for synthetic speech, we carried out subjective evaluation tests with MOS and DMOS and an objective evaluation test with PESQ on HMM-based Korean synthetic speech signals, and analyzed the correlation between them. Experimental results show that PESQ has correlations of 0.87 with MOS and 0.92 with DMOS. This suggests that PESQ holds much promise for evaluating the quality of synthetic Korean speech.


Voice Quality Criteria for Heterogenous Network Communication Under Mobile-VoIP Environments

  • Choi, Jae-Hun;Seol, Soon-Uk;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / Vol. 28, No. 3E / pp.99-108 / 2009
  • In this paper, we suggest criteria for the objective measurement of speech quality in mobile VoIP (Voice over Internet Protocol) services over wireless mobile internet such as mobile WiMAX networks. We consider the case in which voice communication is carried across different networks, that is, when mobile VoIP users in a packet-based mobile internet network call PSTN and mobile network users. Relevant quality indexes and quality standards for evaluating the speech quality of mobile VoIP have so far been lacking. In addition, many factors influence speech quality in a packet network. In particular, if speech degraded by packet loss is transferred to users of another network through a handover, voice communication quality deteriorates significantly because of the conversion between speech codecs. We adopt the Gilbert-Elliott channel model to characterize the packet network and assess voice quality with the objective speech quality measure ITU-T P.862.1 MOS-LQO for various call scenarios from a mobile VoIP user to PSTN and mobile network users under various packet loss rates in the transmission channel. Our simulation results show that the conversion between speech codecs degrades speech quality across different transmission channel environments when mobile VoIP users call PSTN and mobile network users.
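The Gilbert-Elliott model mentioned in the abstract above is a two-state Markov chain with a "good" and a "bad" state, each with its own packet loss probability, which produces the bursty losses typical of packet networks. The following is a minimal sketch under assumed parameters; the transition probabilities, per-state loss probabilities, seed, and function name are illustrative and are not taken from the paper.

```python
import random
from typing import List


def gilbert_elliott_losses(
    n_packets: int,
    p_good_to_bad: float,
    p_bad_to_good: float,
    loss_in_good: float = 0.0,  # assumed: no loss in the good state
    loss_in_bad: float = 1.0,   # assumed: every packet lost in the bad state
    seed: int = 0,
) -> List[bool]:
    """Return one flag per packet; True means the packet was lost."""
    rng = random.Random(seed)
    in_bad_state = False  # start in the good state
    losses = []
    for _ in range(n_packets):
        loss_prob = loss_in_bad if in_bad_state else loss_in_good
        losses.append(rng.random() < loss_prob)
        # Markov state transition for the next packet.
        if in_bad_state:
            if rng.random() < p_bad_to_good:
                in_bad_state = False
        elif rng.random() < p_good_to_bad:
            in_bad_state = True
    return losses


# Illustrative parameters: the stationary probability of the bad state is
# p_good_to_bad / (p_good_to_bad + p_bad_to_good) = 0.05 / 0.55, about 9.1%.
pattern = gilbert_elliott_losses(200_000, p_good_to_bad=0.05, p_bad_to_good=0.5)
print(f"simulated loss rate: {sum(pattern) / len(pattern):.3f}")
```

A loss pattern like this could be applied to coded speech frames before scoring the decoded output with an objective measure; lowering `p_bad_to_good` lengthens the loss bursts while the average loss rate is set by the ratio of the two transition probabilities.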

The effect of voice quality on speech intelligibility in children with spastic cerebral palsy

  • 정필연;심현섭
    • Phonetics and Speech Sciences / Vol. 9, No. 4 / pp.129-136 / 2017
  • This study investigates the effect of voice quality on speech intelligibility and the relationship between voice quality and intelligibility in children with spastic cerebral palsy (CP). We recruited 36 children with spastic CP (mean age 10.43 years; 17 girls, 19 boys; 34 spastic type, 2 mixed type) from a special school and a rehabilitation hospital. Voice samples for the perceptual analysis of voice quality were extracted from a sustained vowel /a/ and rated on the GRBAS scales by two experienced speech-language pathologists. Ten adults with no hearing problems rated speech intelligibility for the 37 words listed in the Assessment of Phonology and Articulation for Children on a 7-point interval scale. The children were divided into three groups according to their G scores on the GRBAS scales (G1: n=10, G2: n=13, G3: n=13). ANCOVA and Pearson correlation analyses showed a significant difference in speech intelligibility among the three groups. There were also significant correlations between speech intelligibility and the G (grade), A (asthenia), and B (breathiness) scale scores. These findings suggest that the poor speech intelligibility of children with spastic CP might be related to asthenia and breathiness. Speech therapy aimed at improving the speech intelligibility of children with spastic CP should increase vocal intensity and improve vocal functioning.

Speech Recognition Accuracy Prediction Using Speech Quality Measure

  • 지승은;김우일
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 20, No. 3 / pp.471-476 / 2016
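  • This paper describes an experiment on predicting speech recognition performance using speech quality measures. In a previous experiment, to predict speech recognition performance effectively, we proposed a new performance measure that combines several quality measures highly correlated with the word error rate, a representative measure of speech recognition performance. The proposed measure showed a higher correlation with the word error rate than any single speech quality measure used alone, demonstrating that it is effective for predicting speech recognition performance. Based on this result, the present experiment adopts the speech quality measures used in the combination to generate a 4-dimensional feature vector and builds a GMM-based speech recognition performance predictor. In experiments with an increasing number of Gaussian components, we confirmed that, under both babble noise and car noise, the proposed system predicts the word error rate with higher probability as the SNR decreases.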

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments

  • 이길호;윤재삼;오유리;김홍국
    • Journal of the Korean Society of Phonetic Sciences: Malsori / No. 54 / pp.27-43 / 2005
  • Existing standard speech coders provide high-quality speech communication, but they degrade the performance of speech recognition systems that use the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for speech recognition performance. For example, mel-frequency cepstral coefficients (MFCCs) are generally known to provide better speech recognition performance than linear prediction coefficients (LPCs), the typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCC, however, is developing an efficient MFCC quantization at a low bit rate. First, we explore the interframe correlation of MFCCs, which leads to predictive quantization of MFCC. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed speech coder has speech quality comparable to that of 8 kbps G.729, while speech recognition performance with the proposed coder is better than with G.729.
