• Title/Summary/Keyword: Speech Quality Assessment

Search Result 49, Processing Time 0.023 seconds

Performance Evaluation of Novel AMDF-Based Pitch Detection Scheme

  • Kumar, Sandeep
    • ETRI Journal
    • /
    • v.38 no.3
    • /
    • pp.425-434
    • /
    • 2016
  • A novel average magnitude difference function (AMDF)-based pitch detection scheme (PDS) is proposed to achieve better performance in speech quality. A performance evaluation of the proposed PDS is carried out through both a simulation and a real-time implementation of a speech analysis-synthesis system. The parameters used to compare the performance of the proposed PDS with that of PDSs that are based on either a cepstrum, an autocorrelation function (ACF), an AMDF, or circular AMDF (CAMDF) methods are as follows: percentage gross pitch error (%GPE); a subjective listening test; an objective speech quality assessment; a speech intelligibility test; a synthesized speech waveform; computation time; and memory consumption. The proposed PDS results in lower %GPE and better synthesized speech quality and intelligibility for different speech signals as compared to the cepstrum-, ACF-, AMDF-, and CAMDF-based PDSs. The computational time of the proposed PDS is also less than that for the cepstrum-, ACF-, and CAMDF-based PDSs. Moreover, the total memory consumed by the proposed PDS is less than that for the ACF- and cepstrum-based PDSs.

Implementation and Evaluation of an HMM-Based Speech Synthesis System for the Tagalog Language

  • Mesa, Quennie Joy;Kim, Kyung-Tae;Kim, Jong-Jin
    • MALSORI
    • /
    • v.68
    • /
    • pp.49-63
    • /
    • 2008
  • This paper describes the development and assessment of a hidden Markov model (HMM) based Tagalog speech synthesis system, where Tagalog is the most widely spoken indigenous language of the Philippines. Several aspects of the design process are discussed here. In order to build the synthesizer a speech database is recorded and phonetically segmented. The constructed speech corpus contains approximately 89 minutes of Tagalog speech organized in 596 spoken utterances. Furthermore, contextual information is determined. The quality of the synthesized speech is assessed by subjective tests employing 25 native Tagalog speakers as respondents. Experimental results show that the new system is able to obtain a 3.29 MOS which indicates that the developed system is able to produce highly intelligible neutral Tagalog speech with stable quality even when a small amount of speech data is used for HMM training.

  • PDF

Scoring Methods for Improvement of Speech Recognizer Detecting Mispronunciation of Foreign Language (외국어 발화오류 검출 음성인식기의 성능 개선을 위한 스코어링 기법)

  • Kang Hyo-Won;Kwon Chul-Hong
    • MALSORI
    • /
    • no.49
    • /
    • pp.95-105
    • /
    • 2004
  • An automatic pronunciation correction system provides learners with correction guidelines for each mispronunciation. For this purpose we develope a speech recognizer which automatically classifies pronunciation errors when Koreans speak a foreign language. In order to develope the methods for automatic assessment of pronunciation quality, we propose a language model based score as a machine score in the speech recognizer. Experimental results show that the language model based score had higher correlation with human scores than that obtained using the conventional log-likelihood based score.

  • PDF

An Objective Speech Quality Measure using Masking Effect under Digital Mobile Telephone Network Environment (디지털 이동통신망 환경 하에서 마스킹 효과를 이용한 객관적 음질 평가 척도)

  • 김광수;김민정;석수영;정호열;정현일
    • Journal of Korea Multimedia Society
    • /
    • v.5 no.4
    • /
    • pp.405-414
    • /
    • 2002
  • In this paper, we propose a new objective speech quality measure using noise masking threshold for speech quality assessment of mobile telephone network environments, and verify the effectiveness of the proposed method through the experiments. For such a purpose, well known objective speech quality measures such as BSD and PSQM are first evaluated for digital mobile telephone network environments. However, these conventional methods does not have good performance under mobile networks environments compared to literary results. To be mote effective objective speech quality measure under mobile telephone environments, the proposed method employs human psychoacoustic masking effect. The DMOS, instead of MOS, is used as a subjective speech quality measure for performance evaluation. The performance comparison are carried out with speech data collected from digital mobile telephone environments. As results, the proposed measure have and average 4% higher performance, in terms of correlation, than existing objective speech quality measures such as BSD and PSQM.

  • PDF

A Study on Objective Quality Assessment of Synthesized Speech by Rule (규칙 합성음의 객관적 품질평가에 관한 연구)

  • 홍진우
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1991.06a
    • /
    • pp.67-72
    • /
    • 1991
  • This paper evaluates thequality of synthesized speech by rule using the LPC CD in the objective measure and then compares the result with the subjective analysis. By evaluating the quality of synthesized speech by rule objectively. We have tried to resolve the problems (Evaluation time or size expansion, variables within the analysis results) that arise when the evaluation is done subjectively. Also by comparing intelligibility-the index for the subjective quality evaluation of synthesized speech by rule-with evaluation results obtained using MOS and the objective evaluation. We have proved the validity of the objective analysis and thus provides a guide that would be useful when R&D and marketing of synthesis by rule method is done.

  • PDF

Comparative Studies on the Self Voice Assessment of Voice Disorder Patients and the Hearer Voice Assessment of a Comparative Group of normal subjects (음성장애인의 자가음성평가와 정상음성인의 청자음성평가 특성 비교)

  • Lee, Yu-Jin;Hwang, Young-Jin
    • Phonetics and Speech Sciences
    • /
    • v.4 no.2
    • /
    • pp.105-114
    • /
    • 2012
  • This paper will discuss the difference between self assessment of voice disorders and the hearer voice assessment of a comparative group of normal subjects. The study was conducted on 25 voice disorder subjects and 32 hearers of a comparative group of normal subjects. The results are as follows. Firstly, in K-VHI and VHI-H, the hearers of the comparative group of normal subjects perceived more serious voice disorders than the voice disorder group in all sub-domains. Likewise, in K-VQOL and VRQOL-H, the hearers of the comparative group of normal subjects perceived more serious voice disorders than the voice disorder group in all sub-domains. Secondly, the hearer voice assessment of the comparative group of normal subjects showed no difference in gender regarding the perception of the severity of voice disorder issues. Thirdly, the hearer voice assessment of the comparative group of normal subjects states that in the emotional aspects of VHI-H, professional voice users perceive more serious voice disorders than others. Accordingly, in VRQOL-H, there was no difference in use of the voice between professionals and others.

Machine Scoring Methods Highly-correlated with Human Ratings in Speech Recognizer Detecting Mispronunciation of Foreign Language (한국인의 외국어 발화오류검출 음성인식기에서 청취판단과 상관관계가 높은 기계 스코어링 기법)

  • Bae, Min-Young;Kwon, Chul-Hong
    • Speech Sciences
    • /
    • v.11 no.2
    • /
    • pp.217-226
    • /
    • 2004
  • An automatic pronunciation correction system provides users with correction guidelines for each pronunciation error. For this purpose, we develop a speech recognition system which automatically classifies pronunciation errors when Koreans speak a foreign language. In this paper, we propose a machine scoring method for automatic assessment of pronunciation quality by the speech recognizer. Scores obtained from an expert human listener are used as the reference to evaluate the different machine scores and to provide targets when training some of algorithms. We use a log-likelihood score and a normalized log-likelihood score as machine scoring methods. Experimental results show that the normalized log-likelihood score had higher correlation with human scores than that obtained using the log-likelihood score.

  • PDF

Discussions on Auditory-Perceptual Evaluation Performed in Patients With Voice Disorders (음성장애 환자에서 시행되는 청지각적 평가에 대한 논의)

  • Lee, Seung Jin
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.32 no.3
    • /
    • pp.109-117
    • /
    • 2021
  • The auditory-perceptual evaluation of speech-language pathologists (SLP) in patients with voice disorders is often regarded as a touchstone in the multi-dimensional voice evaluation procedures and provides important information not available in other assessment modalities. Therefore, it is necessary for the SLPs to conduct a comprehensive and in-depth evaluation of not only voice but also the overall speech production mechanism, and they often encounter various difficulties in the evaluation process. In addition, SLPs should strive to avoid bias during the evaluation process and to maintain a wide and constant spectrum of severity for each parameter of voice quality. Lastly, it is very important for the SLPs to perform a team approach by documenting and delivering important information pertaining to auditory-perceptual characteristics in an appropriate and efficient way through close communication with the laryngologists.

Machine scoring method for speech recognizer detection mispronunciation of foreign language (외국어 발화오류 검출 음성인식기를 위한 스코어링 기법)

  • Kang, Hyo-Won;Bae, Min-Young;Lee, Jae-Kang;Kwon, Chul-Hong
    • Proceedings of the KSPS conference
    • /
    • 2004.05a
    • /
    • pp.239-242
    • /
    • 2004
  • An automatic pronunciation correction system provides users with correction guidelines for each pronunciation error. For this purpose, we propose a speech recognition system which automatically classifies pronunciation errors when Koreans speak a foreign language. In this paper, we also propose machine scoring methods for automatic assessment of pronunciation quality by the speech recognizer. Scores obtained from an expert human listener are used as the reference to evaluate the different machine scores and to provide targets when training some of algorithms. We use a log-likelihood score and a normalized log-likelihood score as machine scoring methods. Experimental results show that the normalized log-likelihood score had higher correlation with human scores than that obtained using the log-likelihood score.

  • PDF