• Title/Summary/Keyword: prosody evaluation

Evaluation of English speaking proficiency under fixed speech rate: Focusing on utterances produced by Korean child learners of English

  • Narah Choi;Tae-Yeoub Jang
    • Phonetics and Speech Sciences / v.15 no.1 / pp.47-54 / 2023
  • This study attempted to test the hypothesis that Korean evaluators can score L2 speech appropriately, even when speech rate features are unavailable. Two perception experiments, preliminary and main, were conducted sequentially. The purpose of the preliminary experiment was to categorize English-as-a-foreign-language (EFL) speakers into two groups, advanced learners and lower-level learners, based on the proficiency scores given by five human raters. In the main experiment, a set of stimuli was prepared such that all data tokens were modified to have a uniform speech rate. Ten human evaluators were asked to score the stimulus tokens on a 5-point scale. These scores were statistically analyzed to determine whether there was a significant difference in utterance production between the two groups. The results of the preliminary experiment confirm that higher-proficiency learners speak faster than lower-proficiency learners. The results of the main experiment indicate that under controlled speech-rate conditions, human raters can appropriately assess learner proficiency, probably thanks to the linguistic features that the raters considered during the evaluation process.

SPEECH SYNTHESIS IN THE TIME DOMAIN BY PITCH CONTROL USING LAGRANGE INTERPOLATION (TD-PCULI)

  • Kang, Chan-Hee;Shin, Yong-Jo;Kim, Yun-Seok;Kang, Dae-Soo;Lee, Jong-Heon;Kwon, Ki-Hyung;An, Jeong-Keun;Sea, Sung-Tae;Chin, Yong-Ohk
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.984-990 / 1994
  • In this paper, a new time-domain speech synthesis method using monosyllables is proposed. It aims to overcome the degradation in synthetic speech quality caused by frequency-domain synthesis and to develop a time-domain algorithm for prosodic control. In particular, when a time-domain method uses the monosyllable as its synthesis unit, the main issues are controlling the pitch period and smoothing the energy pattern. As a solution to pitch control, a method using Lagrange interpolation is suggested. For the energy problem, an algorithm that controls the shape of the amplitude envelope of a monosyllable is proposed. Experiments showed that it was possible to synthesize unlimited Korean speech, including prosody control. According to the MOS evaluation, the quality and naturalness of the synthesized speech improved to a good level.
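
The abstract does not give the details of the pitch control, so the following is only a minimal sketch of how Lagrange interpolation could produce a smooth pitch-period contour from a few target points. The function names, the target values, and the 10 ms sampling step are all assumptions made for illustration, not the authors' implementation.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial L_i(x): equals 1 at xi and 0 at every other node.
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def pitch_period_contour(times_ms, periods_ms, step_ms=10.0):
    """Sample the interpolated pitch-period contour every step_ms milliseconds."""
    contour = []
    t = times_ms[0]
    while t <= times_ms[-1] + 1e-9:
        contour.append((t, lagrange_interpolate(times_ms, periods_ms, t)))
        t += step_ms
    return contour


# Hypothetical targets: desired pitch period (ms) at three instants of a monosyllable.
target_times = [0.0, 50.0, 100.0]
target_periods = [8.0, 7.0, 9.0]
contour = pitch_period_contour(target_times, target_periods)
```

With three targets the interpolant is a quadratic that passes exactly through each target. With many nodes a single Lagrange polynomial can oscillate, so a practical system would probably interpolate over short local windows.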

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

  • Kim, Sang-hun;Park, Jun;Lee, Young-jik
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.24-33 / 2001
  • This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems that use small speech databases still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modifications. The corpus should contain the natural prosody of the source speech and multiple instances of each synthesis unit. To build phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect the fine pitch period using laryngograph signals, which is used for prosodic feature extraction. For break strength allocation, 4 levels of break indices are decided according to pause length and attached to phones to reflect prosodic variations at phrase boundaries. To predict the break strength from text, we use statistical information on POS (part-of-speech) sequences. The best triphone sequences are selected by a Viterbi search that minimizes the cumulative Euclidean distance of the concatenation distortion. To obtain high-quality synthetic speech suitable for commercial use, we introduce a domain-specific database. Adding the domain-specific database to the general-domain database greatly improves the quality of synthetic speech in that domain. In the subjective evaluation, the new Korean corpus-based TTS system shows better naturalness than the conventional demisyllable-based one.
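
The unit-selection step can be illustrated with a small dynamic-programming sketch. It assumes, as a simplification, that each candidate unit is represented by a single feature vector and that the concatenation distortion between adjacent units is the Euclidean distance between those vectors. The abstract does not specify the features or the data layout, so `viterbi_select` and the toy candidates below are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def viterbi_select(candidates):
    """Pick one candidate per position so that the summed distance between
    consecutive choices is minimal.

    candidates[t] is a list of feature vectors for position t.
    Returns (list of chosen candidate indices, total cost).
    """
    # best[t][k]: minimal cumulative cost of a path ending at candidate k of position t.
    best = [[0.0] * len(candidates[0])]
    back = []  # back[t-1][k]: best predecessor index for candidate k of position t
    for t in range(1, len(candidates)):
        row_cost, row_back = [], []
        for cur in candidates[t]:
            costs = [best[t - 1][p] + euclidean(prev, cur)
                     for p, prev in enumerate(candidates[t - 1])]
            p_min = min(range(len(costs)), key=costs.__getitem__)
            row_cost.append(costs[p_min])
            row_back.append(p_min)
        best.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest final candidate.
    k = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][k]
    path = [k]
    for row_back in reversed(back):
        k = row_back[k]
        path.append(k)
    path.reverse()
    return path, total


# Toy example: three positions with two hypothetical candidates each.
toy_candidates = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(1.0, 0.0), (5.0, 4.0)],
    [(1.0, 1.0), (9.0, 9.0)],
]
path, total = viterbi_select(toy_candidates)  # path == [0, 0, 0], total == 2.0
```

A full system would typically compare features at the unit boundaries and might add a target cost per unit; only the cumulative concatenation term named in the abstract is modeled here.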

Speech Evaluation Tasks Related to Subthalamic Nucleus Deep Brain Stimulation in Idiopathic Parkinson's Disease: A Review (특발성 파킨슨병의 시상밑부핵 심부뇌자극술 관련 말 평가 과제에 대한 문헌연구)

  • Kim, Sun Woo;Kim, Hyang Hee
    • 재활복지 / v.18 no.4 / pp.237-255 / 2014
  • Idiopathic Parkinson's disease (IPD) is a neurodegenerative disease caused by the loss of dopamine cells in the substantia nigra, a region of the midbrain. Its major symptoms are muscular rigidity, bradykinesia, resting tremor, and postural instability. An estimated 70-90% of patients with IPD also have hypokinetic dysarthria. Subthalamic nucleus deep brain stimulation (STN-DBS) has been reported to be successful in relieving the core motor symptoms of IPD in the advanced stages of the disease. However, data on the effects of STN-DBS on speech performance are inconsistent. A MEDLINE literature search was done to retrieve articles published from 1987 to 2012. The results were narrowed down to focus on speech performance under STN-DBS based on perceptual, acoustic, and/or aerodynamic analyses. The 32 publications that dealt with speech performance after STN-DBS indicated improvement (42%), deterioration (29%), mixed results (26%), or no change (3%). The most frequently used method was acoustic analysis using vowel prolongation and the Unified Parkinson's Disease Rating Scale (UPDRS). To verify the effect of STN-DBS, speech evaluation should cover all speech components, such as articulation, resonance, phonation, respiration, and prosody, using a contextual speech task.

An analysis of emotional English utterances using the prosodic distance between emotional and neutral utterances (영어 감정발화와 중립발화 간의 운율거리를 이용한 감정발화 분석)

  • Yi, So-Pae
    • Phonetics and Speech Sciences / v.12 no.3 / pp.25-32 / 2020
  • An analysis of emotional English utterances with 7 emotions (calm, happy, sad, angry, fearful, disgust, surprised) was conducted by measuring the prosodic distance between 672 emotional and 48 neutral utterances. Applying the technique proposed in the automatic evaluation model of English pronunciation to emotional utterances, the study used Euclidean distance measurements of 3 prosodic elements (F0, intensity, and duration) extracted from emotional and neutral utterances. The paper further extended the analytical methods to include Euclidean distance normalization, z-score, and z-score normalization, resulting in 4 groups of measurement schemes (sqrF0, sqrINT, sqrDUR; norsqrF0, norsqrINT, norsqrDUR; sqrzF0, sqrzINT, sqrzDUR; norsqrzF0, norsqrzINT, norsqrzDUR). The results of both the perceptual and acoustical analyses of emotional utterances consistently indicated the greater effectiveness of norsqrF0, norsqrINT, and norsqrDUR, the group that normalized the Euclidean measurement, among the 4 groups of measurement schemes. The greatest acoustical change in prosodic information influenced by emotion was shown in the values of F0, followed by duration and intensity in descending order, according to the effect size based on the estimated distance between emotional utterances and their neutral counterparts. Tukey Post Hoc test revealed 4 homogeneous subsets (calm