Search results for Title/Summary/Keyword: "neutral utterance" (7 results)

A Study on the Declination According to Length of Utterance, Clause Boundary and Focus in Korean (한국어의 발화 길이 및 절 경계와 초점에 의한 점진하강(declination) 연구)

  • Kwak, Sook-Young
    • Phonetics and Speech Sciences / v.2 no.3 / pp.11-22 / 2010
  • The present study investigates declination in Korean and how it relates to utterance length, clause boundaries, and focus. More specifically, I examine the relation between declination and utterance length, declination reset at clause boundaries, and the effect of focus on declination. Results showed that utterance length had no effect on the first and last pitch values of an utterance; they remained consistent regardless of length. However, the declination slope became relatively gentle from the fourth accentual phrase to the end of the intonational phrase. Declination reset occurred such that the first pitch of the second phrase was always lower than that of the first phrase, but when the utterance consisted of three phrases, the first pitch of the third phrase was not always lower than that of the second. Finally, the pitch values of focused words decreased the later they occurred in the sentence. A single declination line was formed in a focused utterance, but in an utterance containing a clause boundary, a new declination line began at the start of each clause. These findings can be applied to developing a Korean speech synthesizer with natural prosody and can also be used for teaching Korean prosody.

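To make the declination measurement concrete, here is a minimal sketch (not from the paper) of how a declination line can be estimated by fitting a least-squares line to the F0 peaks of successive accentual phrases; all names and values are hypothetical.

```python
import numpy as np

def declination_slope(peak_f0_hz, peak_times_s):
    """Fit a least-squares line to accentual-phrase F0 peaks.

    Returns (slope in Hz/s, intercept in Hz). A declination reset at a
    clause boundary, as reported in the study, would show up as a peak
    lying well above the fitted line.
    """
    slope, intercept = np.polyfit(peak_times_s, peak_f0_hz, deg=1)
    return slope, intercept

# Hypothetical peaks for a 4-phrase utterance: F0 drifts downward over time.
f0 = [220.0, 205.0, 196.0, 191.0]   # Hz, one peak per accentual phrase
t = [0.2, 0.7, 1.3, 1.9]            # s, peak locations
print(declination_slope(f0, t))      # negative slope indicates declination
```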

Prosodic Characteristics of Politeness in Korean (한국어에서의 공손함을 나타내는 운율적 특성에 관한 연구)

  • Ko Hyun-ju;Kim Sang-Hun;Kim Jong-Jin
    • MALSORI / no.45 / pp.15-22 / 2003
  • This study is a preliminary step toward improving the naturalness of a dialog TTS system. As major correlates of politeness in Korean, temporal features (total utterance duration, speech rate, and duration of utterance-final syllables) and F0 features (mean F0, boundary tone pattern, and F0 range) were examined through acoustic analysis of recordings of semantically neutral sentences, spoken by ten professional voice actors under two utterance conditions: normal and polite. The results show that the temporal characteristics differed significantly by utterance type, but the F0 characteristics did not.

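As a rough illustration of the temporal and F0 measures compared in this study, the sketch below extracts total duration, speech rate, mean F0, and F0 range from a recording using librosa's pyin pitch tracker. The syllable count is assumed to come from a transcript; final-syllable duration and boundary tone, which require segment alignment, are omitted.

```python
import librosa
import numpy as np

def politeness_features(wav_path, n_syllables):
    """Temporal and F0 features of the kind compared in the study.

    n_syllables must be supplied externally (e.g., from a transcript);
    speech rate is syllables per second over the whole utterance.
    """
    y, sr = librosa.load(wav_path, sr=None)
    total_dur = len(y) / sr
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    f0 = f0[~np.isnan(f0)]  # keep voiced frames only
    return {
        "total_duration_s": total_dur,
        "speech_rate_syl_per_s": n_syllables / total_dur,
        "mean_f0_hz": float(np.mean(f0)),
        "f0_range_hz": float(np.max(f0) - np.min(f0)),
    }
```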

Statistical Speech Feature Selection for Emotion Recognition

  • Kwon Oh-Wook;Chan Kwokleung;Lee Te-Won
    • The Journal of the Acoustical Society of Korea / v.24 no.4E / pp.144-151 / 2005
  • We evaluate the performance of emotion recognition from speech signals when an ordinary speaker talks to an entertainment robot. For each frame of a speech utterance, we extract frame-based features: pitch, energy, formants, band energies, mel-frequency cepstral coefficients (MFCCs), and the velocity/acceleration of pitch and MFCCs. For discriminative classifiers, a fixed-length utterance-based feature vector is computed from the statistics of the frame-based features. Using a speaker-independent database, we evaluate two promising classifiers: the support vector machine (SVM) and the hidden Markov model (HMM). For angry/bored/happy/neutral/sad emotion classification, the SVM and HMM classifiers yield 42.3% and 40.8% accuracy, respectively. We show that this accuracy is significant compared with the performance of foreign human listeners.
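The fixed-length utterance vector described in the abstract can be illustrated with a short sketch: per-frame features are summarized by statistics over time and fed to an SVM. This is only a generic reconstruction of the idea with made-up data; the paper's actual feature set, statistics, and selection procedure are not reproduced here.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def utterance_vector(frames):
    """frames: (n_frames, n_feats) array of per-frame features
    (e.g., pitch, energy, MFCCs). Statistics over time give a
    fixed-length vector regardless of utterance duration."""
    return np.concatenate([
        frames.mean(axis=0),
        frames.std(axis=0),
        frames.min(axis=0),
        frames.max(axis=0),
    ])

# Hypothetical training data: one stats vector per utterance.
X = np.stack([utterance_vector(np.random.randn(120, 13)) for _ in range(40)])
y = np.random.choice(["angry", "bored", "happy", "neutral", "sad"], size=40)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
```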

Speech Emotion Recognition on a Simulated Intelligent Robot (모의 지능로봇에서의 음성 감정인식)

  • Jang Kwang-Dong;Kim Nam;Kwon Oh-Wook
    • MALSORI / no.56 / pp.173-183 / 2005
  • We propose a speech emotion recognition method for an affective human-robot interface. In the proposed method, emotion is classified into six classes: angry, bored, happy, neutral, sad, and surprised. Features for an input utterance are extracted from statistics of phonetic and prosodic information. Phonetic information includes log energy, shimmer, formant frequencies, and Teager energy; prosodic information includes pitch, jitter, duration, and rate of speech. Finally, a pattern classifier based on support vector machines with a Gaussian kernel decides the emotion class of the utterance. We recorded speech commands and dialogs uttered 2 m away from the microphones in 5 different directions. Experimental results show that the proposed method yields 48% classification accuracy, while human classifiers achieve 71%.

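Of the phonetic features listed above, the Teager energy has a simple closed form for discrete signals, psi[n] = x[n]^2 - x[n-1]*x[n+1]. A minimal sketch follows; the example signal is hypothetical.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The output is one sample shorter at each edge than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# Example: the Teager energy of a pure tone is roughly constant and
# proportional to (amplitude * angular frequency)^2.
n = np.arange(1000)
tone = 0.5 * np.sin(2 * np.pi * 0.01 * n)
print(teager_energy(tone).mean())
```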

Speech Emotion Recognition by Speech Signals on a Simulated Intelligent Robot (모의 지능로봇에서 음성신호에 의한 감정인식)

  • Jang, Kwang-Dong;Kwon, Oh-Wook
    • Proceedings of the KSPS conference / 2005.11a / pp.163-166 / 2005
  • We propose a speech emotion recognition method for a natural human-robot interface. In the proposed method, emotion is classified into six classes: angry, bored, happy, neutral, sad, and surprised. Features for an input utterance are extracted from statistics of phonetic and prosodic information. Phonetic information includes log energy, shimmer, formant frequencies, and Teager energy; prosodic information includes pitch, jitter, duration, and rate of speech. Finally, a pattern classifier based on support vector machines with a Gaussian kernel decides the emotion class of the utterance. We recorded speech commands and dialogs uttered 2 m away from the microphones in 5 different directions. Experimental results show that the proposed method yields 59% classification accuracy while human classifiers give about 50% accuracy, which confirms that the proposed method achieves performance comparable to that of humans.

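Jitter and shimmer, two more of the features listed in these two papers, are commonly computed as the mean absolute difference between consecutive pitch periods (or cycle amplitudes) divided by their mean. The sketch below uses that common "local" definition, since the papers do not specify a variant; the cycle measurements are hypothetical.

```python
import numpy as np

def local_jitter(periods_s):
    """Mean absolute difference between consecutive pitch periods,
    divided by the mean period ('local jitter')."""
    p = np.asarray(periods_s, dtype=float)
    return np.mean(np.abs(np.diff(p))) / np.mean(p)

def local_shimmer(peak_amps):
    """The same computation over per-cycle peak amplitudes."""
    a = np.asarray(peak_amps, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)

# Hypothetical cycle measurements from a voiced stretch:
print(local_jitter([0.0080, 0.0082, 0.0079, 0.0081]))
print(local_shimmer([0.62, 0.60, 0.63, 0.61]))
```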

An analysis of emotional English utterances using the prosodic distance between emotional and neutral utterances (영어 감정발화와 중립발화 간의 운율거리를 이용한 감정발화 분석)

  • Yi, So-Pae
    • Phonetics and Speech Sciences / v.12 no.3 / pp.25-32 / 2020
  • An analysis of emotional English utterances covering 7 emotions (calm, happy, sad, angry, fearful, disgust, surprised) was conducted by measuring the prosodic distance between 672 emotional and 48 neutral utterances. Applying a technique proposed for an automatic English pronunciation evaluation model to the present study of emotional utterances, Euclidean distances over 3 prosodic elements (F0, intensity, and duration) extracted from emotional and neutral utterances were computed. The analysis was further extended to include Euclidean distance normalization, z-scores, and z-score normalization, yielding 4 groups of measurement schemes (sqrF0, sqrINT, sqrDUR; norsqrF0, norsqrINT, norsqrDUR; sqrzF0, sqrzINT, sqrzDUR; norsqrzF0, norsqrzINT, norsqrzDUR). Both the perceptual and acoustic analyses of the emotional utterances consistently indicated that norsqrF0, norsqrINT, and norsqrDUR, the group that normalized the Euclidean measurement, were the most effective of the 4 groups. Judging by effect size in the estimated distances between emotional utterances and their neutral counterparts, the greatest acoustic change in prosodic information due to emotion appeared in F0, followed by duration and then intensity. A Tukey post hoc test revealed 4 homogeneous subsets (calm
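A rough sketch of the distance measurement underlying the four scheme groups: Euclidean distance between matched emotional and neutral prosodic samples, optionally z-scored and length-normalized. The exact formulas behind the sqr*/norsqr*/sqrz*/norsqrz* names are not given in the abstract, so this is only an approximation of the general approach, with hypothetical data.

```python
import numpy as np

def zscore(v):
    return (v - v.mean()) / v.std()

def prosodic_distance(emo, neu, z=False, normalize=False):
    """Euclidean distance between an emotional and a neutral contour
    (F0, intensity, or duration values sampled at matched points).

    z=True z-scores each contour first; normalize=True divides the
    distance by the number of points.
    """
    e, n = np.asarray(emo, dtype=float), np.asarray(neu, dtype=float)
    if z:
        e, n = zscore(e), zscore(n)
    d = np.sqrt(np.sum((e - n) ** 2))
    return d / len(e) if normalize else d

# Hypothetical matched F0 samples (Hz):
emo_f0 = [230.0, 250.0, 210.0, 190.0]
neu_f0 = [200.0, 205.0, 195.0, 185.0]
print(prosodic_distance(emo_f0, neu_f0, z=True, normalize=True))
```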

Voicing and Tone Correlation in L2 English

  • Kim, Mi-Ryoung
    • Speech Sciences / v.12 no.4 / pp.113-128 / 2005
  • The underlying premise of this study was that L1 production transfers readily into L2 production. In neutral intonation, Korean shows a consonant-tone correlation: High tone patterns co-occur with voiceless aspirated and tense consonants, and Low-High tone patterns co-occur with lax or other voiced consonants. The purpose of this study was to see whether this Korean (L1) correlation is transferred into English (L2) production and whether the degree of transfer depends on proficiency. Eight Korean speakers and two American speakers participated in the experiment. F0 contours of words and sentences were collected and analyzed. The results showed a strong correlation between voicing and tone in L2 utterances: when the utterance-initial consonant was voiceless, the word or sentence began with the H pattern; otherwise it had the LH pattern. The degree of interference differed with proficiency: less proficient speakers showed a stronger correlation in terms of both the magnitude (Hz) and the duration (ms) of the effects on F0. The results indicate that the consonant-tone correlation of the L1 is strongly transferred into L2 production, and this transfer may be one of the factors that cause L2 speakers to produce deviant L2 accents and intonation.

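The voicing-tone comparison can be pictured with a small sketch that groups utterance-initial F0 onsets by the voicing of the initial consonant and compares group means; all numbers are hypothetical, and the paper's actual magnitude and duration measurements are not reproduced.

```python
import numpy as np

# Hypothetical onset F0 measurements (Hz) grouped by the voicing of the
# utterance-initial consonant, in the spirit of the paper's comparison.
onset_f0 = {
    "voiceless": [235.0, 228.0, 241.0, 230.0],      # expected H pattern
    "voiced_or_lax": [198.0, 204.0, 195.0, 201.0],  # expected LH pattern
}

for group, values in onset_f0.items():
    v = np.asarray(values)
    print(f"{group}: mean = {v.mean():.1f} Hz, sd = {v.std(ddof=1):.1f} Hz")

# A larger voiceless-minus-voiced onset difference in a speaker's L2
# English would indicate stronger transfer of the Korean pattern.
diff = np.mean(onset_f0["voiceless"]) - np.mean(onset_f0["voiced_or_lax"])
print(f"onset F0 difference: {diff:.1f} Hz")
```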