Synthesis and Evaluation of Prosodically Exaggerated Utterances

Yoon, Kyu-Chul
- Phonetics and Speech Sciences / v.1 no.3 / pp.73-85 / 2009
This paper introduces the technique of synthesizing and evaluating human utterances with exaggerated or atypical prosody. Prosody exaggeration can be implemented by manipulating either the fundamental frequency (F0) contour, the segmental durations, or the intensity contour of an utterance. Of these three prosodic elements, two or more can be exaggerated at the same time. The algorithms of synthesis and evaluation were suggested. Learner utterances exaggerated in each of the three prosodic features were evaluated with respect to their original native versions in terms of the differences in their F0 contours, the segmental durations, and the intensity contours. The measure of differences was the Euclidean distance metric between the matching points in their F0 and intensity contours. The measure was calculated after the exaggerated learner utterances were aligned by the segments and rendered identical to their native version in terms of their segmental durations. For the evaluation of the segmental durations, no prior modifications were made in durations and the same measure was used. The results from the pilot experiment suggest the viability of this measure in the evaluation of learner utterances with atypical prosody with respect to their native versions.
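
The distance measure described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: it assumes the two contours have already been aligned by segment and sampled at the same number of matching points, and the function name and all sample values are hypothetical.

```python
import math


def euclidean_distance(native, learner):
    """Euclidean distance between two contours sampled at matching points.

    Assumption: both sequences are already time-aligned and equally long,
    as they would be after the learner utterance is given the native
    version's segmental durations.
    """
    if len(native) != len(learner):
        raise ValueError("contours must have the same number of points")
    return math.sqrt(sum((n - l) ** 2 for n, l in zip(native, learner)))


# Hypothetical F0 values in Hz at five matching points.
native_f0 = [210.0, 225.0, 240.0, 220.0, 190.0]
learner_f0 = [205.0, 215.0, 250.0, 230.0, 200.0]
f0_distance = euclidean_distance(native_f0, learner_f0)

# The same measure applied to hypothetical per-segment durations in ms,
# with no prior duration modification.
native_dur = [80.0, 120.0, 95.0]
learner_dur = [100.0, 150.0, 90.0]
duration_distance = euclidean_distance(native_dur, learner_dur)
```

A smaller distance indicates a learner contour closer to the native version. Comparing raw distances across utterances of different lengths would need some normalization, which the abstract does not specify.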

A Neural Network Based Korean Segmental Duration Modeling Using Tonal Information of Phonemes (음소별 성조 정보를 이용한 신경망 기반의 한국어 음소 지속시간 모델링)

김은경;이상호;오영환
- The Journal of the Acoustical Society of Korea / v.18 no.6 / pp.84-88 / 1999
The accurate estimation of segmental duration is crucial for natural-sounding text-to-speech synthesis. For predicting Korean segmental durations, conventional methods utilized phonemic context, part-of-speech context, and locational information in the prosodic phrase. In this paper, the tonal information of phonemes is employed for more accurate prediction. After defining two non-boundary tones and six boundary tones, we annotated each syllable of 400 sentences with a tonal label. To predict segmental duration using tonal information, we constructed neural networks with a real-valued output node predicting phonemic duration and trained them with the backpropagation algorithm. Experimental results showed that the proposed features are effective for predicting Korean segmental durations, and we obtained a correlation coefficient of 0.863 between the observed and predicted durations.

SWAPPING NATIVE AND NON-NATIVE SPEAKERS' PROSODY USING THE PSOLA ALGORITHM

Yoon Kyu-Chul
- Proceedings of the KSPS conference / 2006.05a / pp.77-81 / 2006
This paper presents a technique of imposing the prosodic features of a native speaker's utterance onto the same sentence uttered by a non-native speaker. Three acoustic aspects of the prosodic features were considered: the fundamental frequency (F0) contour, segmental durations, and the intensity contour. The fundamental frequency contour and the segmental durations of the native speaker's utterance were imposed on the non-native speaker's utterance by using the PSOLA (pitch-synchronous overlap and add) algorithm [1] implemented in Praat[2]. The intensity contour transfer was also done in Praat. The technique of transferring one or more of these prosodic features was elaborated and its implications in the area of language education were discussed.

The Role of Prosody in Dialect Synthesis and Authentication

Yoon, Kyu-Chul
- Phonetics and Speech Sciences / v.1 no.1 / pp.25-31 / 2009
The purpose of this paper is to examine the viability of synthesizing Masan dialect with Seoul dialect and to examine the role of prosody in the authentication of the synthesized Masan dialect. The synthesis was performed by transferring one or more of the prosodic features of the Masan utterance onto the Seoul utterance. The hypothesis is that, given an utterance composed of the phonemes shared by both dialects, as more prosodic features of the Masan utterance are transferred onto the Seoul utterance, the Seoul utterance will be identified as a more authentic Masan utterance. The prosodic features involved were the fundamental frequency contour, the segmental durations, and the intensity contour. The synthesized Masan utterances were evaluated by thirteen native speakers of Masan dialect. The results showed that the fundamental frequency contour and the segmental durations had main effects on the perceptual shift from Seoul to Masan dialect.

Segmental timing of young children and adults

Kim Min-Jung;Carol Stoel-Gammon
- Proceedings of the KSPS conference / 2006.05a / pp.59-62 / 2006
Young children's speech is compared to adult-to-adult speech and adult-to-child speech by measuring durations and variability of each segment in CVC words. The results demonstrate that child speech exhibits an inconsistent timing relationship between consonants and vowels within a word. In contrast, consonant and vowel durations in adult-to-adult speech and adult-to-child speech exhibit significant relationships across segments, despite variability of segments when speaking rate is decreased. The results suggest that temporal patterns of young children are quite different from those of adults, and provide some evidence for lack of motor control capability and great variance in articulatory coordination.

Analysis of the Timing of Spoken Korean Using a Classification and Regression Tree (CART) Model

Chung, Hyun-Song;Huckvale, Mark
- Speech Sciences / v.8 no.1 / pp.77-91 / 2001
This paper investigates the timing of Korean spoken in a news-reading speech style in order to improve the naturalness of durations used in Korean speech synthesis. Each segment in a corpus of 671 read sentences was annotated with 69 segmental and prosodic features so that the measured duration could be correlated with the context in which it occurred. A CART model based on the features showed a correlation coefficient of 0.79 with an RMSE (root mean squared prediction error) of 23 ms between actual and predicted durations in reserved test data. These results are comparable with recent published results in Korean and similar to results found in other languages. An analysis of the classification tree shows that phrasal structure has the greatest effect on the segment duration, followed by syllable structure and the manner features of surrounding segments. The place features of surrounding segments only have small effects. The model has application in Korean speech synthesis systems.
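
The two figures reported in this abstract, a correlation coefficient and an RMSE between actual and predicted durations, can be computed as in the sketch below. This illustrates the evaluation statistics only, not the CART model or the authors' code; the function names and the sample durations are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between paired duration values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equally long")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must be equally long with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical segment durations in ms from a held-out test set.
actual_ms = [62.0, 95.0, 140.0, 48.0, 110.0]
predicted_ms = [70.0, 88.0, 125.0, 55.0, 118.0]

error_ms = rmse(actual_ms, predicted_ms)
correlation = pearson_r(actual_ms, predicted_ms)
```

The RMSE is in the same unit as the durations (milliseconds here), while the correlation is unitless, which is why the abstract reports both.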

Language Specific Variations of Domain-initial Strengthening and its Implications on the Phonology-Phonetics Interface: with Particular Reference to English and Hamkyeong Korean

Kim, Sung-A
- Speech Sciences / v.11 no.3 / pp.7-21 / 2004
The present study aims to investigate the domain-initial strengthening phenomenon, which refers to the strengthening of articulatory gestures at the initial positions of prosodic domains. More specifically, this paper presents the results of an experimental study of initial syllables with onset consonants (initial-syllable vowels henceforth) of various prosodic domains in English and Hamkyeong Korean, a pitch accent dialect spoken in the northern part of North Korea. The durations of initial-syllable vowels are compared to those of second vowels in real-word tokens for both languages, controlling both stress and segmental environment. Hamkyeong Korean, like English, turned out to strengthen the domain-initial consonants. With regard to vowel durations, no significant prosodic effect was found in English. On the other hand, Hamkyeong Korean showed significant differences between the durations of initial and non-initial vowels in the higher prosodic domains. The theoretical implications of the findings are as follows: The potentially universal phenomenon of initial strengthening is shown to be subject to language-specific variations in its implementation. More importantly, the distinct phonetics-phonology model (Pierrehumbert & Beckman, 1998; Keating, 1990; Cohn, 1993) is better equipped to account for the facts in the present study.

A Study on Human Evaluators Using the Evaluation Model of English Pronunciation (영어 발음 평가 모델을 활용한 수동 평가자 연구)

Yoon, Kyuchul
- Phonetics and Speech Sciences / v.5 no.4 / pp.109-119 / 2013
The purpose of this paper is to show the tendency of evaluators in the pronunciation evaluation of English utterances. The tendency was visualized using the evaluation model of English pronunciation proposed in [1]. One hundred fifty female university students and four evaluators participated in the study. Students read eight English sentences aloud as evaluators evaluated English pronunciation by their own criteria. The models based on their pronunciation evaluation proved to be efficient in showing their evaluation tendency in terms of the fundamental frequency, intensity, segmental durations, and segmental spectra as compared to those of the five native speakers of English chosen for building the models. However, human evaluators were not always consistent in their evaluation and sometimes gave conflicting scores to the same students.
https://doi.org/10.13064/KSSS.2013.5.4.109

The role of prosody in dialect authentication: Simulating Masan dialect with Seoul speech segments

Yoon, Kyu-Chul
- Proceedings of the KSPS conference / 2007.05a / pp.234-239 / 2007
The purpose of this paper is to examine the viability of simulating one dialect with the speech segments of another dialect through prosody cloning. The hypothesis is that, among Korean regional dialects, it is not the segmental differences but the prosodic differences that play a major role in authentic dialect perception. This work intends to support the hypothesis by simulating Masan dialect with the speech segments from Seoul dialect. The dialect simulation was performed by transplanting the prosodic features of Masan utterances onto the same utterances produced by a Seoul speaker. Thus, the simulated Masan utterances were composed of Seoul speech segments but their prosody came from the original Masan utterances. The prosodic features involved were the fundamental frequency contour, the segmental durations, and the intensity contour. The simulated Masan utterances were evaluated by four native Masan speakers and the role of prosody in dialect authentication and speech synthesis was discussed.

A Study on Automatic Measurement of Pronunciation Accuracy of English Speech Produced by Korean Learners of English (한국인 영어 학습자의 발음 정확성 자동 측정방법에 대한 연구)

Yun, Weon-Hee;Chung, Hyun-Sung;Jang, Tae-Yeoub
- Proceedings of the KSPS conference / 2005.11a / pp.17-20 / 2005
The purpose of this project is to develop a device that can automatically measure the pronunciation of English speech produced by Korean learners of English. Pronunciation proficiency will be measured largely in two areas: suprasegmental and segmental. In the suprasegmental area, intonation and word stress will be traced and compared with those of native speakers by way of statistical methods using tilt parameters. Durations of phones are also examined to measure the naturalness of speakers' pronunciation. In doing so, statistical duration modelling from a large speech database using CART will be considered. For segmental measurement of pronunciation, the acoustic probability of a phone, which is a byproduct of forced alignment, will be the basis for scoring the pronunciation accuracy of a phone. The final score will be given as feedback to the learners to improve their pronunciation.
