• Title/Summary/Keyword: Break indices (BI)

Break Predicting Methods Using Phonetic Symbols Combined with Accents Information in a Japanese Speech Synthesizer (일본어 합성기에서 악센트 정보가 결합된 발음기호를 이용한 Break 예측 방법)

  • Na, Deok-Su;Lee, Jong-Seok;Kim, Jong-Kuk;Bae, Myung-Jin
    • MALSORI / no.62 / pp.69-84 / 2007
  • Japanese intonation is signaled by relative differences in pitch height: accentual phrases (APs) are formed according to changes in accent, and a break occurs at each AP boundary. Although breaks can be predicted with J-ToBI using a rule-based or statistical approach, exact prediction is very difficult because break placement is flexible. This paper therefore proposes a method that improves the quality of synthesized speech by reducing errors in predicting break indices (BIs). The method uses newly defined phonetic symbols that combine the phonetic values of Japanese words with accent information. Because a stream of these symbols carries information on changes in intonation, BIs can be predicted easily by dividing the intonation phrase (IP) into several APs. In an experiment, break generation accuracy was 98%, and the proposed method improved the naturalness of the synthesized speech.

K-ToBI (Korean ToBI) Labelling Conventions (Version 3.0)

  • Jun, Sun-Ah
    • Speech Sciences / v.7 no.1 / pp.143-169 / 2000
  • This chapter presents an overview of Korean intonational structure and proposes a revised version of K-ToBI (Korean TOnes and Break Indices), a prosodic transcription convention for Seoul Korean. In the new version of K-ToBI, the tone tier is separated into two tiers: a phonological tone tier and a phonetic tone tier. The phonological tone tier labels tones marking the prosodic structure of an utterance, and the phonetic tone tier labels individual tones of an AP and an IP conforming to the surface pitch contour. Labelling surface tonal patterns will provide data to test the underlying tonal patterns and to build phonetic implementation rules.

ToBI Based Prosodic Representation of the Kyungnam Dialect of Korean

  • Cho, Yong-Hyung
    • Speech Sciences / v.2 / pp.159-172 / 1997
  • This paper proposes a prosodic representation system for the Kyungnam dialect of Korean, based on the ToBI system. In this system, diverse intonation patterns are transcribed on four parallel tiers: a tone tier, a break index tier, an orthographic tier, and a miscellaneous tier. The tone tier employs pitch accents, phrase accents, and boundary tones marked with diacritics to represent various pitch events. The break index tier uses five break indices, numbered from 0 to 4, to represent degrees of connectedness in speech by associating each inter-word position with a break index. Each break index represents a boundary of some kind of constituent. This system can contribute not only to a more detailed theory connecting prosody, syntax, and intonation, but also to current text-to-speech synthesis approaches, speech recognition, and other quantitative computational modelling.
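
The four-tier layout described in this abstract can be pictured as a small data structure. The following is only an illustrative Python sketch, not the paper's notation: the class name, field names, and the placeholder words are hypothetical, and it assumes one break index is stored after every word, including the last.

```python
from dataclasses import dataclass, field


@dataclass
class ToBITranscription:
    """Four parallel tiers for one utterance (all names are hypothetical)."""
    words: list[str]                                 # orthographic tier
    breaks: list[int]                                # break index tier, one index (0-4) after each word
    tones: list[str] = field(default_factory=list)   # tone tier labels
    misc: list[str] = field(default_factory=list)    # miscellaneous tier

    def __post_init__(self):
        # Assumption: the final word also carries a break index.
        if len(self.breaks) != len(self.words):
            raise ValueError("need exactly one break index per word")
        if any(b not in range(5) for b in self.breaks):
            raise ValueError("break indices must be in 0..4")

    def groups(self, min_break: int) -> list[list[str]]:
        """Split the words at every break index >= min_break."""
        out, current = [], []
        for word, b in zip(self.words, self.breaks):
            current.append(word)
            if b >= min_break:
                out.append(current)
                current = []
        if current:
            out.append(current)
        return out


# Placeholder words stand in for real transcribed speech.
t = ToBITranscription(words=["w1", "w2", "w3", "w4"], breaks=[1, 3, 1, 4])
print(t.groups(3))  # words grouped at break indices 3 and 4
```

Raising `min_break` yields coarser groupings, which mirrors the idea that each break index marks the boundary of some kind of constituent.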

A Performance Improvement Method using Variable Break in Corpus Based Japanese Text-to-Speech System (가변 Break를 이용한 코퍼스 기반 일본어 음성 합성기의 성능 향상 방법)

  • Na, Deok-Su;Min, So-Yeon;Lee, Jong-Seok;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.28 no.2 / pp.155-163 / 2009
  • In text-to-speech systems, the conversion of text into prosodic parameters consists of three steps: the placement of prosodic boundaries, the determination of segmental durations, and the specification of fundamental frequency contours. Prosodic boundaries, as the most important and basic parameter, affect the estimation of durations and fundamental frequency. Break prediction is an important step in text-to-speech systems because break indices (BIs) strongly influence how correctly prosodic phrase boundaries are represented. However, accurate prediction is difficult because BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, predicting accentual phrase boundaries (APBs) and major phrase boundaries (MPBs) is particularly difficult. This paper therefore presents a method that compensates for APB and MPB prediction errors. First, a subtle BI for which it is difficult to decide clearly between an APB and an MPB is defined as a variable break (VB), and an explicit BI as a fixed break (FB). VBs are chosen using a classification and regression tree, and multiple prosodic targets for pitch and duration are then generated. Finally, unit selection is conducted using the multiple prosodic targets. In the MOS test, the original speech scored 4.99, while the proposed method scored 4.25 and the conventional method scored 4.01. The experimental results show that the proposed method improves the naturalness of synthesized speech.
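
One way to picture how variable breaks lead to multiple prosodic targets is to enumerate every way of resolving each VB as either an APB or an MPB while leaving fixed breaks untouched. The Python sketch below is an illustration under that assumption only: the function name and label strings are hypothetical, and the paper's actual targets concern pitch and duration rather than bare label sequences.

```python
from itertools import product


def expand_breaks(breaks):
    """Return every break sequence obtained by resolving each "VB" as APB or MPB.

    Fixed breaks (any label other than "VB") are kept unchanged.
    """
    options = [("APB", "MPB") if b == "VB" else (b,) for b in breaks]
    return [list(seq) for seq in product(*options)]


# One variable break between two fixed breaks gives two candidate sequences.
for candidate in expand_breaks(["APB", "VB", "MPB"]):
    print(candidate)
```

Each candidate sequence could then drive its own prosodic target, leaving the final choice to unit selection; the number of candidates doubles with every additional VB.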

A Prosodic Labeling System of Intonation Patterns and Prosodic Structures in Korean

  • Cho, Yong-Hyung
    • Speech Sciences / v.4 no.1 / pp.113-133 / 1998
  • The system proposed in this paper prosodically transcribes the intonation patterns, prosodic structures, phrasings, and other prosodic aspects of Korean utterances on four parallel tiers: a tone tier, an orthographic tier, a break index tier, and a miscellaneous tier. The tone tier employs two phrase accents (L* and H*), three accentual phrase boundary tones (L-, H-, LH-), and four intonational phrase boundary tones (L%, H%, LH%, LHL%) to provide a phonological transcription of pitch events associated with accented syllables and phrase boundaries. The break index tier uses five break indices, numbered from 0 to 4, which mark the prosodic grouping of words and the prosodic structure of an utterance. Of the five indices, break index 3 aligns with an accentual phrase boundary tone and break index 4 with an intonational phrase boundary tone in the tone tier.
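
The stated alignment between the break index tier and the tone tier lends itself to a simple consistency check. The Python sketch below is illustrative only: the function name and input layout are hypothetical, and treating break indices 0 to 2 as carrying no boundary tone is an assumption that the abstract does not state.

```python
AP_TONES = {"L-", "H-", "LH-"}           # accentual phrase boundary tones
IP_TONES = {"L%", "H%", "LH%", "LHL%"}   # intonational phrase boundary tones


def check_alignment(breaks, boundary_tones):
    """Return the positions where break index and boundary tone disagree.

    breaks[i] is the break index after word i; boundary_tones[i] is the
    boundary tone label at that position, or None if there is none.
    """
    problems = []
    for i, (b, tone) in enumerate(zip(breaks, boundary_tones)):
        expected = {3: AP_TONES, 4: IP_TONES}.get(b)
        if expected is None:
            ok = tone is None   # assumption: indices 0-2 carry no boundary tone
        else:
            ok = tone in expected
        if not ok:
            problems.append(i)
    return problems


print(check_alignment([1, 3, 4], [None, "LH-", "L%"]))  # consistent labelling
print(check_alignment([3, 4, 2], ["L%", None, "H-"]))   # every position is flagged
```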

Prediction of Break Indices in Korean Read Speech (국어 낭독체 발화의 운율경계 예측)

  • Kim Hyo Sook;Kim Chung Won;Kim Sun Ju;Kim Seoncheol;Kim Sam Jin;Kwon Chul Hong
    • MALSORI / no.43 / pp.1-9 / 2002
  • This study aims to model Korean prosodic phrasing using the CART (classification and regression tree) method. The data are limited to Korean read speech: 400 sentences drawn from editorials, essays, novels, and news scripts, read by a professional radio actress in about two hours. The K-ToBI transcription system was used. For technical reasons, the original break indices 1 and 2 are merged into AP, so that, unlike the original K-ToBI, there are three break indices: Zero, AP, and IP. The linguistic information selected for this study is the number of syllables in an ‘Eojeol’, the location of the ‘Eojeol’ in the sentence, and the part of speech (POS) of adjacent ‘Eojeol’s. A CART tree was trained using this information as variables. Average accuracy in predicting NonIP (Zero and AP) versus IP was 90.4% on the training data and 88.5% on the test data. Average prediction accuracy for Zero and AP was 79.7% on the training data and 78.7% on the test data.
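
The label merging and the three kinds of linguistic information listed in this abstract can be sketched as plain feature extraction, before any tree is trained. The Python sketch below is hypothetical throughout: the function names, the normalized position feature, the POS tags, and the example sentence are illustrative choices, and mapping break index 0 to Zero and 3 to IP is an assumption (the abstract only states that 1 and 2 are merged into AP).

```python
def count_syllables(eojeol: str) -> int:
    """Count precomposed Hangul syllable blocks (U+AC00 to U+D7A3)."""
    return sum(1 for ch in eojeol if "\uac00" <= ch <= "\ud7a3")


def merge_break(index: int) -> str:
    """Collapse K-ToBI break indices into three classes."""
    # Stated in the abstract: 1 and 2 become AP.
    # Assumption: 0 maps to Zero and 3 maps to IP.
    return {0: "Zero", 1: "AP", 2: "AP", 3: "IP"}[index]


def features(eojeols, pos_tags, i):
    """Feature dict for the boundary after eojeol i (feature names are hypothetical)."""
    n = len(eojeols)
    return {
        "syllables": count_syllables(eojeols[i]),
        "position": i / (n - 1) if n > 1 else 0.0,   # 0.0 = first eojeol, 1.0 = last
        "pos_current": pos_tags[i],
        "pos_next": pos_tags[i + 1] if i + 1 < n else "END",
    }


# Illustrative sentence with made-up POS tags.
eojeols = ["나는", "학교에", "간다"]
tags = ["N", "N", "V"]
print(features(eojeols, tags, 1), merge_break(2))
```

Feature dictionaries of this kind, paired with the merged labels, are the sort of input a decision-tree learner would consume.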

A Comparative Study on Intonation between Korean, French and English: a ToBI approach

  • Lee, Jung-Won
    • Speech Sciences / v.9 no.1 / pp.89-110 / 2002
  • Intonation is very difficult to describe, and comparing intonation across languages is harder still because their intonation systems differ. This paper aims to compare some intonation phenomena in Korean, French, and English. As a descriptive tool, I refer to ToBI (Tones and Break Indices), a prosodic transcription model originally proposed by Pierrehumbert (1980). In the first part, I summarize the different ToBI systems, namely K-ToBI (Korean ToBI), F-ToBI (French ToBI), and ToBI itself (English ToBI), in order to compare the prosodic differences among the three languages. In the second part, I analyze some tokens recorded by Korean, French, and American speakers in different languages to show the difficulties of learning other languages and to find the prosodic cues needed to pronounce other languages correctly. The point of comparison in this study is the Accentual Phrase (AP) in Korean and French and the intermediate phrase (ip) in English, which I call the 'subject phrase' in this study for convenience.

A Unit Selection Methods using Flexible Break in a Japanese TTS (일본어 합성기에서 유동 Break를 이용한 합성단위 선택 방법)

  • Song, Young-Hwan;Na, Deok-Su;Kim, Jong-Kuk;Bae, Myung-Jin;Lee, Jong-Seok
    • The Journal of the Acoustical Society of Korea / v.26 no.8 / pp.403-408 / 2007
  • In a large corpus-based speech synthesizer, the break, a parameter that influences naturalness and intelligibility, is used as an important feature during unit selection. Japanese intonation is signaled by relative differences in pitch height: accentual phrases (APs) are formed according to changes in accent, and a break occurs at each AP boundary. Although breaks can be predicted with J-ToBI (Japanese Tones and Break Indices) using a rule-based or statistical approach, exact prediction is very difficult because break placement is flexible. This paper therefore proposes a method that conducts the unit search by dividing breaks into two types, fixed breaks and flexible breaks, in order to exploit the advantages of a large-scale corpus that includes various types of prosody. In an experiment, the proposed unit selection method improved the naturalness of the synthesized speech.

Annotation of a Non-native English Speech Database by Korean Speakers

  • Kim, Jong-Mi
    • Speech Sciences / v.9 no.1 / pp.111-135 / 2002
  • An annotation model of a non-native speech database has been devised, wherein English is the target language and Korean is the native language. The proposed annotation model features overt transcription of linguistic information that is predictable in native speech from the dictionary entry, together with several predefined types of error specification found in native language transfer. In that sense, the proposed model differs from previously explored annotation models in the literature, most of which are based on native speech. The validity of the proposed model is shown by its consistent annotation of 1) salient linguistic features of English, 2) contrastive linguistic features of English and Korean, 3) actual errors reported in the literature, and 4) the data newly collected in this study. The annotation method adopts two widely accepted conventions, the Speech Assessment Methods Phonetic Alphabet (SAMPA) and TOnes and Break Indices (ToBI): SAMPA is employed exclusively for segmental transcription and ToBI for prosodic transcription. The annotation of non-native speech is used to assess the speaking ability of learners of English as a Foreign Language (EFL).

ToBI and beyond: Phonetic intonation of Seoul Korean ani in Korean Intonation Corpus (KICo)

  • Ji-eun Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.1-9 / 2024
  • This study investigated the variation in the intonation of Seoul Korean interjection ani across different meanings ("no" and "really?") and speech levels (Intimate and Polite) using data from Korean Intonation Corpus (KICo). The investigation was conducted in two stages. First, IP-final tones in the dataset were categorized according to the K-ToBI convention (Jun, 2000). While significant relationships were observed between the meaning of ani and its IP-final tones, substantial overlap between groups was notable. Second, the F0 characteristics of the final syllable of ani were analyzed to elucidate the apparent many-to-many relationships between intonation and meaning/speech level. Results indicated that these seemingly overlapping relationships could be significantly distinguished. Overall, this study advocates for a deeper analysis of phonetic intonation beyond ToBI-based categorical labels. By examining the F0 characteristics of the IP-final syllable, previously unclear connections between meaning/speech level and intonation become more comprehensible. Although ToBI remains a valuable tool and framework for studying intonation, it is imperative to explore beyond these categories to grasp the "distinctiveness" of intonation, thereby enriching our understanding of prosody.