Search | Korea Science

The Korean Text-to-speech Using Syllable Units (음절 단위를 이용한 한국어 음성 합성)

김병수;윤기선;박성한
- Journal of the Korean Institute of Telematics and Electronics / v.27 no.1 / pp.143-150 / 1990
In this paper, a rule-based method for improving the intelligibility of synthetic speech is proposed. A 12-pole linear predictive coding (LPC) method is used to model syllable speech signals. A syllable concatenation rule for pause and frame rejection between syllables is developed to improve the naturalness of the synthetic speech. In addition, a phonological structure transform rule and a prosody rule are applied to the LPC-synthesized speech. Illustrative results demonstrate that the synthetic speech obtained by applying these rules sounds more natural than speech synthesized by LPC alone.
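As a rough illustration of what a 12-pole LPC analysis involves, the sketch below estimates predictor coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion. The abstract does not say which estimation method the authors used, so this choice, the function names, and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are all assumptions made for the example.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(frame, order=12):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, error) where a[0] == 1.0 and the predictor is
    x[n] ~= -(a[1]*x[n-1] + ... + a[order]*x[n-order]).
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= (1.0 - k * k)  # residual prediction error energy
    return a, error


# Usage: a decaying exponential behaves like a one-pole signal,
# so only the first coefficient should be significant.
coeffs, residual = lpc_coefficients([0.9 ** n for n in range(200)], order=12)
```

In a syllable-based synthesizer of this kind, such coefficients would be stored per frame for each syllable; how frames are windowed and stored is not described in the abstract.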

Synchronization of Synthetic Facial Image Sequences and Synthetic Speech for Virtual Reality (가상현실을 위한 합성얼굴 동영상과 합성음성의 동기구현)

최장석;이기영
- Journal of the Korean Institute of Telematics and Electronics S / v.35S no.7 / pp.95-102 / 1998
This paper proposes a method for synchronizing synthetic facial image sequences with synthetic speech. LP-PSOLA synthesizes the speech for each demi-syllable, and 3,040 demi-syllables are provided for unlimited synthesis of Korean speech. For synthesis of the facial image sequences, the paper defines a total of 11 fundamental patterns for the lip shapes of the Korean consonants and vowels. These fundamental lip shapes are sufficient to pronounce all Korean sentences. The image synthesis method assigns the fundamental lip shapes to key frames according to the initial, medial, and final sounds of each syllable in the Korean input text, and interpolates the naturally changing lip shapes in the in-between frames. The number of in-between frames is estimated from the duration of each syllable of the synthetic speech, and this estimation synchronizes the facial image sequences with the speech. Speech synthesis requires disk memory to store the 3,040 demi-syllables. Synthesis of the facial image sequences, however, requires disk memory for only one image, because all frames are synthesized from the neutral face. The method yields a synchronized system that can read Korean sentences aloud with synthetic speech and synthetic facial image sequences.
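The idea of deriving the frame count from syllable duration can be sketched as follows. This is only an illustration: the frame rate, the representation of a lip shape as a list of numeric parameters, the linear blending between key frames, and the function name are assumptions, since the abstract does not specify them.

```python
def interpolate_lip_shapes(key_shapes, duration_s, fps=30.0):
    """Spread key lip shapes over a syllable and fill the in-between frames.

    key_shapes: list of lip-shape parameter vectors (hypothetical format),
                e.g. one each for the initial, medial, and final sound.
    duration_s: duration of the synthesized syllable in seconds.
    Returns one parameter vector per video frame.
    """
    n_keys = len(key_shapes)
    if n_keys == 0:
        return []
    # Total frames follow from the syllable duration; never drop a key frame.
    total = max(n_keys, round(duration_s * fps))
    if n_keys == 1:
        return [list(key_shapes[0]) for _ in range(total)]
    frames = []
    for f in range(total):
        pos = f * (n_keys - 1) / (total - 1)  # position on the key-frame axis
        seg = min(int(pos), n_keys - 2)       # key-frame pair being blended
        w = pos - seg                         # blend weight in [0, 1]
        a, b = key_shapes[seg], key_shapes[seg + 1]
        frames.append([(1 - w) * x + w * y for x, y in zip(a, b)])
    return frames


# Usage: a 0.2 s syllable at 25 fps gives 5 frames, 2 of them in-between.
frames = interpolate_lip_shapes([[0.0], [1.0], [0.0]], duration_s=0.2, fps=25.0)
```

Because the frame count is tied to the duration reported by the speech synthesizer, the video for each syllable ends when its audio does, which is the synchronization property the abstract describes.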

Distribution of Korean Syllables by Characters

Lee, Soon-Hyang
- Speech Sciences / v.9 no.1 / pp.185-192 / 2002
This study classifies Korean syllables into various types and investigates the distribution of syllables of each type. Korean syllables are classified into four or eight types. In this study, they are classified into thirty-two types based on character combination in order to evaluate the intelligibility of Korean synthetic syllables. Among the Korean syllables derived from the possible combinations of Korean characters, only syllables in current use were selected. Based on this classification and distribution, representative and diagnostic test materials can be constructed. These test materials can be applied to intelligibility tests of Korean synthetic syllables.

The Rule of Korean Pitch Variation for a Natural Synthetic Female Voice (자연스러운 여성 합성음을 위한 한국어의 피치 변화 법칙)

Kim, Chung-Won;Park, Dae-Duck;Kim, Boh-Hyun;Kwon, Cheol-Hong
- The Journal of the Acoustical Society of Korea / v.15 no.6 / pp.26-32 / 1996
In this paper we formulate a rule of pitch variation for a natural synthetic female voice. The intonation phrase, which is the basic unit to which the rule is applied, mostly consists of one or more syllables. The pitch values of the first, second, and final syllables make up the pitch contour of the intonation phrase. Those of the first and second syllables are determined by the initial consonants of the respective syllables, and that of the final syllable by the type of the function word. There are two kinds of boundaries between intonation phrases: a boundary with a pause and a boundary without a pause. The pitch contour of the intonation phrase, together with the boundary phenomena, determines the pitch pattern of a sentence.

GENERATION OF MULTI-SYLLABLE NONSENSE WORDS FOR THE ASSESSMENT OF KOREAN TEXT-TO-SPEECH SYSTEM (한국어 문장음성합성 시스템의 평가를 위한 다음절 무의미단어의 생성 및 평가에 관한 연구)

조철우
- Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.338-341 / 1994
In this paper we propose a method to generate a multi-syllable nonsense wordset for the purpose of synthetic speech assessment and apply the wordset to assess one commercial text-to-speech system. Some experimental results are presented, and it is verified that the generated nonsense wordset can be used to assess the intelligibility of the synthesizer at the phoneme level or at the level of phonemic environment. The experimental results confirm that such a multi-syllable nonsense wordset can be useful for the assessment of synthesized speech.

Acoustic Features Determining the Comprehension of Wh and Yes-no Questions in Standard Korean (한국어 의문사 의문문과 예-아니오 의문문의 의미 구별에 관여하는 음향 자질)

Min, Kwang-Joon
- Speech Sciences / v.4 no.1 / pp.35-46 / 1998
In this paper production and perception data were examined to discover what acoustic features are used in distinguishing wh-questions and yes/no-questions. Production data show that the two question types are distinguished by different accentual phrasing, pitch ranges in wh-phrases, and initial lenis stop voicing of the first syllable in verb phrases. Perception data by synthetic intonation show that the two question types are distinguished by the width of pitch ranges between the first and the second syllable in wh-phrases. Initial lenis stop voicing of the first syllable in verb phrases produces a strong effect on the perceptual discrimination of the two question types.

A Study on the Korean Text-to-Speech Using Demisyllable Units (반음절단위를 이용한 한국어 음성합성에 관한 연구)

Yun, Gi-Sun;Park, Sung-Han
- Journal of the Korean Institute of Telematics and Electronics / v.27 no.10 / pp.138-145 / 1990
This paper presents a rule-based speech synthesis method that improves the naturalness of synthetic speech while using a small database based on demisyllable units. A 12-pole linear predictive coding (LPC) method is used to analyze demisyllable speech signals. A syllable and vowel concatenation rule is developed to improve the naturalness and intelligibility of the synthetic speech. In addition, a phonological structure transform rule using a neural net and prosody rules are applied to the synthetic speech.

SPEECH SYNTHESIS IN THE TIME DOMAIN BY PITCH CONTROL USING LAGRANGE INTERPOLATION (TD-PCULI)

Kang, Chan-Hee;Shin, Yong-Jo;Kim, Yun-Seok;Kang, Dae-Soo;Lee, Jong-Heon;Kwon, Ki-Hyung;An, Jeong-Keun;Sea, Sung-Tae;Chin, Yong-Ohk
- Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.984-990 / 1994
In this paper a new time-domain speech synthesis method using mono-syllables is proposed. It aims to overcome the degradation of synthetic speech quality caused by frequency-domain synthesis methods and to develop a time-domain algorithm for prosodic control. In particular, when a time-domain method uses the mono-syllable as the synthesis unit, the main issues are controlling the pitch period and smoothing the energy pattern. As a solution to the pitch control problem, a method using Lagrange interpolation is suggested. As a solution to the other problem, an algorithm that can control the shape of the amplitude envelope of a mono-syllable is proposed. In the experiments it was possible to synthesize unlimited Korean speech with prosody control. According to the MOS evaluation, the quality and naturalness of the synthesized speech improved to a good level.
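To make the role of Lagrange interpolation in pitch control concrete, the sketch below resamples one pitch period of a waveform to a different length, which changes the pitch period in the time domain. The abstract does not give the details of TD-PCULI, so the local 4-point (cubic) window, the function names, and the resampling scheme are assumptions for illustration only, not the authors' algorithm.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def resample_period(period, new_length, order=3):
    """Stretch or shrink one pitch period to new_length samples.

    Each output sample is interpolated from the order+1 nearest input
    samples; a longer output means a longer pitch period (lower F0).
    """
    n = len(period)
    out = []
    for k in range(new_length):
        pos = k * (n - 1) / (new_length - 1) if new_length > 1 else 0.0
        # Choose a window of order+1 input samples around pos, clamped to the ends.
        start = min(max(int(pos) - order // 2, 0), max(n - (order + 1), 0))
        idx = list(range(start, min(start + order + 1, n)))
        out.append(lagrange_interpolate(idx, [period[i] for i in idx], pos))
    return out


# Usage: stretch an 8-sample period to 15 samples.
stretched = resample_period([i ** 3 for i in range(8)], 15)
```

A local window is used here instead of a single polynomial through every sample, because high-degree Lagrange polynomials over many equally spaced points tend to oscillate near the ends of the interval.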

EVALUATION OF THE SYNTHETIC SPEECH QUALITY BY THE TD-PCULI METHOD

Kang, Chan-Hee;Shin, Yong-Jo;Kim, Yun-Seok;Kwon, Ki-Hyung;Chin, Yong-Ohk
- Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.977-983 / 1994
In this paper we evaluate the quality of synthetic speech produced by the proposed TD-PCULI speech synthesis method. For the synthesis, we extracted parameters from Korean monosyllables by analyzing speech waveforms in the time domain. We constructed a Korean data format dictionary for synthesis-by-rule based on the frequencies in a large-vocabulary Korean pronunciation dictionary; it contains 19 V-type, 80 CV-type, 30 VC-type, and 100 CVC-type syllables. Using them, we synthesized various Korean monosyllables, words, and sentences. We tested 10 syllables selected for each of the 4 Korean syllable types with the objective MOS (Mean Opinion Score) evaluation method on 4 items, namely intelligibility, clearness, loudness, and naturalness, using a randomly selected group with no prior knowledge of the syllables. We also tested whether duration and F0 can be modified by changing the duration (150 msec, 300 msec, 500 msec, 700 msec, and 1 sec) and the central fundamental frequency (80 Hz, 118 Hz, 140 Hz, 170 Hz, and 200 Hz). The experimental results show that the noise introduced while synthesizing speech by rule is removed to a very clear level and that the prosodic elements can be controlled well.
