Integrated Search | Korea Science

음절 단위를 이용한 한국어 음성 합성 (The Korean Text-to-speech Using Syllable Units)

김병수;윤기선;박성한
- 대한전자공학회논문지 / Vol. 27, No. 1 / pp. 143-150 / 1990
In this paper, a rule-based method for improving the intelligibility of synthetic speech is proposed. A 12-pole linear predictive coding (LPC) method is used to model syllable speech signals. A syllable concatenation rule for pauses and frame rejection between syllables is developed to improve the naturalness of the synthetic speech. In addition, a phonological structure transformation rule and a prosody rule are applied to the LPC-synthesized speech. The illustrative results demonstrate that synthetic speech obtained by applying these rules sounds more natural than speech synthesized by LPC alone.
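The 12-pole linear prediction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard autocorrelation method with a Hamming window and the Levinson-Durbin recursion; the abstract does not say which LPC variant the authors used, and the function names here are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, error), where a[0] == 1 and the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:  # degenerate frame (e.g. silence): stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a, error


def lpc(frame, order=12):
    """Hamming-window a frame and estimate its LPC coefficients (assumed setup)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

Calling `lpc(frame)` on a list of samples returns 13 coefficients (including the leading 1) and the residual energy; a synthesizer would store one such set per frame of each syllable.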

가상현실을 위한 합성얼굴 동영상과 합성음성의 동기구현 (Synchronization of Synthetic Facial Image Sequences and Synthetic Speech for Virtual Reality)

최장석;이기영
- 전자공학회논문지S / Vol. 35S, No. 7 / pp. 95-102 / 1998
This paper proposes a method for synchronizing synthetic facial image sequences with synthetic speech. LP-PSOLA synthesizes the speech for each demi-syllable; 3,040 demi-syllables are provided for unlimited synthesis of Korean speech. For the synthesis of facial image sequences, the paper defines a total of 11 fundamental lip-shape patterns for the Korean consonants and vowels. These fundamental lip shapes suffice to pronounce all Korean sentences. The image synthesis method assigns the fundamental lip shapes to key frames according to the initial, medial, and final sounds of each syllable in the Korean input text, and interpolates the naturally changing lip shapes in the in-between frames. The number of in-between frames is estimated from the duration of each syllable of the synthetic speech, and this estimate synchronizes the facial image sequences with the speech. Speech synthesis requires disk memory to store the 3,040 demi-syllables. Synthesis of the facial image sequences, however, requires storing only one image, because all frames are synthesized from the neutral face. The method yields a synchronized system that can read Korean sentences with synthetic speech and synthetic facial image sequences.
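The frame-count estimate described in the abstract can be sketched as follows. The frame rate, the representation of a lip shape as a short list of numbers, and the use of linear interpolation are all assumptions made for illustration; the abstract does not specify them, and the function names are hypothetical.

```python
def inbetween_count(duration_s, n_keys=3, fps=30.0):
    """Number of in-between frames so the syllable's video matches its audio length.

    duration_s: syllable duration taken from the synthetic speech (seconds).
    n_keys: key frames for the syllable (initial, medial, final sound).
    fps: assumed video frame rate.
    """
    total = max(n_keys, round(duration_s * fps))
    return total - n_keys


def interpolate_lip_shapes(keys, n_inbetween):
    """Spread n_inbetween linearly interpolated frames over the gaps between key shapes."""
    gaps = len(keys) - 1
    if gaps < 1:
        return [list(k) for k in keys]
    base, extra = divmod(n_inbetween, gaps)
    frames = []
    for g in range(gaps):
        m = base + (1 if g < extra else 0)  # in-between frames in this gap
        start, end = keys[g], keys[g + 1]
        frames.append(list(start))
        for t in range(1, m + 1):
            alpha = t / (m + 1)
            frames.append([s + alpha * (e - s) for s, e in zip(start, end)])
    frames.append(list(keys[-1]))
    return frames
```

For a 0.2 s syllable with three key shapes at the assumed 30 frames per second, `inbetween_count` gives 3, and `interpolate_lip_shapes` returns 6 frames in total, so the image sequence spans the same time as the audio.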

Distribution of Korean Syllables by Characters

Lee, Soon-Hyang
- 음성과학 / Vol. 9, No. 1 / pp. 185-192 / 2002
This study classifies Korean syllables into various types and investigates the distribution of syllables by type. Korean syllables are usually classified into four or eight types. In this study, they are classified into thirty-two types based on character combination in order to evaluate the intelligibility of Korean synthetic syllables. Among the Korean syllables that can be derived from the possible combinations of Korean characters, only those currently in use were selected. Based on this classification and distribution, representative and diagnostic testing materials can be constructed. These testing materials can be applied to intelligibility tests of Korean synthetic syllables.
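As a rough illustration of classifying syllables by character combination, the sketch below uses the Unicode arithmetic for precomposed Hangul syllables to sort them into the coarse four-way scheme (V, CV, VC, CVC). It does not reproduce the thirty-two types of the study, whose definitions are not given in the abstract; the function names are hypothetical.

```python
from collections import Counter

HANGUL_BASE = 0xAC00   # first precomposed syllable, '가'
HANGUL_LAST = 0xD7A3   # last precomposed syllable, '힣'
SILENT_INITIAL = 11    # index of the placeholder initial 'ㅇ' (no onset consonant)


def decompose(syllable):
    """Return (initial, medial, final) jamo indices of one Hangul syllable."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(code - HANGUL_BASE, 21 * 28)
    medial, final = divmod(rest, 28)
    return initial, medial, final


def syllable_type(syllable):
    """Classify a syllable as 'V', 'CV', 'VC', or 'CVC'."""
    initial, _, final = decompose(syllable)
    onset = "" if initial == SILENT_INITIAL else "C"
    coda = "C" if final else ""   # final index 0 means no final consonant
    return onset + "V" + coda


def type_distribution(text):
    """Count syllable types over all Hangul syllables in a text."""
    return Counter(syllable_type(ch) for ch in text
                   if HANGUL_BASE <= ord(ch) <= HANGUL_LAST)
```

A finer scheme could replace `syllable_type` with a function that also distinguishes, for example, simple and compound vowels or single and double final consonants, using the same decomposition.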

자연스러운 여성 합성음을 위한 한국어의 피치 변화 법칙 (The Rule of Korean Pitch Variation for a Natural Synthetic Female Voice)

김중원;박대덕;김보현;권철홍
- 한국음향학회지 / Vol. 15, No. 6 / pp. 26-32 / 1996
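This paper establishes pitch variation rules for a natural synthetic female voice. The basic unit to which the rules apply, the intonational phrase, consists mainly of one or more eojeol (word phrases). Its pitch contour is formed by connecting the pitch values of its first, second, and last syllables: the pitch values of the first and second syllables are determined by the initial consonant of each syllable, and the pitch value of the last syllable by the type of function word. Intonational phrases are separated by either a "boundary with a pause" or a "boundary without a pause", and relaxation occurs at boundaries with a pause. Together, these intonational-phrase pitch contours and boundary phenomena form the pitch pattern of a sentence.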
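The idea of building an intonational-phrase contour by connecting the pitch values of the first, second, and last syllables can be sketched as below. Straight-line connection between the anchor syllables is an assumption, and the pitch values are passed in as parameters because the abstract does not list the values assigned to each initial consonant or function word.

```python
def phrase_pitch_contour(n_syllables, f0_first, f0_second, f0_last):
    """Per-syllable pitch (Hz) for one intonational phrase.

    The contour passes through the first, second, and last syllable values;
    syllables between the second and the last are filled in linearly
    (an assumed interpolation, for illustration only).
    """
    if n_syllables < 1:
        raise ValueError("a phrase needs at least one syllable")
    if n_syllables == 1:
        return [float(f0_first)]
    if n_syllables == 2:
        return [float(f0_first), float(f0_last)]
    contour = [float(f0_first), float(f0_second)]
    steps = n_syllables - 2
    for i in range(1, steps + 1):
        contour.append(f0_second + (f0_last - f0_second) * i / steps)
    return contour
```

With hypothetical values of 200 Hz, 240 Hz, and 180 Hz for a five-syllable phrase, the contour rises on the second syllable and then falls evenly to the last one.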

한국어 문장음성합성 시스템의 평가를 위한 다음절 무의미단어의 생성 및 평가에 관한 연구 (GENERATION OF MULTI-SYLLABLE NONSENSE WORDS FOR THE ASSESSMENT OF KOREAN TEXT-TO SPEECH SYSTEM)

조철우
- 한국음향학회:학술대회논문집 / 한국음향학회, Proceedings of the 11th Workshop on Speech Communication and Signal Processing, 1994 (SCAS Vol. 11, No. 1) / pp. 338-341 / 1994
In this paper we propose a method for generating a multi-syllable nonsense wordset for the assessment of synthetic speech and apply the wordset to assess one commercial text-to-speech system. Some experimental results are presented, and it is verified that the generated nonsense wordset can be used to assess the intelligibility of the synthesizer at the phoneme level or at the level of phonemic environment. The experimental results confirm that such a multi-syllable nonsense wordset is useful for assessing synthesized speech.

한국어 의문사 의문문과 예-아니오 의문문의 의미 구별에 관여하는 음향 자질 (Acoustic Features Determining the Comprehension of Wh and Yes-no Questions in Standard Korean)

민광준
- 음성과학 / Vol. 4, No. 1 / pp. 35-46 / 1998
In this paper, production and perception data were examined to discover which acoustic features are used to distinguish wh-questions from yes/no-questions. Production data show that the two question types are distinguished by different accentual phrasing, pitch ranges in wh-phrases, and initial lenis stop voicing of the first syllable in verb phrases. Perception data obtained with synthetic intonation show that the two question types are distinguished by the width of the pitch range between the first and second syllables in wh-phrases. Initial lenis stop voicing of the first syllable in verb phrases has a strong effect on the perceptual discrimination of the two question types.

반음절단위를 이용한 한국어 음성합성에 관한 연구 (A Study on the Korean Text-to-Speech Using Demisyllable Units)

윤기선;박성한
- 대한전자공학회논문지 / Vol. 27, No. 10 / pp. 138-145 / 1990
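This paper presents a Korean synthesis-by-rule method that uses the demisyllable as the synthesis unit, so that the database stays small while the naturalness of the synthetic speech improves. A 12th-order linear prediction method is used to analyze the demisyllable speech signals, and inter-syllable concatenation rules and vowel-segment connection rules are developed for the naturalness and intelligibility of the synthetic speech. In addition, phonological variation rules based on a neural network model and prosody rules are applied.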

SPEECH SYNTHESIS IN THE TIME DOMAIN BY PITCH CONTROL USING LAGRANGE INTERPOLATION(TD-PCULI)

Kang, Chan-Hee;Shin, Yong-Jo;Kim, Yun-Seok;Kang, Dae-Soo;Lee, Jong-Heon;Kwon, Ki-Hyung;An, Jeong-Keun;Sea, Sung-Tae;Chin, Yong-Ohk
- 한국음향학회:학술대회논문집 / 한국음향학회, 1994, FIFTH WESTERN PACIFIC REGIONAL ACOUSTICS CONFERENCE SEOUL KOREA / pp. 984-990 / 1994
In this paper a new time-domain speech synthesis method using monosyllables is proposed. It aims to overcome the degradation of synthetic speech quality caused by frequency-domain synthesis methods and to develop a time-domain algorithm for prosodic control. In particular, when a time-domain method uses the monosyllable as the synthesis unit, the main issues are controlling the pitch period and smoothing the energy pattern. As a solution to pitch control, a method using Lagrange interpolation is suggested. As a solution to the other problem, an algorithm that controls the shape of the amplitude envelope of a monosyllable is proposed. Experiments showed that unlimited Korean speech could be synthesized, including prosody control. According to the MOS evaluation, the quality and naturalness of the synthesized speech improved to a good level.
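One plausible reading of pitch control by Lagrange interpolation is to stretch or shrink each pitch period of the waveform to a new length by interpolating between neighbouring samples. The sketch below does this with a local 4-point Lagrange polynomial; the abstract does not describe the actual TD-PCULI procedure, so the windowing scheme and function names are assumptions.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = float(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def resample_period(period, new_length, points=4):
    """Resample one pitch period to new_length samples.

    A longer output lowers the pitch, a shorter one raises it. Each output
    sample is interpolated from the `points` nearest input samples.
    """
    n = len(period)
    if n < points or new_length < 2:
        raise ValueError("period too short or new_length too small")
    out = []
    for k in range(new_length):
        pos = k * (n - 1) / (new_length - 1)   # position in input samples
        start = min(max(int(pos) - (points // 2 - 1), 0), n - points)
        xs = list(range(start, start + points))
        ys = period[start:start + points]
        out.append(lagrange_interpolate(xs, ys, pos))
    return out
```

For example, resampling a period of 100 samples to 80 samples raises the fundamental frequency by a factor of 1.25 at a fixed sampling rate; smoothing the joins between consecutive periods is a separate step not shown here.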

EVALUATION OF THE SYNTHETIC SPEECH QUALITY BY THE TD-PCULI METHOD

Kang, Chan-Hee;Shin, Yong-Jo;Kim, Yun-Seok;Kwon, Ki-Hyung;Chin, Yong-Ohk
- 한국음향학회:학술대회논문집 / 한국음향학회, 1994, FIFTH WESTERN PACIFIC REGIONAL ACOUSTICS CONFERENCE SEOUL KOREA / pp. 977-983 / 1994
In this paper we evaluate the quality of synthetic speech produced by the proposed TD-PCULI speech synthesis method. For the synthesis, we extracted parameters from Korean monosyllables by analyzing the speech waveforms in the time domain. We constructed a Korean data format dictionary for synthesis-by-rule based on the frequencies in a large-vocabulary Korean pronunciation dictionary; it contains 19 V-type, 80 CV-type, 30 VC-type, and 100 CVC-type syllables. Using them, we synthesized various Korean monosyllables, words, and sentences. We tested 10 syllables selected from each of the 4 Korean syllable types with the objective MOS (Mean Opinion Score) evaluation method on 4 items, i.e., intelligibility, clearness, loudness, and naturalness, using a randomly selected group with no prior knowledge of the test items. We also tested whether duration and F0 could be modified by changing the duration (150 msec, 300 msec, 500 msec, 700 msec, and 1 sec) and the central fundamental frequency (80 Hz, 118 Hz, 140 Hz, 170 Hz, and 200 Hz). The experimental results show that the noise arising in the course of synthesis by rule is removed to a very clear level and that the prosodic elements can be controlled well.
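A MOS over the four rated items is simply the mean of the listeners' scores per item. The sketch below assumes the conventional 1-to-5 rating scale and a list-of-dicts data layout; neither is stated in the abstract, and any scores passed in are the caller's own data.

```python
from statistics import mean

# The four items rated in the evaluation described above.
ITEMS = ("intelligibility", "clearness", "loudness", "naturalness")


def mean_opinion_scores(ratings):
    """Average listener ratings per item.

    ratings: list of dicts, one per listener response, each mapping every
    item in ITEMS to a score (assumed scale: 1 = bad ... 5 = excellent).
    """
    if not ratings:
        raise ValueError("no ratings given")
    return {item: mean(r[item] for r in ratings) for item in ITEMS}
```

Grouping the responses by syllable type (V, CV, VC, CVC) before calling the function would give one MOS table per type.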

9 search results; processing time 0.024 s
