http://dx.doi.org/10.7776/ASK.2009.28.2.155

A Performance Improvement Method using Variable Break in Corpus Based Japanese Text-to-Speech System  

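Na, Deok-Su (Voiceware Technology Research Institute)
Min, So-Yeon (Seoil College, Department of Information and Communication)
Lee, Jong-Seok (Voiceware Technology Research Institute)
Bae, Myung-Jin (Soongsil University, School of Information and Communication Electronic Engineering)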
Abstract
In text-to-speech systems, the conversion of text into prosodic parameters consists of three steps: the placement of prosodic boundaries, the determination of segmental durations, and the specification of fundamental frequency contours. Prosodic boundaries, as the most important and basic parameter, affect the estimation of durations and fundamental frequency. Break prediction is an important step in text-to-speech systems because break indices (BIs) strongly influence how correctly prosodic phrase boundaries are represented. However, accurate prediction is difficult, since BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, the prediction of an accentual phrase boundary (APB) and a major phrase boundary (MPB) is particularly difficult. This paper therefore presents a method that compensates for prediction errors between an APB and an MPB. First, we define a subtle BI, for which it is difficult to decide clearly between an APB and an MPB, as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is chosen using a classification and regression tree (CART), and multiple prosodic targets for pitch and duration are then generated. Finally, unit selection is conducted using the multiple prosodic targets. In the MOS test, the original speech scored 4.99, while the proposed method scored 4.25 and the conventional method scored 4.01. The experimental results show that the proposed method improves the naturalness of synthesized speech.
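The idea of carrying a variable break through to unit selection can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the label table `ALTERNATIVES`, the functions `expand_targets` and `select_best`, and the toy cost are all hypothetical names and assumptions introduced here. The sketch assumes break labels have already been predicted (for example by a CART classifier) and that a VB may be realized as either an APB or an MPB.

```python
from itertools import product

# Hypothetical label table: a fixed break keeps its single label,
# while a variable break (VB) may be realized as an APB or an MPB.
ALTERNATIVES = {"APB": ("APB",), "MPB": ("MPB",), "VB": ("APB", "MPB")}


def expand_targets(breaks):
    """Return every break sequence obtained by resolving each VB both ways."""
    return [list(seq) for seq in product(*(ALTERNATIVES[b] for b in breaks))]


def select_best(breaks, cost):
    """Pick the resolved sequence with the lowest cost.

    `cost` stands in for a unit-selection cost; here it is any function
    that maps a resolved break sequence to a number.
    """
    return min(expand_targets(breaks), key=cost)


def toy_cost(seq):
    # Purely illustrative assumption: prefer exactly one major phrase boundary.
    return abs(seq.count("MPB") - 1)


predicted = ["APB", "VB", "MPB", "VB"]
candidates = expand_targets(predicted)
best = select_best(predicted, toy_cost)
```

In a real system each resolved sequence would yield its own pitch and duration targets, and the cost would come from the unit-selection search over the speech database rather than from a counting rule.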
Keywords
Text-to-Speech system; Break prediction and variable break