Chinese Prosody Generation Based on C-ToBI Representation for Text-to-Speech

음성합성을 위한 C-ToBI기반의 중국어 운율 경계와 F0 contour 생성

  • 김승원 (포항공과대학교 컴퓨터공학과) ;
  • 정옥 (포항공과대학교 컴퓨터공학과) ;
  • 이근배 (포항공과대학교 컴퓨터공학과) ;
  • 김병창 (대구가톨릭대학교 컴퓨터정보통신공학부)
  • Published : 2005.03.01

Abstract

Prosody Generation Based on C-ToBI Representation for Text-to-SpeechSeungwon Kim, Yu Zheng, Gary Geunbae Lee, Byeongchang KimProsody modeling is critical in developing text-to-speech (TTS) systems where speech synthesis is used to automatically generate natural speech. In this paper, we present a prosody generation architecture based on Chinese Tone and Break Index (C-ToBI) representation. ToBI is a multi-tier representation system based on linguistic knowledge to transcribe events in an utterance. The TTS system which adopts ToBI as an intermediate representation is known to exhibit higher flexibility, modularity and domain/task portability compared with the direct prosody generation TTS systems. However, the cost of corpus preparation is very expensive for practical-level performance because the ToBI labeled corpus has been manually constructed by many prosody experts and normally requires a large amount of data for accurate statistical prosody modeling. This paper proposes a new method which transcribes the C-ToBI labels automatically in Chinese speech. We model Chinese prosody generation as a classification problem and apply conditional Maximum Entropy (ME) classification to this problem. We empirically verify the usefulness of various natural language and phonology features to make well-integrated features for ME framework.

Keywords