http://dx.doi.org/10.9708/jksci.2010.15.7.091

UA Tree-based Reduction of Speech DB in a Large Corpus-based Korean TTS  

Lee, Jung-Chul (School of Computer Engineering and Information Technology, University of Ulsan)
Abstract
Large corpus-based concatenative text-to-speech (TTS) systems can generate natural synthetic speech without additional signal processing. Because improving the naturalness, personality, speaking style, and emotional expressiveness of synthetic speech requires a larger speech DB, it becomes necessary to prune redundant speech segments from a large speech segment DB. In this paper, we propose a new method for constructing a segmental speech DB for a Korean TTS system, based on a clustering algorithm that downsizes the segmental speech DB. For the performance test, synthetic speech was generated with a Korean TTS system consisting of a language processing module, a prosody processing module, a segment selection module, a speech concatenation module, and the segmental speech DB. A mean opinion score (MOS) test was conducted on a set of synthetic speech generated with four different segmental speech DBs, constructed by combining the CM1 or CM2 tree clustering method with either the full DB or the reduced DB. Experimental results show that the proposed method reduces the size of the speech DB by 23% while achieving a high MOS in the perception test. The proposed method can therefore be applied to build a small-sized TTS system.
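The abstract does not say how redundant segments are identified inside each cluster, so the following is only a minimal sketch of one plausible pruning scheme, not the paper's method. It assumes each speech segment already carries a leaf-cluster label (as a tree clustering step would produce) and a small acoustic feature vector; the function name `prune_clusters`, the `min_gap` threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict
from math import dist


def prune_clusters(units, min_gap):
    """Greedy within-cluster pruning (illustrative assumption only).

    units:   list of (unit_id, cluster_id, feature_vector) tuples.
    min_gap: hypothetical distance threshold; a unit closer than this
             to an already kept unit of the same cluster is dropped.
    Returns the ids of the units that survive pruning.
    """
    # Group units by the leaf cluster they were assigned to.
    by_cluster = defaultdict(list)
    for unit_id, cluster_id, features in units:
        by_cluster[cluster_id].append((unit_id, features))

    kept = []
    for members in by_cluster.values():
        survivors = []
        for unit_id, features in members:
            # Keep a unit only if it is not a near-duplicate of a survivor.
            if all(dist(features, f) >= min_gap for _, f in survivors):
                survivors.append((unit_id, features))
        kept.extend(unit_id for unit_id, _ in survivors)
    return kept


# Toy data: three units in one leaf, one unit in another.
units = [
    ("a1", "leaf0", (0.0, 0.0)),
    ("a2", "leaf0", (0.05, 0.0)),  # near-duplicate of a1
    ("a3", "leaf0", (1.0, 1.0)),
    ("b1", "leaf1", (0.0, 0.0)),   # same features as a1, different leaf
]
kept = prune_clusters(units, min_gap=0.1)
reduction = 1 - len(kept) / len(units)  # fraction of units removed
```

Comparisons are restricted to units in the same leaf, so two segments with similar acoustics but different clustering contexts are never traded against each other. The reduction ratio computed at the end is the same kind of figure as the 23% reported in the abstract, although the toy numbers here are unrelated to the paper's data.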
Keywords
TTS; phone clustering; speech synthesis