http://dx.doi.org/10.7776/ASK.2009.28.6.572

A DB Pruning Method in a Large Corpus-Based TTS with Multiple Candidate Speech Segments  

Lee, Jung-Chul (School of Computer Engineering and Information Technology, University of Ulsan)
Kang, Tae-Ho (LG Dacom)
Abstract
Large corpus-based concatenative Text-to-Speech (TTS) systems can generate natural synthetic speech without additional signal processing. To prune redundant speech segments from a large speech segment DB, we can use the decision-tree-based triphone clustering algorithm that is widely used in speech recognition. However, conventional methods have difficulty representing the acoustic transitional characteristics of phones and applying context questions with hierarchical priority. In this paper, we propose a new clustering algorithm to downsize the speech DB. First, three 13th-order MFCC vectors, taken from the first, medial, and final frames of a phone, are combined into a 39-dimensional vector that represents the transitional characteristics of the phone. Then, three hierarchically grouped question sets are used to construct the triphone trees. For the performance test, we used the dynamic time warping (DTW) algorithm to calculate the acoustic similarity between the target triphone and the triphone returned by the tree search. Experimental results show that the proposed method reduces the size of the speech DB by 23% and selects better phones with higher acoustic similarity. The proposed method can therefore be applied to build a small-sized TTS system.
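The two building blocks named in the abstract, the 39-dimensional transition vector and the DTW similarity measure, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `transition_vector` and `dtw_distance` are hypothetical, the medial frame is assumed to be the one at index `len(frames) // 2`, and the DTW variant (Euclidean local cost, symmetric unit steps, no length normalization) is an assumption, since the abstract does not specify these details.

```python
import math
from typing import List, Sequence

# Assumption: one frame is a sequence of 13 MFCC coefficients.
Frame = Sequence[float]


def transition_vector(frames: Sequence[Frame]) -> List[float]:
    """Concatenate the first, medial, and final frames of one phone.

    With 13 coefficients per frame the result has 39 dimensions.
    The medial frame index (len // 2) is an assumption of this sketch.
    """
    if not frames:
        raise ValueError("a phone needs at least one frame")
    first = frames[0]
    medial = frames[len(frames) // 2]
    final = frames[-1]
    return [*first, *medial, *final]


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Accumulated DTW cost between two frame sequences.

    Assumed variant: Euclidean local cost, steps (1,0), (0,1), (1,1),
    no normalization by path length. Lower cost means more similar.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    inf = math.inf
    # prev[j] holds the best cost of aligning the frames seen so far
    # in `a` with the first j frames of `b`.
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = math.dist(x, y)
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]
```

In a setup like the one described, `transition_vector` would supply the feature used for clustering, while `dtw_distance` would compare the full frame sequence of the target triphone with that of the triphone returned by the tree search.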
Keywords
Phone clustering; TTS