[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7776/ASK.2009.28.6.572

A DB Pruning Method in a Large Corpus-Based TTS with Multiple Candidate Speech Segments

Lee, Jung-Chul (School of Computer and Information Communication Engineering, University of Ulsan)
Kang, Tae-Ho (LG Dacom)

Publication Information

The Journal of the Acoustical Society of Korea / v.28, no.6, 2009, pp. 572-577

Abstract

Large corpus-based concatenative Text-to-Speech (TTS) systems can generate natural synthetic speech without additional signal processing. To prune redundant speech segments from a large speech segment DB, a decision-tree-based triphone clustering algorithm, widely used in the speech recognition area, can be utilized. However, conventional methods have problems in representing the acoustic transitional characteristics of phones and in applying context questions with hierarchical priority. In this paper, we propose a new clustering algorithm to downsize the speech DB. First, three 13th-order MFCC vectors taken from the first, medial, and final frames of a phone are concatenated into a 39-dimensional vector to represent the transitional characteristics of the phone. Then, three hierarchically grouped question sets are used to construct the triphone trees. For the performance test, we used the DTW algorithm to calculate the acoustic similarity between the target triphone and the triphone obtained from the tree search. Experimental results show that the proposed method can reduce the size of the speech DB by 23% and select better phones with higher acoustic similarity. Therefore, the proposed method can be applied to build a small-sized TTS system.
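The 39-dimensional transitional feature described in the abstract can be sketched in Python. This is a minimal sketch, assuming the per-frame 13th-order MFCCs of a phone segment have already been extracted. The function name `transition_vector` and the choice of the floor middle index as the "medial" frame for even-length segments are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence

MFCC_ORDER = 13  # 13th-order MFCCs, as stated in the abstract


def transition_vector(frames: Sequence[Sequence[float]],
                      order: int = MFCC_ORDER) -> List[float]:
    """Concatenate the first, medial, and final MFCC frames of one phone
    into a single 3 * order (39-dimensional) vector."""
    if not frames:
        raise ValueError("phone segment has no frames")
    if any(len(f) != order for f in frames):
        raise ValueError(f"expected {order}-dimensional MFCC frames")
    first = frames[0]
    medial = frames[len(frames) // 2]  # assumption: middle index, floor for even lengths
    final = frames[-1]
    return list(first) + list(medial) + list(final)


# Example: a 5-frame segment whose k-th frame is filled with the value k.
frames = [[float(k)] * MFCC_ORDER for k in range(5)]
vec = transition_vector(frames)
print(len(vec), vec[0], vec[13], vec[26])  # 39 0.0 2.0 4.0
```

For the five-frame example, the vector concatenates frames 0, 2, and 4, so the onset, center, and offset of the phone are all represented in a single fixed-length vector.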
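The tree construction with hierarchically grouped question sets can be illustrated with a hypothetical Python sketch. It assumes one plausible reading of "hierarchical priority": question sets are tried in priority order, and a lower-priority set is consulted only when no question in a higher-priority set yields a gain above a threshold. The split criterion used here (reduction in within-cluster sum of squared error over the feature vectors) is an assumption standing in for the paper's actual criterion; conventional tree-based state tying in speech recognition maximizes a likelihood gain instead. The names `best_split`, `cluster`, and the example questions are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Context = Tuple[str, str]                  # (left phone, right phone) of a triphone
Item = Tuple[Context, Sequence[float]]     # context plus its feature vector
Question = Tuple[str, Callable[[str, str], bool]]


def sse(vectors: List[Sequence[float]]) -> float:
    """Within-cluster sum of squared distances to the cluster mean."""
    if not vectors:
        return 0.0
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return sum((v[d] - mean[d]) ** 2 for v in vectors for d in range(dim))


def best_split(items: List[Item], question_sets: List[List[Question]],
               min_gain: float) -> Optional[Tuple[float, str, List[Item], List[Item]]]:
    """Try question sets in priority order; fall back to the next set only
    if no question in the current set achieves a gain above min_gain."""
    parent = sse([v for _, v in items])
    for qset in question_sets:
        best = None
        for name, question in qset:
            yes = [it for it in items if question(*it[0])]
            no = [it for it in items if not question(*it[0])]
            if not yes or not no:
                continue  # question does not partition this node
            gain = parent - sse([v for _, v in yes]) - sse([v for _, v in no])
            if gain > min_gain and (best is None or gain > best[0]):
                best = (gain, name, yes, no)
        if best is not None:
            return best
    return None


def cluster(items: List[Item], question_sets: List[List[Question]],
            min_gain: float) -> List[List[Item]]:
    """Grow the tree recursively and return its leaves (the clusters)."""
    split = best_split(items, question_sets, min_gain)
    if split is None:
        return [items]
    _, _, yes, no = split
    return cluster(yes, question_sets, min_gain) + cluster(no, question_sets, min_gain)


# Hypothetical question sets, highest priority first.
NASALS = {"m", "n"}
FRONT_VOWELS = {"i", "e"}
question_sets = [
    [("L-nasal", lambda left, right: left in NASALS)],
    [("R-front-vowel", lambda left, right: right in FRONT_VOWELS)],
]
# 1-D vectors for brevity; real ones would be the 39-D transitional vectors.
items = [(("m", "i"), [0.0]), (("n", "o"), [0.1]),
         (("k", "i"), [5.0]), (("t", "o"), [5.1])]
leaves = cluster(items, question_sets, min_gain=1.0)
```

Each leaf groups acoustically similar triphone instances; pruning would presumably keep only a few representatives per leaf. How many segments are retained per cluster is not stated in the abstract.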
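The evaluation step compares the target triphone with the triphone returned by the tree search using DTW. Below is a minimal sketch of textbook DTW over MFCC frame sequences, assuming a Euclidean frame distance, the standard three-way step pattern, and no slope constraint or length normalization; the abstract does not specify the paper's local distance or path constraints. A smaller cumulative cost means higher acoustic similarity. The name `dtw_distance` is illustrative.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Cumulative DTW cost between two frame sequences (lower = more similar)."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    # D[i][j] = minimal cost of aligning a[:i] with b[:j]
    D: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            D[i][j] = cost + min(D[i - 1][j],      # advance in a only
                                 D[i][j - 1],      # advance in b only
                                 D[i - 1][j - 1])  # advance in both
    return D[n][m]


target = [[0.0], [1.0], [2.0]]
slower = [[0.0], [1.0], [1.0], [2.0]]  # same trajectory, one frame held longer
flat = [[0.0], [0.0], [0.0]]
print(dtw_distance(target, slower))  # 0.0
print(dtw_distance(target, flat))    # 3.0
```

Because DTW warps the time axis, a candidate that follows the same spectral trajectory at a different speed scores as identical, while a candidate with a different trajectory accumulates cost.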

Keywords

Phone clustering; TTS

Citations & Related Records

Times Cited By KSCI: 1

