
Pruning Methodology for Reducing the Size of Speech DB for Corpus-based TTS Systems  

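최승호 (School of Information and Communication Engineering, Dongshin University)
엄기완 (Department of Electronics Engineering, Chonnam National University)
강상기 (Samsung Electronics)
김진영 (Department of Electronics Engineering, Chonnam National University)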
Abstract
Corpus-based text-to-speech (CB-TTS) systems have recently been studied actively worldwide because their synthesized speech sounds human-like. However, the large speech database (DB) they require severely restricts their application. In this paper we propose and evaluate three DB reduction algorithms designed to overcome this drawback. The first method is based on K-means clustering and selects k representatives among multiple instances. The second method keeps only those unit instances that are selected during synthesis when a domain-restricted text is given as input to the synthesizer. The third method is a hybrid of the first two and uses a large text as input to the system. After the given sentences are synthesized, the unit instances that were used and their occurrence information are extracted. Next, a modified K-means clustering that also takes the occurrence information of the selected unit instances into account is applied. Finally, we compare the three pruning methods by evaluating synthesized speech quality at similar DB reduction rates. Based on perceptual listening tests, we conclude that the third method performs best among the three algorithms. Moreover, the results show that the third method can reduce the DB size without loss of speech quality.
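The abstract does not give the details of the modified K-means step, so the following is only a minimal sketch of the general idea behind the third method, under stated assumptions: each unit instance is described by a numeric feature vector, the occurrence information is a simple selection count taken from a synthesis log, the count is used as a weight when updating centroids, and the instance nearest to each centroid is kept. The names `used_unit_counts`, `weighted_kmeans_representatives`, `selection_log`, and `features` are hypothetical and do not come from the paper.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def used_unit_counts(selection_log):
    """Count how often each unit instance id was selected during synthesis.

    selection_log is an assumed format: a flat list of instance ids,
    one entry per selection made by the synthesizer.
    """
    return Counter(selection_log)


def weighted_kmeans_representatives(features, counts, k, iters=20, seed=0):
    """Keep up to k representative instances of one unit type.

    features: dict mapping instance id -> feature vector (assumed).
    counts:   dict mapping instance id -> occurrence count.
    Instances that were never selected are dropped before clustering.
    """
    ids = [i for i in sorted(features) if counts.get(i, 0) > 0]
    if len(ids) <= k:
        return ids

    rng = random.Random(seed)
    centroids = [list(features[i]) for i in rng.sample(ids, k)]
    clusters = [[] for _ in range(k)]

    for _ in range(iters):
        # Assignment step: each instance joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for i in ids:
            nearest = min(range(k), key=lambda c: sq_dist(features[i], centroids[c]))
            clusters[nearest].append(i)
        # Update step: centroid is the occurrence-weighted mean of its members.
        for c, members in enumerate(clusters):
            if not members:
                continue  # leave an empty cluster's centroid unchanged
            total = sum(counts[m] for m in members)
            dim = len(centroids[c])
            centroids[c] = [
                sum(counts[m] * features[m][d] for m in members) / total
                for d in range(dim)
            ]

    # Keep the real instance closest to each centroid.
    keep = set()
    for c, members in enumerate(clusters):
        if members:
            keep.add(min(members, key=lambda m: sq_dist(features[m], centroids[c])))
    return sorted(keep)
```

In this sketch, frequently selected instances pull the centroid toward themselves, so they are more likely to survive pruning. Dropping the weights (setting every count to 1) and skipping the usage filter gives something closer to the first method, while keeping every instance with a nonzero count and skipping the clustering corresponds to the second.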
Keywords
Modified K-means clustering; TTS (Text-to-Speech); Speech synthesis; Reduction of DB size