[KSCI] Korea Science Citation Index Service

Context-adaptive Phoneme Segmentation for a TTS Database

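Lee Ki-Seung (Department of Electronic Engineering, College of Information and Communication, Konkuk University)
Kim Jeong-Su (Human-Computer Interaction Laboratory, Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd.)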

Publication Information

The Journal of the Acoustical Society of Korea / v.22, no.2, 2003, pp. 135-144

Abstract

A method for the automatic segmentation of speech signals is described. The method is intended for constructing a large database for a Text-To-Speech (TTS) synthesis system. The main focus of the work is the refinement of initial phone boundary estimates provided by an alignment based on a Hidden Markov Model (HMM). A multi-layer perceptron (MLP) was used as the phone boundary detector. To improve segmentation performance, a technique is proposed in which a separate MLP is trained for each class of phonetic transition. The partitioning of the entire phonetic transition space is chosen to minimize the overall deviation from hand-labelled boundary positions. In experiments with single-speaker data, more than 95% of all phone boundaries deviated from the reference position by less than 20 ms, and the boundary refinement reduced the root mean square error by about 25%.
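The abstract gives no implementation detail, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a boundary is refined by searching a small window around the initial HMM estimate for the frame where a detector scores highest, and it computes the two figures of merit mentioned above (share of boundaries within a tolerance, and root mean square error). The names `refine_boundary`, `tolerance_rate`, `rmse`, the `half_window` parameter, and the idea of passing the detector as a plain callable are all hypothetical and not taken from the paper.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface: a detector maps a frame index to a boundary score.
# In the paper this role is played by an MLP trained for one class of
# phonetic transition; here any callable stands in for it.
Detector = Callable[[int], float]


def refine_boundary(initial_frame: int, detector: Detector,
                    n_frames: int, half_window: int = 5) -> int:
    """Return the frame within +/- half_window of the initial estimate
    that has the highest detector score (assumed refinement rule)."""
    lo = max(0, initial_frame - half_window)
    hi = min(n_frames - 1, initial_frame + half_window)
    return max(range(lo, hi + 1), key=detector)


def tolerance_rate(estimated_ms: Sequence[float],
                   reference_ms: Sequence[float],
                   tol_ms: float = 20.0) -> float:
    """Fraction of boundaries whose deviation from the reference
    position is smaller than tol_ms."""
    deviations = [abs(e - r) for e, r in zip(estimated_ms, reference_ms)]
    return sum(d < tol_ms for d in deviations) / len(deviations)


def rmse(estimated_ms: Sequence[float],
         reference_ms: Sequence[float]) -> float:
    """Root mean square boundary error in milliseconds."""
    squared = [(e - r) ** 2 for e, r in zip(estimated_ms, reference_ms)]
    return math.sqrt(sum(squared) / len(squared))
```

To follow the per-transition idea, one would keep a dictionary from transition class to detector and pass the matching detector to `refine_boundary` for each boundary. Comparing `rmse` before and after refinement against hand-labelled positions gives the relative error reduction.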

Keywords

Text-To-Speech synthesis; Automatic phoneme labelling; Multi-layer perceptron; Neural network training algorithm

Citations & Related Records

References

1	A bootstrapping training technique for obtaining demisyllable reference patterns / [ L.R.Rabiner;A.E.Rosenberg;J.G.Wilpon;T.M.Zampini ] / Journal of Acoust. Soc. Amer.
2	Nonlinear predictive vector quantization with recurrent neural nets / [ L.Wu;M.Niranjan;F.Fallside ] / Proc. IEEE-SP Workshop on Neural Networks for Signal Processing
3	Reducing audible spectral discontinuities / [ E.Klabbers;R.Veldhuis ] / IEEE Trans. on Speech and Audio Processing
4	Diphone concatenation using a harmonic plus noise model of speech / [ Y.Stylianou;T.Dutoit;J.Schroeter ] / Proc. EUROSPEECH '97
5	Concatenative speech synthesis using units selected from a large speech database / [ A.J.Hunt;A.W.Black ] / Draft paper
6	Automatic segmentation of speech / [ Jan P. van Hemert ] / IEEE Trans. Signal Processing
7	Speech synthesis from text / [ Y.Sagisaka ] / IEEE Communications Magazine
8	The AT&T Next-Gen TTS system / [ M.Beutnagel;A.Conkie;J.Schroeter;Y.Stylianou;A.Syrdal ] / Proc. Joint Meeting of ASA, EAA, and DAGA
9	Unit selection in a concatenative speech synthesis system using a large speech database / [ A.J.Hunt;A.W.Black ] / Proc. ICASSP '96
10	Explicit segmentation of speech using Gaussian models / [ A.Bonafonte;A.Nogueiras;A.R.Garrido ] / Proc. IEEE Int. Conf. Spoken Language Processing
11	An introduction to computing with neural nets / [ R.Lippmann ] / IEEE ASSP Magazine
12	Neural network boundary refining for automatic speech segmentation / [ D.T.Toledano ] / Proc. IEEE Int. Conf. Acoust. Speech, Signal Processing
13	Spectral stability based event localizing temporal decomposition / [ A.C.R.Nandasena;M.Akagi ] / Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing
14	Automatic speech synthesis unit generation with MLP based postprocessor against auto-segmented phoneme errors / [ E.Y.Park;S.H.Kim;J.H.Chung ] / Proc. International Joint Conference on Neural Networks
