
Context-adaptive Phoneme Segmentation for a TTS Database  

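이기승 (Department of Electronic Engineering, College of Information and Communication, Konkuk University)
김정수 (Human-Computer Interaction Laboratory, Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd.)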
Abstract
A method for the automatic segmentation of speech signals is described. The method is intended for building a large database for a Text-To-Speech (TTS) synthesis system. The main focus of the work is refining the initial estimates of phone boundaries provided by an alignment based on a Hidden Markov Model (HMM). A multi-layer perceptron (MLP) is used as the phone boundary detector. To improve segmentation performance, a technique is proposed that trains a separate MLP for each group of phonetic transitions. The optimum partitioning of the entire phonetic transition space is constructed so as to minimize the overall deviation from the hand-labelled boundary positions. In experiments with single-speaker speech, more than 95% of all phone boundaries deviated from the reference position by less than 20 ms, and the boundary refinement reduced the root mean square error by about 25%.
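The two figures quoted in the abstract (the share of boundaries within 20 ms of the reference and the root mean square error) can be illustrated with a minimal sketch. The function names `boundary_metrics` and `refine_boundary`, the per-frame score list, and the rule of moving a boundary to the highest-scoring frame in a search window are assumptions made for illustration only. The abstract does not state how the MLP output is turned into a refined boundary, so this should not be read as the authors' procedure.

```python
import math


def boundary_metrics(auto_ms, ref_ms, tol_ms=20.0):
    """Return (fraction of boundaries with |error| < tol_ms, RMS error in ms).

    auto_ms: automatically estimated boundary times in milliseconds.
    ref_ms:  hand-labelled reference times in milliseconds, same order.
    """
    if len(auto_ms) != len(ref_ms) or not auto_ms:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - r for a, r in zip(auto_ms, ref_ms)]
    within = sum(1 for e in errors if abs(e) < tol_ms) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return within, rmse


def refine_boundary(initial_frame, scores, search_radius):
    """Hypothetical refinement rule (an assumption, not from the paper):
    move the initial boundary to the frame with the highest detector
    score inside a window of +/- search_radius frames."""
    lo = max(0, initial_frame - search_radius)
    hi = min(len(scores) - 1, initial_frame + search_radius)
    return max(range(lo, hi + 1), key=lambda i: scores[i])


# Toy data, invented purely to exercise the functions.
within, rmse = boundary_metrics([100.0, 210.0, 330.0, 395.0],
                                [105.0, 200.0, 300.0, 400.0])
refined = refine_boundary(5, [0.0, 0.1, 0.2, 0.1, 0.3, 0.4, 0.9, 0.2, 0.95, 0.1], 2)
```

In a context-adaptive setup as described above, one would presumably keep a separate score function per phonetic transition group and choose it from the phone pair on either side of the boundary. The mapping from phone pairs to groups is what the paper's partitioning step would supply, and it is not reproduced here.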
Keywords
Text-To-Speech synthesis; Automatic phoneme labelling; Multi-layer perceptron; Neural network training algorithm