Sums-of-Products Models for Korean Segment Duration Prediction

  • Published : 2003.12.01

Abstract

Sums-of-Products models were built to predict segment duration in spoken Korean. The modelling experiment was carried out with a view to applying the results to Korean text-to-speech synthesis systems. A corpus of 670 read sentences was analyzed and used to train and test the duration models. Traditional sequential rule systems were extended to simple additive, multiplicative and additive-multiplicative models based on Sums-of-Products modelling. The parameters used in the modelling include the properties of the target segment and its neighbors, and the target segment's position in the prosodic structure. Two optimisation strategies were used: the downhill simplex method and the simulated annealing method. The performance of the models was measured by the correlation coefficient and the root mean squared prediction error (RMSE) between actual and predicted durations in the test data. The best performance was obtained with the additive-multiplicative models. The correlation for vowel duration prediction was 0.69 and the RMSE was 31.80 ms, while the correlation for consonant duration prediction was 0.54 and the RMSE was 29.02 ms. The results were not good enough to be applied to real-time text-to-speech systems. Further investigation of feature interactions is required to improve the performance of the Sums-of-Products models.
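To illustrate the general shape of a Sums-of-Products model and of the two evaluation measures named above, here is a minimal Python sketch. It is not the authors' implementation: the factor names (`identity`, `stress`, `position`), the levels and all parameter values are hypothetical, chosen by hand for illustration, whereas in the study the parameters were fitted by optimisation. A predicted duration is a sum of terms, and each term is a product of per-factor scale values. A purely additive model has one factor per term, a purely multiplicative model has a single term containing all factors, and an additive-multiplicative model mixes the two.

```python
import math

def sop_predict(features, terms, scales):
    """Predict a duration (ms) as a sum of products of factor scales.

    features: dict mapping factor name -> level observed for this segment
    terms:    list of lists; each inner list names the factors multiplied
              together in one product term
    scales:   dict mapping (term_index, factor) -> {level: parameter value}
    """
    total = 0.0
    for i, factors in enumerate(terms):
        product = 1.0
        for factor in factors:
            product *= scales[(i, factor)][features[factor]]
        total += product
    return total

def rmse(actual, predicted):
    """Root mean squared prediction error between two equal-length lists."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical additive-multiplicative model:
#   duration = identity_scale * stress_scale + position_offset
terms = [["identity", "stress"], ["position"]]
scales = {
    (0, "identity"): {"a": 80.0, "i": 60.0},       # assumed base values in ms
    (0, "stress"): {"yes": 1.25, "no": 1.0},       # assumed multipliers
    (1, "position"): {"final": 30.0, "medial": 0.0},  # assumed offsets in ms
}

segment = {"identity": "a", "stress": "yes", "position": "final"}
predicted_ms = sop_predict(segment, terms, scales)
```

Given predicted and observed durations for a held-out test set, `rmse` and `pearson` compute the two performance measures reported in the abstract. Fitting the values in `scales` would be the job of an optimiser such as the downhill simplex method or simulated annealing, which this sketch leaves out.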

Keywords