http://dx.doi.org/10.7776/ASK.2010.29.1.082

Spectrum Based Excitation Extraction for HMM Based Speech Synthesis System  

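Lee, Bong-Jin (Digital Signal Processing Laboratory, Yonsei University)
Kim, Seong-Woo (Digital Signal Processing Laboratory, Yonsei University)
Baek, Soon-Ho (Digital Signal Processing Laboratory, Yonsei University)
Kim, Jong-Jin (Speech Processing Research Team, Electronics and Telecommunications Research Institute)
Kang, Hong-Goo (Digital Signal Processing Laboratory, Yonsei University)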
Abstract
This paper proposes an efficient method for enhancing the quality of synthesized speech in an HMM-based speech synthesis system. The proposed method trains a Gaussian mixture model on spectral parameters and excitation signals, and estimates appropriate excitation signals from the spectral parameters during the synthesis stage. Both WB-PESQ and MUSHRA results show that the proposed method provides better speech quality than a conventional HMM-based speech synthesis system.
Keywords
Speech Synthesis; Gaussian Mixture Model; Excitation Signal;
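
The abstract does not give the estimator itself, so the following is only a minimal sketch of one common way to map one feature to another with a Gaussian mixture model: the conditional mean E[y | x] under a joint GMM over (x, y). It is not confirmed to be the authors' method. For illustration x stands for a single spectral parameter and y for a single excitation parameter; the scalar setting, the function names, and the component keys (`w`, `mx`, `my`, `vxx`, `vxy`) are all assumptions introduced here.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def estimate_excitation(x, components):
    """Conditional mean E[y | x] under a joint GMM over (x, y).

    Each component is a dict with hypothetical keys:
      w   - mixture weight
      mx  - mean of the spectral feature x
      my  - mean of the excitation feature y
      vxx - variance of x
      vxy - covariance between x and y
    """
    # Posterior probability of each component given x alone.
    likelihoods = [c["w"] * gaussian_pdf(x, c["mx"], c["vxx"]) for c in components]
    total = sum(likelihoods)  # assumes x is not so far out that this underflows to 0
    posteriors = [lk / total for lk in likelihoods]

    # Each component contributes a linear regression of y on x,
    # weighted by how likely that component is for this x.
    return sum(
        p * (c["my"] + c["vxy"] / c["vxx"] * (x - c["mx"]))
        for p, c in zip(posteriors, components)
    )


# Toy parameters, invented for illustration only (not from the paper).
toy_gmm = [
    {"w": 0.5, "mx": -1.0, "my": 0.0, "vxx": 1.0, "vxy": 0.0},
    {"w": 0.5, "mx": 1.0, "my": 10.0, "vxx": 1.0, "vxy": 0.0},
]
estimate = estimate_excitation(0.0, toy_gmm)
```

In a real system both x and y would be vectors, the per-component term would use full cross-covariance matrices, and the mixture parameters would be fitted on paired training frames rather than written by hand.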