Harmonic Peak Picking-based MVF Estimation for Improvement of HMM-based Speech Synthesis System Using TBE Model

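Korean title (translated): An MVF Estimation Method Based on Harmonic Selection for Improving the Performance of an HMM-based Speech Synthesizer Using the TBE Model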

  • Received : 2012.07.26
  • Accepted : 2012.12.12
  • Published : 2012.12.31

Abstract

In the two-band excitation (TBE) model, the maximum voiced frequency (MVF) is the most important feature among the excitation parameters because the quality of the synthetic speech depends on it. This paper therefore proposes an enhanced MVF estimation scheme based on harmonic peak picking. In the proposed scheme, local peaks and peak lobes are picked from the spectrum of the linear predictive residual signal. The normalized distance between neighboring peak lobes is then calculated and used as a feature to estimate the MVF. Objective and subjective tests both show that the proposed scheme improves synthetic speech quality compared with the conventional scheme.
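As a rough illustration of the peak-spacing idea summarized in the abstract, the Python sketch below picks local peaks from a magnitude spectrum, normalizes the distance between neighboring peaks by the expected harmonic spacing, and takes the top of the regularly spaced run as the MVF. It is not the paper's algorithm. The function names, the `tol` and `min_rel_height` parameters, the use of a known F0, and the "stop at the first irregular spacing" rule are all assumptions made for illustration. The sketch also skips the linear predictive residual computation and the peak-lobe step, and works directly on a precomputed spectrum.

```python
def pick_local_peaks(mag, min_rel_height=0.1):
    """Return bin indices of local maxima in a magnitude spectrum.

    min_rel_height is a hypothetical threshold: peaks below this
    fraction of the global maximum are ignored.
    """
    if not mag:
        return []
    floor = min_rel_height * max(mag)
    return [
        i
        for i in range(1, len(mag) - 1)
        if mag[i - 1] < mag[i] >= mag[i + 1] and mag[i] >= floor
    ]


def estimate_mvf(mag, f0_hz, sample_rate, n_fft, tol=0.2):
    """Estimate the MVF in Hz from peak spacing (illustrative rule only).

    The distance between neighboring peaks is divided by the expected
    harmonic spacing (F0 expressed in bins). While that normalized
    distance stays within `tol` of 1.0 the band is treated as voiced;
    the first irregular spacing ends the voiced band.
    """
    bin_hz = sample_rate / n_fft
    f0_bins = f0_hz / bin_hz
    peaks = pick_local_peaks(mag)
    last_regular = None
    for lower, upper in zip(peaks, peaks[1:]):
        normalized_distance = (upper - lower) / f0_bins
        if abs(normalized_distance - 1.0) > tol:
            break
        last_regular = upper
    return 0.0 if last_regular is None else last_regular * bin_hz


# Toy spectrum: harmonics of 125 Hz up to 4 kHz, irregular peaks above.
toy_mag = [0.0] * 513            # bins 0..512 for a 1024-point FFT
for k in range(1, 33):
    toy_mag[8 * k] = 1.0         # 125 Hz is 8 bins at 16 kHz / 1024
for b in (269, 300, 313, 350):
    toy_mag[b] = 1.0             # noise-like, non-harmonic peaks

print(estimate_mvf(toy_mag, 125.0, 16000, 1024))  # 4000.0
```

In this toy case the harmonic peaks are exactly 8 bins apart up to bin 256, so the estimate is the frequency of that bin. Real residual spectra are far less clean, which is presumably why the proposed scheme works with peak lobes rather than with bare local maxima.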
