Decision Tree Based Context Clustering with Cross Likelihood Ratio for HMM-based TTS

  • Jung, Chi-Sang (School of Electrical and Electronic Engineering, Yonsei University)
  • Kang, Hong-Goo (School of Electrical and Electronic Engineering, Yonsei University)
  • Received : 2012.12.03
  • Accepted : 2013.01.27
  • Published : 2013.03.31

Abstract

This paper proposes a decision-tree-based context clustering algorithm for HMM-based speech synthesis systems that uses the cross likelihood ratio with a hierarchical prior (CLRHP). Conventional algorithms tie context-dependent HMM states that have similar statistical characteristics, but they do not consider the statistical similarity between the child nodes produced by each split, so they cannot guarantee that the final leaf nodes are statistically distinct. The proposed CLRHP algorithm improves the reliability of the model parameters by choosing splits that minimize the statistical similarity between the child nodes. Experimental results show that the proposed approach outperforms conventional ones.
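The idea of choosing splits whose child nodes are least similar can be illustrated with a minimal sketch. It is not the paper's CLRHP formulation, which the abstract does not spell out. The sketch assumes 1-D Gaussian node models, uses the pooled (parent) model as the reference in a Reynolds-style symmetric cross likelihood ratio, and omits the hierarchical prior entirely. All function names, the toy data, and the two questions are hypothetical.

```python
import math


def fit_gaussian(samples, var_floor=1e-6):
    """ML estimate (mean, variance) of a 1-D Gaussian; variance is floored."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, max(var, var_floor)


def avg_log_likelihood(samples, model):
    """Average per-sample log-likelihood of samples under a (mean, var) model."""
    mean, var = model
    total = sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in samples
    )
    return total / len(samples)


def cross_likelihood_ratio(left, right):
    """Symmetric CLR-style similarity of two child nodes (assumed form).

    Each child's data is scored under the other child's model, relative to
    the pooled parent model. Lower values mean the children are less similar.
    """
    parent = fit_gaussian(left + right)
    m_left, m_right = fit_gaussian(left), fit_gaussian(right)
    return (
        avg_log_likelihood(left, m_right) - avg_log_likelihood(left, parent)
        + avg_log_likelihood(right, m_left) - avg_log_likelihood(right, parent)
    )


def pick_question(data, questions, min_count=2):
    """Return (name, score) of the question whose children are least similar."""
    best = None
    for name, predicate in questions.items():
        yes = [x for ctx, x in data if predicate(ctx)]
        no = [x for ctx, x in data if not predicate(ctx)]
        if len(yes) < min_count or len(no) < min_count:
            continue  # skip splits that leave too little data in a child
        score = cross_likelihood_ratio(yes, no)
        if best is None or score < best[1]:
            best = (name, score)
    return best


# Toy data: (context, 1-D feature). The feature depends on "vowel" only.
data = [
    ({"vowel": True, "stressed": True}, 5.1),
    ({"vowel": True, "stressed": False}, 4.9),
    ({"vowel": True, "stressed": True}, 5.3),
    ({"vowel": True, "stressed": False}, 4.7),
    ({"vowel": False, "stressed": True}, 0.2),
    ({"vowel": False, "stressed": False}, -0.1),
    ({"vowel": False, "stressed": True}, 0.1),
    ({"vowel": False, "stressed": False}, -0.2),
]
questions = {
    "is_vowel": lambda ctx: ctx["vowel"],
    "is_stressed": lambda ctx: ctx["stressed"],
}

best_name, best_score = pick_question(data, questions)
print(best_name, best_score)
```

On this toy data the "is_vowel" question separates the feature values cleanly, so its children score far lower (less similar) than those of "is_stressed". A full system would presumably combine such a similarity term with a likelihood-gain or stopping criterion, which this sketch leaves out.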
