DOI QR코드

DOI QR Code

음소 결정트리의 노드 분할을 위한 임계치 자동 결정 알고리즘

The Automated Threshold Decision Algorithm for Node Split of Phonetic Decision Tree

  • 투고 : 2012.02.24
  • 심사 : 2012.04.01
  • 발행 : 2012.04.30

초록

본 논문에서는 코레일에서 운영중인 640개 기차역명의 음소기반의 음성인식을 위하여 트라이폰 단위의 음소 결정트리 구축 시 노드 분할 과정에서 사용되는 임계치의 결정에 있어 통계적 기법인 상관관계 분석과 회귀분석을 활용하여 군집화율을 추정하고 이를 이용한 평균 군집화율에 따른 임계치의 값에 의해 자동으로 결정하는 방법을 제안하였다. 제안된 방법의 유효성 검증을 위한 실험에서 기존의 일괄 적용된 Baseline 보다 1.4~2.3 %의 인식률 향상을 보였다.

In the paper, phonetic decision tree of the triphone unit was built for the phoneme-based speech recognition of 640 stations which run by the Korail. The clustering rate was determined by Pearson and Regression analysis to decide threshold used in node splitting. Using the determined the clustering rate, thresholds are automatically decided by the threshold value according to the average clustering rate. In the recognition experiments for verifying the proposed method, the performance improved 1.4~2.3 % absolutely than that of the baseline system.

키워드

참고문헌

  1. B. S. Kim, S. H. Kim, "A Study on Realization of Speech Recognition System based on VoiceXML for Railroad Reservation Service," Jounal of the Korea Society for Railway, vol. 14, no. 2, pp. 130-136, 2011. https://doi.org/10.7782/JKSR.2011.14.2.130
  2. B. S. Kim, S. H. Kim, "A Study on the Speech Recognition for Commands of Ticketing Machine using CHMM," Jounal of the Korea Society for Railway, vol. 12, no. 2, pp. 285-290, 2009.
  3. B. S. Kim, S. H. Kim, "A Study on Speech Recognition based on Phoneme for Korean Subway Station Names," Jounal of the Korea Society for Railway, vol. 14, no. 3, pp. 285-290, 2011. https://doi.org/10.7782/JKSR.2011.14.3.285
  4. A. Lazarides, Y. Normandin, and R. Kuhn, "Improving decision trees for acoustic modeling," in Proc. ICSLP, Philadelphia, October. 1996.
  5. D. B. Paul, "Extensions to phone-state decision-tree clustering: single tree and tagged clustering," in Proc. ICASSP, vol. 2, pp. 1487-490, 1997.
  6. L. Gu, K. Rose, "Sub-state tying in tied mixture hidden Markov models," Proc. IEEE, Acoustics, Speech, and Signal Processing, pp. 1062-1065, 2000.
  7. R. D. R. Fagundes, J. S. Correa, P. Dumouchel, "A New Phonetic model for continuous speech recognition systems," Proc. ICSP, pp. 572-575, 2002.
  8. S. J. Young, J. J. Odell, and P. C. Woodland, "Tree- Based State Tying for Hight Accuracy Accoustic Modelling," in Proceedings of the Workshop on Human Language Technology, Plainsboro, NJ, Mar. 1994.
  9. J. J. Odell, "The Use of Context in Large Vocabulary Speech Recognition," PhD's Dissertation, University of Cambridge, 1995.
  10. T. O. Ann, "A Study on the Optimization of State Tying Acoustic Models using Mixture Gaussian Clustering", Jounal of Electronics Engineers of Korea, vol. 42, no. 6, pp. 167-176, Nov. 2005.
  11. 김성호, 최태성, 사회과학을 위한 통계자료분석 (SPSS 11.0활용), 다산출판사, 2004.
  12. L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Application in Speech Recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
  13. D. Jurafsky and J. H. Martin, Speech and Language Processing, PrenticeHall (2nd), 2008.
  14. S. Young, G. Evermmana, M. Gales, T. Hain, et al, "The HTK Book for HTK Version 3.4," 2006.