Improved Automatic Lipreading by Stochastic Optimization of Hidden Markov Models

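  • Jong-Seok Lee (School of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology);
  • Cheol Hoon Park (School of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology)
  • Published: 2007.12.31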

Abstract

This paper proposes a new stochastic optimization algorithm for training the hidden Markov models (HMMs) used as the recognizer in automatic lipreading. The proposed method combines simulated annealing, a global stochastic optimization technique, with a local optimization method, which gives both fast convergence and good solution quality. We show mathematically that the proposed algorithm converges to the global optimum. Experimental results show that HMMs trained with this method achieve better lipreading performance than those trained with conventional methods based on local optimization, which find only local optima.
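The general idea of pairing simulated annealing with a local optimizer can be illustrated on a toy problem. The sketch below is not the authors' algorithm and does not train an HMM: it minimizes a one-dimensional Rastrigin function, and the function names (`objective`, `local_descent`, `hybrid_sa`), the Cauchy-distributed step, the `t0 / k` cooling schedule, and all parameter values are assumptions chosen only for illustration. In an HMM setting the objective would instead be a training criterion over the model parameters, and the local step would be whatever local training method is in use.

```python
import math
import random


def objective(x):
    """Toy multimodal objective (1-D Rastrigin); global minimum f(0) = 0."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


def gradient(x):
    """Analytic derivative of the toy objective."""
    return 2.0 * x + 20.0 * math.pi * math.sin(2.0 * math.pi * x)


def local_descent(x, steps=50, lr=0.002):
    """Local optimization stand-in: plain gradient descent from x."""
    for _ in range(steps):
        x -= lr * gradient(x)
    return x


def hybrid_sa(x0, iters=2000, t0=10.0, lo=-5.12, hi=5.12, seed=0):
    """Simulated annealing in which every candidate is refined locally.

    Assumed design choices (not taken from the paper): Cauchy-distributed
    proposals scaled by the temperature, cooling schedule T_k = t0 / k,
    and the Metropolis acceptance rule.
    """
    rng = random.Random(seed)
    current = local_descent(x0)
    f_cur = objective(current)
    best, f_best = current, f_cur
    for k in range(1, iters + 1):
        temp = t0 / k
        # Global move: heavy-tailed Cauchy step, clamped to the search box.
        step = temp * math.tan(math.pi * (rng.random() - 0.5))
        proposal = min(hi, max(lo, current + step))
        # Local move: refine the proposal before judging it.
        candidate = local_descent(proposal)
        f_cand = objective(candidate)
        delta = f_cand - f_cur
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, f_cur = candidate, f_cand
        if f_cur < f_best:
            best, f_best = current, f_cur
    return best, f_best


if __name__ == "__main__":
    start = 4.3
    local_only = objective(local_descent(start))
    x_best, f_best = hybrid_sa(start)
    print(f"local only: f = {local_only:.4f}")
    print(f"hybrid SA : f = {f_best:.4f} at x = {x_best:.4f}")
```

Because the search starts from the locally optimized point and keeps track of the best solution seen, the hybrid result is never worse than local optimization alone from the same start; whether it reaches the global minimum in a given run depends on the schedule and the number of iterations.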
