A Study on the Optimization of State Tying Acoustic Models using Mixture Gaussian Clustering  

Ann, Tae-Ock (Division of Computer, Howon Univ.)
Abstract
This paper describes how a decision-tree-based state tying model, one of the acoustic models used for speech recognition, can be optimized by reducing the number of mixture Gaussians in its output probability distributions. State tying modeling uses a finite set of questions, which can encode phonological knowledge, together with likelihood-based decision criteria. The recognition rate can be improved by increasing the number of mixture Gaussians in the output probability distribution. In this paper, we start from the number of mixture Gaussians at which the recognition rate is highest and reduce that number by clustering the Gaussians. The Bhattacharyya and Euclidean distances are used as the distance measures for clustering. The pair of Gaussians with the lowest distance is merged: a mean and variance are computed for the pair, and a new Gaussian is created whose parameters are derived from the parameters of the two Gaussians it replaces. Experiments were performed using the STOCKNAME (1,680) database. The results show that the proposed method with the Bhattacharyya distance measure maintains the recognition rate at $97.2\%$ while reducing the number of mixture Gaussians by $1.0\%$, and that the method with the Euclidean distance measure maintains the recognition rate at $96.9\%$ while reducing the number of mixture Gaussians by $1.0\%$. Both methods can therefore be used to optimize the state tying model.
Keywords
Speech Recognition; Signal Processing; Acoustic Model; State Tying; Clustering;
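
The clustering step summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the merge formulas, so the sketch assumes diagonal-covariance Gaussians and a weight-preserving moment-matching merge, and the names `bhattacharyya`, `euclidean`, `merge`, and `cluster` are hypothetical.

```python
import math
from itertools import combinations

# Assumption: each Gaussian is a tuple (weight, mean vector, diagonal variances).

def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    _, m1, v1 = g1
    _, m2, v2 = g2
    dist = 0.0
    for a, b, s1, s2 in zip(m1, m2, v1, v2):
        avg = 0.5 * (s1 + s2)                             # averaged variance
        dist += 0.125 * (a - b) ** 2 / avg                # mean-separation term
        dist += 0.5 * math.log(avg / math.sqrt(s1 * s2))  # variance-mismatch term
    return dist

def euclidean(g1, g2):
    """Euclidean distance between the two mean vectors (variances ignored)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1[1], g2[1])))

def merge(g1, g2):
    """Moment-matching merge (an assumption, not taken from the paper):
    the new Gaussian keeps the pair's total weight, mean, and variance."""
    w1, m1, v1 = g1
    w2, m2, v2 = g2
    w = w1 + w2
    mean = [(w1 * a + w2 * b) / w for a, b in zip(m1, m2)]
    var = [
        (w1 * (s1 + a * a) + w2 * (s2 + b * b)) / w - m * m
        for a, b, s1, s2, m in zip(m1, m2, v1, v2, mean)
    ]
    return (w, mean, var)

def cluster(mixture, target_size, distance=bhattacharyya):
    """Repeatedly merge the closest pair until target_size Gaussians remain."""
    mixture = list(mixture)
    while len(mixture) > max(target_size, 1):
        i, j = min(
            combinations(range(len(mixture)), 2),
            key=lambda p: distance(mixture[p[0]], mixture[p[1]]),
        )
        merged = merge(mixture[i], mixture[j])
        mixture = [g for k, g in enumerate(mixture) if k not in (i, j)]
        mixture.append(merged)
    return mixture

# Toy one-dimensional mixture: the first two components are close together.
mixture = [(0.5, [0.0], [1.0]), (0.3, [0.2], [1.0]), (0.2, [5.0], [1.0])]
reduced = cluster(mixture, 2)
print(len(reduced), reduced)
```

Passing `distance=euclidean` to `cluster` switches to the second measure named in the abstract. In a recognizer, the reduced mixture would normally be re-evaluated on test data to confirm that the recognition rate is maintained, as the reported experiments do.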