A Study on the Korean Broadcasting Speech Recognition

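  • 김석동 (School of Computer Science, Hoseo University)
  • 송도선 (Division of Electronics and Information, Woosong Technical College)
  • 이행세 (Department of Electronic Engineering, Ajou University)
  • Published: 1999.01.01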

Abstract

This paper is a study of Korean broadcast speech recognition. We present methods for large-vocabulary continuous speech recognition, focusing on language modeling and the search algorithm. The acoustic model is a uniphone semi-continuous hidden Markov model (SCHMM), and the language model is an N-gram model. To use all available acoustic and linguistic information, the search proceeds in three phases. First, a forward Viterbi beam search finds word end frames and estimates their scores. Second, a backward Viterbi beam search finds word begin frames and estimates their scores. Finally, an A* search combines these two results with the N-gram language model to produce the recognition result. With these methods, we achieve maximum word and syllable recognition rates of 96.0% and 99.2%, respectively, on speaker-independent continuous speech recognition with a vocabulary of about 12,000 words.
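
The abstract names an N-gram language model but gives neither its order nor its smoothing method. As a minimal sketch, assuming a bigram model (N = 2) with add-k smoothing (both assumptions, not details from the paper), conditional word probabilities can be estimated from counts and summed in the log domain to score a sentence. `BigramLM` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Add-k smoothed bigram model (hypothetical; the paper's N and smoothing are unknown)."""

    def __init__(self, sentences, k=1.0):
        self.k = k
        self.history_counts = Counter()
        self.pair_counts = Counter()
        self.vocab = {"</s>"}  # predictable tokens; "<s>" only ever appears as a history
        for words in sentences:
            tokens = ["<s>"] + list(words) + ["</s>"]
            self.vocab.update(words)
            self.history_counts.update(tokens[:-1])
            self.pair_counts.update(zip(tokens[:-1], tokens[1:]))

    def log_prob(self, prev, word):
        """Smoothed log P(word | prev); sums to 1 over self.vocab for any prev."""
        num = self.pair_counts[(prev, word)] + self.k
        den = self.history_counts[prev] + self.k * len(self.vocab)
        return math.log(num / den)

    def sentence_log_prob(self, words):
        """Log probability of a whole sentence, including the sentence-end token."""
        tokens = ["<s>"] + list(words) + ["</s>"]
        return sum(self.log_prob(p, w) for p, w in zip(tokens[:-1], tokens[1:]))


# Toy training corpus (hypothetical English stand-ins for broadcast sentences).
corpus = [["today", "weather", "is", "clear"], ["today", "news", "is", "over"]]
lm = BigramLM(corpus)
print(round(math.exp(lm.log_prob("today", "weather")), 4))  # 0.2222
```

Smoothing matters here because a 12,000-word vocabulary leaves most word pairs unseen in training; without it, any unseen pair would drive a sentence score to log 0.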
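
The first two search phases can be illustrated with one time-synchronous Viterbi beam search routine. As a minimal sketch, assuming per-frame acoustic log-likelihoods are already available (in the paper they would come from the SCHMM; here they are made-up numbers), `viterbi_beam` keeps only states within `beam` of the best score at each frame and records which word-final states survive, giving word-end frames with their scores. Running the same routine on time-reversed frames with transposed transitions, starting from word-final states, yields word-begin frames; this is one plausible realization of the backward pass, not necessarily the authors'. The function name, the two-word toy network, and all parameters are hypothetical.

```python
import math

NEG_INF = float("-inf")


def viterbi_beam(log_init, log_trans, log_emit, beam, marked_states):
    """Time-synchronous Viterbi search with beam pruning (hypothetical sketch).

    log_init[j]     : log probability of starting in state j
    log_trans[i][j] : log transition probability i -> j (NEG_INF if not allowed)
    log_emit[t][j]  : log acoustic likelihood of frame t in state j
    beam            : at each frame, states scoring below (best - beam) are pruned
    marked_states   : {state: word}; surviving marked states are recorded as hits
    Returns (best log score at the last frame, [(word, frame, log score), ...]).
    Assumes every state has a self-loop, so the active set never becomes empty.
    """
    num_states = len(log_init)
    active = {j: log_init[j] + log_emit[0][j]
              for j in range(num_states) if log_init[j] > NEG_INF}
    hits = []
    for t in range(len(log_emit)):
        if t > 0:
            # Viterbi recursion: extend only the states that survived frame t-1.
            scores = {}
            for i, score_i in active.items():
                for j in range(num_states):
                    if log_trans[i][j] == NEG_INF:
                        continue
                    cand = score_i + log_trans[i][j] + log_emit[t][j]
                    if cand > scores.get(j, NEG_INF):
                        scores[j] = cand
            active = scores
        # Beam pruning relative to the best score at this frame.
        best = max(active.values())
        active = {j: s for j, s in active.items() if s >= best - beam}
        for j, s in active.items():
            if j in marked_states:
                hits.append((marked_states[j], t, s))
    return max(active.values()), hits


# Hypothetical toy network: word "ga" = states 0-1, word "na" = states 2-3.
# A word-final state (1 or 3) may loop or enter the first state of either word.
L = math.log
ALLOWED = {0: {0: L(0.5), 1: L(0.5)},
           1: {1: L(0.4), 0: L(0.3), 2: L(0.3)},
           2: {2: L(0.5), 3: L(0.5)},
           3: {3: L(0.4), 0: L(0.3), 2: L(0.3)}}
log_trans = [[ALLOWED[i].get(j, NEG_INF) for j in range(4)] for i in range(4)]
log_init = [L(0.5), NEG_INF, L(0.5), NEG_INF]
# Made-up frame scores; in the paper these would come from the SCHMM.
log_emit = [[-1.0, -3.0, -2.0, -4.0],
            [-2.0, -1.0, -3.0, -3.0],
            [-3.0, -3.0, -1.0, -2.0],
            [-4.0, -3.0, -2.0, -1.0]]

# Forward pass: record word-end frames (word-final states 1 and 3).
best_fwd, word_ends = viterbi_beam(log_init, log_trans, log_emit, 5.0,
                                   {1: "ga", 3: "na"})

# Backward pass: reversed frames, transposed transitions, starting in word-final
# states; word-first states 0 and 2 mark word-begin frames. Each recorded score
# covers the path from that frame to the end of the utterance.
log_trans_rev = [list(col) for col in zip(*log_trans)]
log_final = [NEG_INF, 0.0, NEG_INF, 0.0]
_, rev_hits = viterbi_beam(log_final, log_trans_rev, log_emit[::-1], 5.0,
                           {0: "ga", 2: "na"})
num_frames = len(log_emit)
word_begins = [(w, num_frames - 1 - t, s) for w, t, s in rev_hits]
```

The beam trades accuracy for speed: with `beam=math.inf` the routine is exact Viterbi, while a finite beam may discard the path that would eventually have won.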
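
For the final phase, the abstract says an A* search combines the forward and backward results with the N-gram model. A minimal sketch, assuming the two Viterbi passes have already been reduced to a lattice of `(word, begin_frame, end_frame, acoustic_log_score)` arcs (a representation chosen here for illustration), expands partial sentences from the first frame in order of f = g + h. Here g is the acoustic plus bigram log score so far, and h is the best acoustic-only score from the current frame to the end. Because h ignores the non-positive language-model terms, it is optimistic, so the first complete sentence taken from the queue is the best one. `astar_decode`, the lattice, and the toy bigram table are hypothetical; the abstract does not give the paper's actual heuristic or data structures.

```python
import heapq
import math


def astar_decode(arcs, num_frames, lm_log_prob):
    """A* search over a word lattice (hypothetical sketch).

    arcs        : list of (word, begin_frame, end_frame, acoustic_log_score), with
                  end_frame > begin_frame; the word covers begin_frame..end_frame-1
    num_frames  : total number of frames T
    lm_log_prob : function (prev_word, word) -> log P(word | prev_word), always <= 0
    Returns (best total log score, word list), or None if no path spans all frames.
    """
    by_begin = {}
    for arc in arcs:
        by_begin.setdefault(arc[1], []).append(arc)

    # h[t]: best acoustic-only score from frame t to the end. Computed here from
    # the lattice; in the paper's scheme this role is played by the backward pass.
    # LM log probabilities (<= 0) can only lower a score, so h never
    # underestimates what remains and the search stays admissible.
    h = [-math.inf] * (num_frames + 1)
    h[num_frames] = 0.0
    for t in range(num_frames - 1, -1, -1):
        for _, _, end, ac in by_begin.get(t, []):
            h[t] = max(h[t], ac + h[end])

    # heapq is a min-heap, so entries are keyed on -f where f = g + h.
    heap = [(-h[0], 0.0, 0, ("<s>",))]
    while heap:
        _, g, t, words = heapq.heappop(heap)
        if t == num_frames:
            # Complete hypothesis: h = 0 and the sentence-end LM score is in g,
            # so its f is exact and no queued hypothesis can score higher.
            return g, list(words[1:])
        for word, _, end, ac in by_begin.get(t, []):
            if h[end] == -math.inf:
                continue  # dead end: no arc sequence from here reaches the last frame
            g2 = g + ac + lm_log_prob(words[-1], word)
            if end == num_frames:
                g2 += lm_log_prob(word, "</s>")
            heapq.heappush(heap, (-(g2 + h[end]), g2, end, words + (word,)))
    return None


# Hypothetical lattice and toy bigram scores (not data from the paper).
arcs = [("ga", 0, 2, -3.0), ("gan", 0, 3, -4.5), ("na", 2, 4, -3.5),
        ("da", 2, 4, -3.0), ("a", 3, 4, -2.0)]
BIGRAM = {("<s>", "ga"): math.log(0.4), ("<s>", "gan"): math.log(0.6),
          ("ga", "na"): math.log(0.7), ("ga", "da"): math.log(0.1),
          ("gan", "a"): math.log(0.2)}


def toy_lm(prev, word):
    # Unlisted pairs (including sentence end) get a flat log(0.5).
    return BIGRAM.get((prev, word), math.log(0.5))


score, words = astar_decode(arcs, 4, toy_lm)
print(words)  # ['ga', 'na']
```

In this toy lattice "ga da" has the best acoustic score (-6.0), but the bigram scores make "ga na" the best overall hypothesis, which is the kind of correction the language model is meant to supply.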
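
The abstract reports word and syllable recognition rates without defining them. Such rates are commonly computed from a minimum-edit-distance alignment between the reference and recognized sequences, counting substitutions (S), deletions (D), and insertions (I) against N reference units: percent correct = (N - S - D)/N, and accuracy = (N - S - D - I)/N. Which of the two the paper reports is not stated, so this sketch (function names hypothetical) computes both; it applies equally to word sequences and to syllable sequences.

```python
def align_counts(ref, hyp):
    """Minimum-edit-distance alignment of two token lists.

    Returns (substitutions, deletions, insertions). Among alignments with the
    fewest errors, the one with the fewest substitutions is chosen; scoring
    tools may break such ties differently.
    """
    n, m = len(ref), len(hyp)
    # best[i][j] = (errors, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    best = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][0] = (i, 0, i, 0)      # delete all of ref[:i]
    for j in range(1, m + 1):
        best[0][j] = (j, 0, 0, j)      # insert all of hyp[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = best[i - 1][j - 1]
            miss = 0 if ref[i - 1] == hyp[j - 1] else 1
            diag = (e + miss, s + miss, d, ins)          # match or substitution
            e, s, d, ins = best[i - 1][j]
            up = (e + 1, s, d + 1, ins)                  # deletion
            e, s, d, ins = best[i][j - 1]
            left = (e + 1, s, d, ins + 1)                # insertion
            best[i][j] = min(diag, up, left)
    _, s, d, ins = best[n][m]
    return s, d, ins


def recognition_rates(ref, hyp):
    """Return (percent correct, accuracy) as fractions of len(ref)."""
    s, d, ins = align_counts(ref, hyp)
    n = len(ref)
    correct = (n - s - d) / n            # ignores insertions
    accuracy = (n - s - d - ins) / n     # also penalizes insertions
    return correct, accuracy


ref = "today weather is clear".split()
hyp = "today weather clear now".split()
print(recognition_rates(ref, hyp))  # (0.75, 0.5)
```

For a syllable rate on Korean text, the same functions could be applied to lists of precomposed Hangul syllable characters, e.g. `recognition_rates(list(ref_text), list(hyp_text))`, where `ref_text` and `hyp_text` are hypothetical strings.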
