http://dx.doi.org/10.3745/KIPSTB.2004.11B.2.241

Language Model based on VCCV and Test of Smoothing Techniques for Sentence Speech Recognition  

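Park, Seon-Hee (School of Information and Communication Engineering, Graduate School, Sungkyunkwan University)
Roh, Yong-Wan (School of Information and Communication Engineering, Graduate School, Sungkyunkwan University)
Hong, Kwang-Seok (School of Information and Communication Engineering, Sungkyunkwan University)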
Abstract
In this paper, we propose VCCV units as the processing unit of a language model and compare them with the existing processing units, clauses and morphemes. Clause and morpheme units require large vocabularies and yield high perplexity, whereas VCCV units yield low perplexity because their lexicon is small and their vocabulary is limited. Building a language model also raises the issue of smoothing, which is used to obtain better probability estimates when there is insufficient data to estimate them accurately. We built language models over morphemes, clauses, and VCCV units and calculated the perplexity of each. The perplexity of VCCV units was lower than that of morpheme and clause units. We then constructed N-grams over the low-perplexity VCCV units and tested the language model with several smoothing techniques, including Katz, absolute, and modified Kneser-Ney smoothing. In the experimental results, modified Kneser-Ney smoothing proved to be a suitable smoothing technique for VCCV units.
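The evaluation described above, scoring a smoothed N-gram model by its perplexity, can be illustrated with a minimal sketch. The Python code below is not the authors' implementation: it uses a bigram model with interpolated absolute discounting (one of the smoothing families named in the abstract, not modified Kneser-Ney), and the discount value of 0.75, the add-one unigram floor, the function names, and the toy token sequences are all assumptions chosen for illustration. Perplexity is computed as 2 raised to the negative average log2 probability per predicted token.

```python
import math
from collections import Counter, defaultdict


def train_bigram_absolute(sentences, discount=0.75):
    """Return prob(prev, cur) for an interpolated absolute-discounting bigram model.

    `sentences` is a list of token lists; the tokens stand in for whatever
    unit the model is built on. The discount of 0.75 is an assumed value.
    """
    unigrams = Counter()            # counts of predicted tokens
    bigrams = Counter()             # counts of (prev, cur) pairs
    context_counts = Counter()      # how often each token occurs as a context
    followers = defaultdict(set)    # distinct tokens seen after each context
    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]
        unigrams.update(tokens[1:])
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            context_counts[prev] += 1
            followers[prev].add(cur)
    total = sum(unigrams.values())
    vocab_size = len(unigrams) + 1  # one extra slot for any unseen token

    def unigram_prob(word):
        # Add-one smoothed unigram (an assumption) so unseen tokens get mass.
        return (unigrams[word] + 1) / (total + vocab_size)

    def prob(prev, cur):
        count = context_counts[prev]
        if count == 0:
            # Unseen context: fall back to the unigram distribution.
            return unigram_prob(cur)
        discounted = max(bigrams[(prev, cur)] - discount, 0.0) / count
        # Mass removed by discounting is redistributed via the unigram model.
        backoff_mass = discount * len(followers[prev]) / count
        return discounted + backoff_mass * unigram_prob(cur)

    return prob


def perplexity(prob, sentences):
    """Perplexity = 2 ** (-average log2 probability per predicted token)."""
    log_sum, n_tokens = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            log_sum += math.log2(prob(prev, cur))
            n_tokens += 1
    return 2 ** (-log_sum / n_tokens)


# Toy data: "a" and "b" are placeholder units, not real VCCV units.
train_data = [["a", "b", "a"], ["a", "b", "b"], ["b", "a"]]
model = train_bigram_absolute(train_data)
print(perplexity(model, [["a", "b"]]))
```

Under these assumptions, comparing processing units would amount to tokenizing the same corpus in different ways and comparing the resulting perplexities, and comparing smoothing techniques would amount to swapping the body of `prob` while keeping `perplexity` fixed.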
Keywords
Language Model; Smoothing; VCCV; Perplexity; Speech Recognition