[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTB.2004.11B.2.241

Language Model based on VCCV and Test of Smoothing Techniques for Sentence Speech Recognition

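Park, Seon-Hee (School of Information and Communication Engineering, Graduate School, Sungkyunkwan University)
Roh, Yong-Wan (School of Information and Communication Engineering, Graduate School, Sungkyunkwan University)
Hong, Kwang-Seok (School of Information and Communication Engineering, Sungkyunkwan University)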

Publication Information

The KIPS Transactions: Part B / v.11B, no.2, 2004, pp. 241-246

Abstract

In this paper, we propose the VCCV unit as the processing unit of a language model and compare it with the existing processing units, clauses and morphemes. Clause and morpheme units have large vocabularies and high perplexity, whereas VCCV units have low perplexity because their lexicon is small and their vocabulary is limited. Building a language model also raises the issue of smoothing: smoothing techniques are used to obtain better probability estimates when there is insufficient data to estimate probabilities accurately. We built language models over morphemes, clauses, and VCCV units and calculated the perplexity of each. The perplexity of the VCCV units was lower than that of the morpheme and clause units. We then constructed N-grams over the low-perplexity VCCV units and tested the language model with Katz, absolute, modified Kneser-Ney, and other smoothing techniques. In the experimental results, modified Kneser-Ney smoothing was found to be the appropriate smoothing technique for VCCV units.
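As a rough illustration of the two quantities the abstract relies on, the sketch below trains a bigram model with interpolated absolute discounting (one of the smoothing families named above) and computes its perplexity on a token sequence. It is not the authors' implementation: the function names, the discount value 0.75, the add-one unigram fallback, and the toy tokens are all assumptions made for this example, and the segmentation of text into VCCV units is not shown.

```python
import math
from collections import Counter


def train_bigram_absolute(tokens, discount=0.75):
    """Return a function prob(w, prev) for a bigram model smoothed with
    interpolated absolute discounting (illustrative, not the paper's code)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # How often each token occurs as a left context.
    context_counts = Counter(tokens[:-1])
    # Number of distinct tokens seen after each context.
    followers = Counter(prev for prev, _ in bigrams)
    total = len(tokens)
    vocab_size = len(unigrams)

    def unigram_prob(w):
        # Add-one smoothed unigram with one extra slot for unseen tokens.
        return (unigrams[w] + 1) / (total + vocab_size + 1)

    def prob(w, prev):
        c_prev = context_counts[prev]
        if c_prev == 0:
            # Unknown context: fall back to the unigram estimate.
            return unigram_prob(w)
        discounted = max(bigrams[(prev, w)] - discount, 0.0) / c_prev
        # Mass removed by discounting is redistributed via the unigram model.
        backoff_weight = discount * followers[prev] / c_prev
        return discounted + backoff_weight * unigram_prob(w)

    return prob


def perplexity(prob, tokens):
    """Perplexity of the bigram model over a token sequence."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log2(prob(w, prev)) for prev, w in pairs)
    return 2 ** (-log_sum / len(pairs))


# Hypothetical token stream; real input would be a corpus segmented into units.
train_tokens = "a b a b a c a b".split()
model = train_bigram_absolute(train_tokens)
ppl = perplexity(model, "a b a c".split())
```

Under these assumptions, comparing `ppl` across models built from different unit inventories (or different smoothing methods) on the same held-out text mirrors the kind of comparison the abstract describes; a lower value means the model predicts the text better.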

Keywords

Language Model; Smoothing; VCCV; Perplexity; Speech Recognition

Citations & Related Records

Times Cited By KSCI: 1

