[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTB.2003.10B.1.047

Segmenting and Classifying Korean Words based on Syllables Using Instance-Based Learning

Kim, Jae-Hoon (Department of Computer Engineering, Korea Maritime University)
Lee, Kong-Joo (Department of Computer Engineering, Ewha Womans University)

Publication Information

The KIPS Transactions: Part B / v.10B, no.1, 2003, pp. 47-56

Abstract

Korean, like English, delimits words with white space, but Korean words differ somewhat in structure from English ones. A white-space-delimited unit in English is generally a single word, whereas in Korean it consists of one or more words and/or morphemes. Because of this difference, the unit between white spaces in Korean is called an Eojeol. We propose a method for segmenting and classifying Korean words and/or morphemes based on syllables using instance-based learning. The feature set for the instance-based learner consists of the previous syllable, the current syllable, the next two syllables, the final consonant of the current syllable, and the two previous categories. Our method achieves an F-measure above 97% for word segmentation on the ETRI corpus and the KAIST corpus.
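The feature set listed in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the function names, the padding symbol, the tag labels (B-N, I-N, B-J for noun-begin, noun-inside, and postposition-begin), the toy training data, and the choice of an unweighted overlap distance with k = 1. The abstract does not state which distance metric, feature weighting, or tag inventory was used. The only part grounded in a standard is the final-consonant lookup, which relies on the arithmetic layout of precomposed Hangul syllables in Unicode (U+AC00 to U+D7A3, 28 final-consonant slots per vowel).

```python
from collections import Counter

HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3
# The 28 final-consonant slots of a precomposed Hangul syllable (index 0 = none).
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
PAD = "_"  # hypothetical padding symbol for positions outside the Eojeol


def final_consonant(syllable):
    """Return the final consonant of a Hangul syllable, or "" if there is none."""
    if len(syllable) != 1 or not HANGUL_BASE <= ord(syllable) <= HANGUL_LAST:
        return ""
    return FINALS[(ord(syllable) - HANGUL_BASE) % 28]


def features(syllables, i, prev_tags):
    """Build the feature tuple for position i: previous syllable, current
    syllable, two next syllables, final consonant of the current syllable,
    and the two previous categories."""
    def at(j):
        return syllables[j] if 0 <= j < len(syllables) else PAD
    t1 = prev_tags[-1] if len(prev_tags) >= 1 else PAD
    t2 = prev_tags[-2] if len(prev_tags) >= 2 else PAD
    return (at(i - 1), at(i), at(i + 1), at(i + 2), final_consonant(at(i)), t2, t1)


def overlap_distance(a, b):
    """Count mismatching feature values (assumed metric, unweighted)."""
    return sum(x != y for x, y in zip(a, b))


def train(tagged_sentences):
    """Instance-based 'training' just stores every (features, tag) pair."""
    memory = []
    for syllables, tags in tagged_sentences:
        for i in range(len(syllables)):
            memory.append((features(syllables, i, tags[:i]), tags[i]))
    return memory


def classify(memory, feats, k=1):
    """Majority tag among the k stored instances closest to feats."""
    nearest = sorted(memory, key=lambda inst: overlap_distance(inst[0], feats))[:k]
    return Counter(tag for _, tag in nearest).most_common(1)[0][0]


def tag(memory, syllables):
    """Tag left to right, feeding earlier predictions back in as features."""
    tags = []
    for i in range(len(syllables)):
        tags.append(classify(memory, features(syllables, i, tags)))
    return tags


# Toy data with hypothetical tags: 학교 (noun) + 에 (postposition), 집 (noun) + 에.
training = [(list("학교에"), ["B-N", "I-N", "B-J"]),
            (list("집에"), ["B-N", "B-J"])]
memory = train(training)
print(tag(memory, list("학교에")))
```

In this sketch a tag beginning with B marks the first syllable of a new word or morpheme, so segmentation and classification both fall out of one per-syllable decision. A real system would need a much larger annotated corpus and probably a weighted metric.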

Keywords

Word segmentation; Instance-based learning

Citations & Related Records

Times Cited By KSCI: 5

Cited By KSCI

1	Segmenting and Classifying Korean Words based on Syllables Using Instance-Based Learning / Kim, Jae-Hoon; Lee, Kong-Joo / The KIPS Transactions: Part B
2	Morpheme Recovery Based on Naïve Bayes Model / Kim, Jae-Hoon; Jeon, Kil-Ho / The KIPS Transactions: Part B

Original Korean title: 사례기반 학습을 이용한 음절기반 한국어 단어 분리 및 범주 결정