
A Korean Homonym Disambiguation Model Based on Statistics Using Weights  

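Jun-Su Kim (Department of Computer Engineering and Information Technology, University of Ulsan)
Ho-Seop Choi (Department of Computer Engineering and Information Technology, University of Ulsan)
Cheol-Young Ock (School of Computer Engineering and Information Technology, University of Ulsan)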
Abstract
Word sense disambiguation (WSD) is one of the most difficult problems in Korean information processing. A Bayesian model that used semantic information extracted from a definition corpus (1 million POS-tagged eojeol of Korean dictionary definitions) achieved an accuracy of 72.08% (nouns 78.12%, verbs 62.45%). This paper proposes a statistical WSD model that uses NPH (New Prior Probability of Homonym sense) and distance weights. We selected 46 homonyms (30 nouns, 16 verbs) that occur with high frequency in the definition corpus and evaluated the model on 47,977 contexts from the ‘21C Sejong Corpus’ (3.5 million POS-tagged eojeol). The WSD model using NPH improves accuracy by 1.70% on average, and the model using both NPH and distance weights improves it by 2.01%.
Keywords
Word Sense Disambiguation; WSD; Semantic Information; Homograph
Citations & Related Records
Times Cited By KSCI: 2