Homonym Disambiguation based on Mutual Information and Sense-Tagged Compound Noun Dictionary

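  • Jeong Hur (Knowledge Mining Research Team, Electronics and Telecommunications Research Institute (ETRI));
  • Hee-Cheol Seo (Knowledge Mining Research Team, Electronics and Telecommunications Research Institute (ETRI));
  • Myung-Gil Jang (Knowledge Mining Research Team, Electronics and Telecommunications Research Institute (ETRI))
  • Published: 2006.12.15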

Abstract

The goal of Natural Language Processing (NLP) is to enable computers to understand natural language and to deliver its meaning to humans. Word Sense Disambiguation (WSD) is a key technology for achieving this goal. In this paper, we describe a technique for automatic homonym disambiguation that uses both Mutual Information (MI) and a Sense-Tagged Compound Noun Dictionary. Previous work based on dictionary word definitions suffered from data sparseness because it relied on exact word matching. Our work alleviates this problem by using MI, an association measure between words. To reflect linguistic features, our system applies three weights: the ratio of word pairs that have MI values, the sense frequency ratio, and the length of the word definitions. Because the simple nouns that make up a compound noun constrain each other's senses, we also constructed a Sense-Tagged Compound Noun Dictionary for high-frequency compound nouns and used it for homonym disambiguation. To test and evaluate the system, we built an evaluation set from Question Answering (QA) test data consisting of about 200 queries and their answer paragraphs. We performed four types of experiments. Using MI alone yielded a precision of 65.06%; applying the weights yielded a precision of 85.35%; and using the Sense-Tagged Compound Noun Dictionary yielded a precision of 88.82%.
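The MI-based definition matching and the three weights described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the PMI formula (log2 of the joint probability over the product of the marginals, estimated from sentence-level co-occurrence), the rule that only positive-PMI pairs count as "having an MI value", the 1/length form of the definition-length weight, the multiplicative combination of the weights, and the toy English corpus standing in for Korean text are all assumptions. The names `build_pmi`, `score_sense`, and `disambiguate` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_pmi(sentences):
    """Pointwise MI (log2) for every pair of distinct words co-occurring in a sentence."""
    n = len(sentences)
    word_counts, pair_counts = Counter(), Counter()
    for sent in sentences:
        words = sorted(set(sent))
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(words, 2))
    pmi = {}
    for pair, joint in pair_counts.items():
        x, y = tuple(pair)
        p_xy = joint / n
        p_x, p_y = word_counts[x] / n, word_counts[y] / n
        pmi[pair] = math.log2(p_xy / (p_x * p_y))
    return pmi


def score_sense(context, definition, pmi, prior):
    """Weighted MI score of one sense definition against the context words (assumed form)."""
    pairs = [(c, d) for c in context for d in definition if c != d]
    if not pairs:
        return 0.0
    # Unseen pairs get 0.0; only positively associated pairs contribute.
    values = [pmi.get(frozenset((c, d)), 0.0) for c, d in pairs]
    positive = [v for v in values if v > 0]
    mi_sum = sum(positive)
    coverage = len(positive) / len(pairs)  # share of pairs with a positive MI value
    length_norm = 1 / len(definition)      # hypothetical definition-length weight
    return mi_sum * coverage * length_norm * prior


def disambiguate(context, senses, pmi, priors):
    """Return the highest-scoring sense id and all scores."""
    scores = {s: score_sense(context, d, pmi, priors[s]) for s, d in senses.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy corpus and sense inventory for the homonym "bank".
corpus = [
    ["deposit", "money", "account"],
    ["deposit", "money", "loan"],
    ["river", "water", "fish"],
    ["river", "water", "shore"],
]
pmi = build_pmi(corpus)
senses = {"bank_1": ["money", "account"], "bank_2": ["water", "shore"]}
priors = {"bank_1": 0.5, "bank_2": 0.5}  # hypothetical sense frequency ratios
best, scores = disambiguate(["deposit", "loan"], senses, pmi, priors)
print(best, scores)  # bank_1 {'bank_1': 0.5625, 'bank_2': 0.0}
```

The context words "deposit" and "loan" appear in neither definition, so exact word matching would find no overlap for either sense; the financial sense still wins because those words co-occur with "money" and "account" in the corpus.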

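The goal of natural language processing is to enable computers to understand natural language so that diverse information can be delivered to people accurately and quickly. This requires grasping the meaning of language precisely, and word sense disambiguation is an essential technology for doing so. This study presents a technique for homonym sense disambiguation based on mutual information and a pre-analyzed compound noun sense dictionary. Existing techniques that use dictionary definitions depend on exact matching between words and therefore suffer severely from data sparseness. In this study, we alleviate this problem by using mutual information, an association measure between words. In addition, to reflect linguistic characteristics, we use a weight for the ratio of word pairs that have mutual information values, a weight for the per-sense ratio, and a weight for the length of the definition. Furthermore, based on the fact that the simple nouns composing a compound noun constrain each other's meanings, we built a sense dictionary in which senses are tagged for high-frequency compound nouns, and used it for homonym disambiguation. To evaluate the system, we constructed a homonym sense disambiguation evaluation set from about 200 queries and answer paragraphs in a question-answering evaluation set. We carried out four types of experiments on this evaluation set. The results showed a precision of 65.06% when only mutual information was used and 85.35% when the weights were applied. When the compound noun sense dictionary was used, precision reached 88.82%.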
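The compound-noun step can be sketched as a dictionary lookup that runs before the statistical scorer. In this hypothetical Python sketch, the dictionary format (a tuple of constituent nouns mapped to a tuple of sense tags), the sense-tag strings, the example compounds 사과나무 ("apple tree") and 사과문 ("letter of apology"), and the names `resolve` and `mi_fallback` are illustrative assumptions, and the compound is assumed to be already split into its constituent nouns.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical sense-tagged compound noun dictionary:
# constituent nouns of a compound -> sense tag of each constituent.
CompoundDict = Dict[Tuple[str, ...], Tuple[str, ...]]

COMPOUND_SENSES: CompoundDict = {
    ("사과", "나무"): ("사과/apple", "나무/tree"),     # 사과나무, "apple tree"
    ("사과", "문"): ("사과/apology", "문/writing"),    # 사과문, "letter of apology"
}


def sense_from_compound(constituents: Sequence[str], index: int,
                        compound_dict: CompoundDict) -> Optional[str]:
    """Sense of constituents[index] if the whole compound is listed, else None."""
    tags = compound_dict.get(tuple(constituents))
    return tags[index] if tags is not None else None


def resolve(constituents: Sequence[str], index: int, compound_dict: CompoundDict,
            fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Try the compound dictionary first; otherwise defer to a statistical scorer."""
    sense = sense_from_compound(constituents, index, compound_dict)
    if sense is not None:
        return sense, "compound-dict"
    return fallback(constituents[index]), "fallback"


def mi_fallback(word: str) -> str:
    # Hypothetical stand-in for an MI-based scorer; a real system would score each sense.
    return f"{word}/unresolved"


print(resolve(["사과", "나무"], 0, COMPOUND_SENSES, mi_fallback))  # ('사과/apple', 'compound-dict')
print(resolve(["사과", "주스"], 0, COMPOUND_SENSES, mi_fallback))  # ('사과/unresolved', 'fallback')
```

Checking the dictionary first means a homonym inside a listed compound is resolved by its partner noun, and only unlisted compounds fall through to the statistical scorer.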
