Word Sense Disambiguation Using Multiple Feature Decision Lists

다중 자질 결정 목록을 이용한 단어 의미 중의성 해결

  • Published : 2003.08.01

Abstract

This paper proposes a method of disambiguating word senses using decision lists, which consist of rules with confidence values. Each rule in a decision list is composed of a Boolean function (the precondition) and a class (the sense). A decision list classifies an instance using the matching rule with the highest confidence value. Previous work disambiguated senses using single feature decision lists, in which each Boolean function is composed of only one feature. However, that approach is sensitive to data sparseness and preprocessing errors. Hence, we propose multiple feature decision lists, in which a Boolean function consists of more than one feature, to identify the senses of words. Experiments are performed on one sense-tagged corpus in Korean and five sense-tagged corpora in English. The experimental results show that multiple feature decision lists disambiguate senses more effectively than single feature decision lists.

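This paper proposes a method for resolving word sense ambiguity using decision lists. A decision list consists of one or more rules, each assigned a confidence value; a rule consists of a Boolean function (the precondition) and a class (the sense). Among the rules whose Boolean function the instance satisfies, the one with the highest confidence determines the class of the instance. Previous methods resolved word sense ambiguity with single feature decision lists, in which each Boolean function is built from one feature. Such lists are sensitive to data sparseness and to errors in preprocessing. To address these problems, this paper proposes multiple feature decision lists, in which a Boolean function is built from one or more features, and describes how to use them for word sense disambiguation. To compare the two kinds of decision list, word sense disambiguation experiments were run on one Korean and five English sense-tagged corpora. On all six corpora, multiple feature decision lists gave better results than single feature decision lists.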
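
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how features are combined or how confidence values are estimated, so the sketch assumes a precondition is a conjunction of features (all must be present in the context) and uses hand-picked confidence values. The names `Rule`, `classify`, and `RULES`, and the example features and senses, are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Rule:
    """One decision-list rule: precondition, sense, confidence."""
    features: FrozenSet[str]  # assumed conjunction: every feature must be present
    sense: str
    confidence: float         # hand-picked here; estimation is not covered

    def matches(self, context: FrozenSet[str]) -> bool:
        # The Boolean function is true when all rule features occur in the context.
        return self.features <= context


def classify(rules: List[Rule],
             context: Iterable[str],
             default: Optional[str] = None) -> Optional[str]:
    """Return the sense of the highest-confidence rule that matches the context."""
    ctx = frozenset(context)
    matching = [rule for rule in rules if rule.matches(ctx)]
    if not matching:
        return default  # no rule fires; fall back to a caller-supplied sense
    return max(matching, key=lambda rule: rule.confidence).sense


# Hypothetical rules for the ambiguous word "bank".
RULES = [
    Rule(frozenset({"river"}), "bank/shore", 0.90),             # single feature
    Rule(frozenset({"money"}), "bank/finance", 0.85),           # single feature
    Rule(frozenset({"river", "money"}), "bank/finance", 0.95),  # multiple features
]

print(classify(RULES, {"river", "walk"}))
print(classify(RULES, {"river", "money"}))
print(classify(RULES, {"tree"}, default="unknown"))
```

In this toy rule set, the two-feature rule overrides the single feature rule for `river` when both features co-occur, which is the kind of distinction a single feature decision list cannot express.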
