Word Sense Disambiguation Using Embedded Word Space

  • Kang, Myung Yun (Department of Business Data Convergence, Chungbuk National University)
  • Kim, Bogyum (Department of Computer Science, Chungbuk National University)
  • Lee, Jae Sung (Department of Computer Science, Chungbuk National University)
  • Received: 2017.01.06
  • Reviewed: 2017.03.15
  • Published: 2017.03.30

Abstract

Determining the correct sense of an ambiguous word is essential for semantic analysis. One model for word sense disambiguation is the word space model, which is structurally simple yet effective. However, when the context word vectors of the word space model are merged into sense vectors in a sense inventory, the resulting vectors typically become very large and still suffer from lexical sparsity. In this paper, we propose a word sense disambiguation method based on word embeddings: because embeddings are additively compositional, the sense inventory vectors become compact and efficient. Experiments on a Korean sense-tagged corpus show that the method is very effective.
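The core idea of merging context vectors into one compact vector per sense can be illustrated with a minimal sketch. It assumes that each sense vector is the sum of the embeddings of words that co-occur with that sense in a sense-tagged corpus, and that a new occurrence receives the sense whose vector has the highest cosine similarity to its summed context. The abstract does not state the paper's exact composition or similarity measure, so both are assumptions. The embeddings, words, and sense labels (`bae/ship`, `bae/pear`, for the Korean homonym 배) are hypothetical toy values, not data from the paper.

```python
import math

DIM = 3

# Hypothetical toy embeddings standing in for vectors learned by a tool such as
# Word2Vec; the words and values are made up for illustration only.
EMB = {
    "sea":   [0.9, 0.1, 0.0],
    "port":  [0.8, 0.2, 0.1],
    "sail":  [0.7, 0.0, 0.2],
    "fruit": [0.0, 0.9, 0.1],
    "eat":   [0.1, 0.8, 0.0],
    "sweet": [0.0, 0.7, 0.3],
}


def add(u, v):
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]


def compose(words):
    """Additively compose a context: sum the embeddings of known words."""
    total = [0.0] * DIM
    for w in words:
        if w in EMB:  # words without an embedding are skipped
            total = add(total, EMB[w])
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def build_sense_inventory(tagged_contexts):
    """Merge every context of a sense into one DIM-sized vector by summation."""
    inventory = {}
    for sense, context in tagged_contexts:
        inventory[sense] = add(inventory.get(sense, [0.0] * DIM), compose(context))
    return inventory


def disambiguate(context, inventory):
    """Return the sense whose vector is most similar to the composed context."""
    ctx = compose(context)
    return max(inventory, key=lambda sense: cosine(ctx, inventory[sense]))


# Hypothetical sense-tagged contexts for the Korean homonym "배" (ship / pear).
TRAIN = [
    ("bae/ship", ["sea", "port"]),
    ("bae/ship", ["sail", "sea"]),
    ("bae/pear", ["fruit", "sweet"]),
    ("bae/pear", ["eat", "fruit"]),
]

inventory = build_sense_inventory(TRAIN)
print(disambiguate(["port", "sail"], inventory))  # bae/ship
print(disambiguate(["eat", "sweet"], inventory))  # bae/pear
```

However many contexts are merged, each sense vector keeps the fixed embedding dimension (here 3 values). This is the compactness the abstract attributes to additive compositionality, in contrast to a count-based word space vector whose length grows with the vocabulary.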

Cited by

  1. "Effect of Word Sense Disambiguation on Neural Machine Translation: A Case Study in Korean," IEEE Access, vol. 6, 2018. https://doi.org/10.1109/ACCESS.2018.2851281