http://dx.doi.org/10.6109/jicce.2019.17.3.191

KNE: An Automatic Dictionary Expansion Method Using Use-cases for Morphological Analysis  

Nam, Chung-Hyeon (Department of Computer Engineering, Korea University of Technology and Education)
Jang, Kyung-Sik (Department of Computer Engineering, Korea University of Technology and Education)
Abstract
Morphological analysis is used for searching sentences and understanding context. Because most morphological analysis methods rely on predefined dictionaries, a target word that is not registered in the given morpheme dictionary (the so-called unregistered word problem) can be a major cause of reduced performance. The current practical solution is to add such words to the dictionary by hand, which limits the scalability and expandability of dictionaries. To overcome this limitation, we propose a novel method that automatically expands a dictionary by means of use-case analysis: it checks the validity of an unregistered word by exploring its use-cases through web crawling. The results show that the proposed method is feasible in terms of the accuracy of the validation process, the expandability of the dictionary, and the fast extraction time of morphemes after registration.
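The idea in the abstract (detect a word missing from the dictionary, look for real use-cases of it, and register it only if enough are found) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `min_use_cases` threshold, and the in-memory `documents` list that stands in for crawled web pages are all assumptions made for illustration, and simple whole-token matching replaces whatever validation the paper actually performs.

```python
import re


def find_unregistered(tokens, dictionary):
    """Return tokens missing from the dictionary, in first-seen order."""
    seen = set()
    missing = []
    for token in tokens:
        if token not in dictionary and token not in seen:
            seen.add(token)
            missing.append(token)
    return missing


def count_use_cases(word, documents):
    """Count documents that contain the word as a whole token.

    `documents` is a hypothetical stand-in for text fetched by a crawler.
    """
    count = 0
    for doc in documents:
        doc_tokens = re.findall(r"\w+", doc.lower())
        if word.lower() in doc_tokens:
            count += 1
    return count


def expand_dictionary(tokens, dictionary, documents, min_use_cases=2):
    """Register unregistered words that have enough observed use-cases.

    The threshold is an assumed parameter, not a value from the paper.
    Mutates `dictionary` and returns the list of newly added words.
    """
    added = []
    for word in find_unregistered(tokens, dictionary):
        if count_use_cases(word, documents) >= min_use_cases:
            dictionary.add(word)
            added.append(word)
    return added


if __name__ == "__main__":
    dictionary = {"the", "model", "runs", "fast"}
    tokens = ["the", "chatbot", "runs", "fast", "zxqv"]
    documents = [
        "A chatbot answers questions.",
        "Our chatbot runs on a server.",
        "The model runs fast.",
    ]
    print(expand_dictionary(tokens, dictionary, documents))
```

In this toy run, "chatbot" appears in two of the stand-in documents and is registered, while "zxqv" has no use-cases and is rejected. Whole-token matching is a simplification; for Korean text, where particles attach to stems, a real system would need a more careful matching step.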
Keywords
Dictionary expansion; Natural language processing; Unknown word detection; Use-case analysis