http://dx.doi.org/10.7236/JIIBC.2018.18.5.25

Korean Language Clustering using Word2Vec  

Heu, Jee-Uk (Dept. of Computer Engineering, Hanyang University)
Publication Information
The Journal of the Institute of Internet, Broadcasting and Communication / v.18, no.5, 2018, pp. 25-30
Abstract
With the recent development of Internet technology, research areas such as information retrieval and data extraction have become increasingly important for providing information efficiently and quickly. In particular, techniques for analyzing and finding words that are semantically similar to a given Korean word, such as a compound word or a newly coined word, are needed because the meaning of such words is not easy to grasp. One way to address this problem is word clustering, which groups the words that are similar to a given word. In this paper, we propose a Korean language clustering technique that embeds the words of the given documents using Word2Vec and then clusters the similar words.
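The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or similarity measure the paper uses, so the greedy cosine-similarity grouping below is an assumption, as are the function names, the threshold value, and the hand-made three-dimensional toy vectors. In practice the vectors would come from a Word2Vec model trained on the given documents.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=2):
    """Rank all other words by cosine similarity to `word`."""
    scores = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


def cluster(vectors, threshold=0.8):
    """Greedy grouping (an assumed stand-in for the paper's method):
    a word joins the first cluster whose seed word is similar enough,
    otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of words; element 0 is the seed
    for word, vec in vectors.items():
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(word)
                break
        else:
            clusters.append([word])
    return clusters


# Hypothetical embeddings, hand-made for illustration only.
toy_vectors = {
    "학교": [0.9, 0.1, 0.0],       # school
    "학생": [0.8, 0.2, 0.1],       # student
    "선생님": [0.85, 0.15, 0.05],  # teacher
    "사과": [0.1, 0.9, 0.2],       # apple
    "바나나": [0.0, 0.8, 0.3],     # banana
}

groups = cluster(toy_vectors)
for members in groups:
    print(members)
```

With these toy vectors the education-related words and the fruit words fall into separate groups. The threshold controls how tight each group is: raising it yields more, smaller clusters.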
Keywords
Word2Vec; Word Embedding; Korean; Clustering
Citations & Related Records
Times Cited By KSCI: 4