http://dx.doi.org/10.12673/jant.2016.20.6.617

Query Extension of Retrieve System Using Hangul Word Embedding and Apriori  

Shin, Dong-Ha (Department of Energy IT, Gachon University)
Kim, Chang-Bok (Department of Energy IT, Gachon University)
Abstract
Hangul word embedding requires a noun extraction step. Without it, unnecessary words are trained and the embedding results are inefficient. In this paper, we propose a model that retrieves answers more efficiently by expanding the query using Hangul word embedding, Apriori, and text mining. The word embedding and Apriori step expands the query by extracting association words according to the meaning and context of the query. The Hangul text mining step extracts a similar answer and returns it to the user using noun extraction, TF-IDF, and cosine similarity. The proposed model can improve answer accuracy by learning the answers of a specific domain and expanding the query with highly correlated terms. Future research should extract more highly correlated query terms by analyzing the user queries stored in the database.
Keywords
Word embedding; Word2vec; Apriori; Cosine similarity; TF-IDF
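
The pipeline summarized in the abstract (expand the query with association words, then rank stored answers by TF-IDF and cosine similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy corpus `ANSWERS`, the function names, and the support and confidence thresholds are all hypothetical, the Apriori pass is simplified to itemsets of size two, and the word embedding step and Hangul noun extraction are assumed to have happened already (each answer is given as a list of nouns).

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each stored answer is already reduced to nouns.
ANSWERS = [
    ["solar", "panel", "energy", "power"],
    ["wind", "turbine", "energy", "power"],
    ["solar", "panel", "install", "roof"],
    ["battery", "storage", "energy"],
]


def association_words(docs, term, min_support=0.5, min_confidence=0.6):
    """Simplified Apriori pass (itemsets of size 1 and 2 only).

    Returns words w such that {term, w} is frequent and the rule
    term -> w meets the confidence threshold. Thresholds are assumptions.
    """
    n = len(docs)
    baskets = [set(d) for d in docs]
    item_count = Counter(w for b in baskets for w in b)
    # Apriori property: only frequent single items can form frequent pairs.
    frequent = {w for w, c in item_count.items() if c / n >= min_support}
    if term not in frequent:
        return []
    pair_count = Counter(
        pair for b in baskets for pair in combinations(sorted(b & frequent), 2)
    )
    related = []
    for (a, b), c in pair_count.items():
        if term not in (a, b):
            continue
        other = b if a == term else a
        if c / n >= min_support and c / item_count[term] >= min_confidence:
            related.append(other)
    return sorted(related)


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) using a smoothed idf = ln(N / df) + 1."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    vectors = [{w: tf * idf[w] for w, tf in Counter(d).items()} for d in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_terms, docs):
    """Expand the query with association words, then rank answers."""
    expanded = list(query_terms)
    for t in query_terms:
        expanded += [w for w in association_words(docs, t) if w not in expanded]
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus carry no weight.
    q = {w: tf * idf[w] for w, tf in Counter(expanded).items() if w in idf}
    scores = [cosine(q, v) for v in vectors]
    best = max(range(len(docs)), key=scores.__getitem__)
    return expanded, best, scores


if __name__ == "__main__":
    expanded, best, scores = retrieve(["solar"], ANSWERS)
    print("expanded query:", expanded)
    print("best answer index:", best)
    print("scores:", [round(s, 3) for s in scores])
```

In this toy setting the single-term query is widened before ranking, so answers that share the associated noun but use it more prominently can still be matched. In a fuller system, nearest neighbours from a trained Word2vec model could be merged into the same expanded term list.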