Query Expansion for a Retrieval System Using Hangul Word Embedding and Apriori

  • Dong-Ha Shin (Department of Energy IT, Gachon University) ;
  • Chang-Bok Kim (Department of Energy IT, Gachon University)
  • Received : 2016.11.25
  • Accepted : 2016.12.27
  • Published : 2016.12.31

Abstract

Hangul word embedding requires a noun-extraction step. Without it, the model is trained on words that are unnecessary for learning, and efficient embedding results cannot be obtained. This paper proposes a model that retrieves answers more efficiently in a specific domain by expanding the query using Hangul word embedding, Apriori, and text mining. The word-embedding and Apriori step expands the query by extracting words associated with it according to meaning and context. The Hangul text-mining step extracts similar answers and returns them to the user, using noun extraction, TF-IDF, and cosine similarity. By learning the answers of a specific domain and expanding the query with highly correlated terms, the proposed model can improve answer accuracy. Future work should analyze the user queries stored in the database to extract more highly correlated query terms.
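
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, thresholds, and toy English data are all hypothetical. It also assumes that nouns have already been extracted from each answer, and it leaves out the word-embedding part of the expansion, so only Apriori-style association rules (restricted to word pairs) contribute expansion terms.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_pair_counts(transactions, min_support):
    """Apriori limited to 1- and 2-itemsets: prune rare items, then count pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter()
    for t in transactions:
        pair_counts.update(combinations(sorted(set(t) & frequent), 2))
    pairs = {p: c for p, c in pair_counts.items() if c / n >= min_support}
    return item_counts, pairs


def expand_query(terms, transactions, min_support=0.4, min_confidence=0.6):
    """Append words implied by rules `query term -> word` with enough confidence."""
    item_counts, pairs = frequent_pair_counts(transactions, min_support)
    expanded = list(terms)
    for (a, b), count in pairs.items():
        for src, dst in ((a, b), (b, a)):
            confidence = count / item_counts[src]  # estimate of P(dst | src)
            if src in terms and dst not in expanded and confidence >= min_confidence:
                expanded.append(dst)
    return expanded


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_answers(query_terms, docs):
    """Score every answer against the query; return (best index, all scores)."""
    vectors, idf = tfidf_vectors(docs)
    query = {t: c * idf.get(t, 0.0) for t, c in Counter(query_terms).items()}
    scores = [cosine(query, v) for v in vectors]
    return max(range(len(docs)), key=scores.__getitem__), scores


# Toy domain answers, each already reduced to its nouns (hypothetical data).
answers = [
    ["solar", "panel", "inverter", "efficiency"],
    ["solar", "panel", "installation", "roof"],
    ["wind", "turbine", "blade", "maintenance"],
    ["battery", "storage", "inverter", "capacity"],
    ["solar", "panel", "cleaning"],
]

expanded = expand_query(["solar"], answers)
best, scores = rank_answers(expanded, answers)
print(expanded, best)
```

With this toy data, "panel" co-occurs with "solar" in every answer that mentions "solar", so the query is expanded to both words, and the shortest answer containing both receives the highest cosine score. In a fuller pipeline, nearest neighbors from a trained word-embedding model would presumably be merged into the expanded term list in the same way.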
