Query Expansion based on Word Sense Community

  • Received : 2014.07.21
  • Accepted : 2014.09.29
  • Published : 2014.12.15

Abstract

Query expansion assists users during a search by suggesting keywords related to the input query. Several recent studies suggest keywords by clustering the retrieved documents to find the domains they cover. However, clustering is ill-suited to representing a varying number of domains, because the number of clusters must be fixed in advance. This paper proposes a method that uses a community detection algorithm to find a varying number of domains for each query and suggests keywords drawn from them. The proposed method extracts words from the top 30 retrieved documents, builds a word graph, and detects communities in that graph. A representative keyword is then derived from each community, and these keywords are used for query expansion. To evaluate the proposed method, we compared it against two baselines: the Google search engine, and keyword recommendation using TF-IDF over the retrieved documents. The evaluation indicates that the proposed method recommends more diverse keywords than either baseline.
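
The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: whitespace tokenization, document-level co-occurrence counts as edge weights, dropping the query terms from the graph, a naive greedy modularity merge as the community detection step, and picking the word with the highest weighted degree as each community's representative are all assumptions made here, and every function name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_word_graph(docs, exclude=()):
    """Edge weight = number of documents in which two words co-occur (assumption)."""
    weights = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()) - set(exclude))
        for u, v in combinations(words, 2):  # u < v because words is sorted
            weights[(u, v)] += 1
    return weights


def weighted_degrees(weights):
    """Sum of incident edge weights for every word in the graph."""
    degree = defaultdict(float)
    for (u, v), w in weights.items():
        degree[u] += w
        degree[v] += w
    return degree


def greedy_modularity_communities(weights):
    """Repeatedly merge the two communities with the largest modularity gain."""
    m = sum(weights.values())  # total edge weight
    comm_deg = dict(weighted_degrees(weights))
    members = {node: {node} for node in comm_deg}  # community id -> words
    between = dict(weights)  # weight between pairs of communities
    while True:
        best_gain, best_pair = 0.0, None
        for (a, b), w in between.items():
            # Modularity change if communities a and b were merged.
            gain = w / m - comm_deg[a] * comm_deg[b] / (2 * m * m)
            if gain > best_gain + 1e-12:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge improves modularity: stop
            return list(members.values())
        a, b = best_pair
        members[a] |= members.pop(b)
        comm_deg[a] += comm_deg.pop(b)
        merged = defaultdict(float)
        for (x, y), w in between.items():
            x, y = (a if x == b else x), (a if y == b else y)
            if x != y:  # edges inside the merged community are dropped
                merged[tuple(sorted((x, y)))] += w
        between = merged


def suggest_keywords(query, docs, top_k=30):
    """One representative keyword per community found in the top_k documents."""
    weights = build_word_graph(docs[:top_k], exclude=query.lower().split())
    degree = weighted_degrees(weights)
    communities = greedy_modularity_communities(weights)
    # Highest weighted degree wins; ties are broken alphabetically.
    return [min(c, key=lambda w: (-degree[w], w)) for c in communities]


# Toy stand-in for retrieved documents (made-up data, for illustration only).
example_docs = [
    "jaguar car engine speed",
    "jaguar car engine dealer",
    "jaguar cat jungle prey",
    "jaguar cat jungle habitat",
    "jaguar speed prey",
]
print(suggest_keywords("jaguar", example_docs))
```

Note that the number of communities is not passed in anywhere: merging stops as soon as no merge increases modularity, so different queries can yield different numbers of suggested keywords, which is the property the abstract contrasts with fixed-size clustering.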

Acknowledgement

Grant : Development of Planning and Learning Cognitive Model Framework Technology Based on Mobile Platforms

Supported by : Ministry of Knowledge Economy (지식경제부)
