Mining Search Keywords for Improving the Accuracy of Entity Search

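  • 이선구 (Daumsoft Mining Lab);
  • 온병원 (Department of Statistics and Computer Science, Kunsan National University);
  • 정수목 (Division of Computer Science, Sahmyook University)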
  • Received : 2016.02.01
  • Accepted : 2016.05.11
  • Published : 2016.09.30

Abstract

Entity search, exemplified by services such as Google Product Search and Yahoo Pipes, has recently attracted attention. Entity search engines retrieve web pages relevant to a particular entity. However, if the entity name (e.g., the movie Chinatown) is ambiguous and matches several meanings (e.g., Chinatown movies, Chinatown restaurants, and Incheon Chinatown), the accuracy of the search results decreases significantly. To address this problem, we propose a novel method that quantifies the importance of candidate search queries and then suggests the best query for the entity search. The method is based on the Frequent Pattern (FP)-Tree and considers the correlation between entity relevance and the frequency of web pages. In our experiments, the proposed method achieved an average precision of 59%, about five times that of traditional query terms (less than 10% average precision).
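The abstract does not spell out how the FP-Tree is built or how queries are scored, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each web page has already been reduced to a list of terms, builds a standard FP-tree over those term lists, and ranks root-to-node paths containing the entity term by their node count. The names `FPNode`, `build_fp_tree`, `candidate_queries`, the `min_support` threshold, and the sample pages are all hypothetical. Path counts here are a simplification; a full FP-growth pass would be needed to obtain exact support for every term set.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: a term, its count, and child nodes."""

    def __init__(self, term, parent=None):
        self.term = term
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(pages, min_support=2):
    """Build an FP-tree from pages, each given as a list of terms."""
    # Support = number of pages in which a term appears at least once.
    support = Counter(term for page in pages for term in set(page))
    frequent = {t for t, c in support.items() if c >= min_support}
    root = FPNode(None)
    for page in pages:
        # Keep frequent terms only, ordered by descending support
        # (ties broken alphabetically so the tree is deterministic).
        ordered = sorted((t for t in set(page) if t in frequent),
                         key=lambda t: (-support[t], t))
        node = root
        for term in ordered:
            node = node.children.setdefault(term, FPNode(term, node))
            node.count += 1
    return root


def candidate_queries(root, entity_term):
    """Rank multi-term root-to-node paths that contain entity_term."""
    results = []

    def walk(node, path):
        for child in node.children.values():
            new_path = path + [child.term]
            if entity_term in new_path and len(new_path) > 1:
                results.append((tuple(new_path), child.count))
            walk(child, new_path)

    walk(root, [])
    # Higher count first, then shorter queries, then alphabetical order.
    return sorted(results, key=lambda r: (-r[1], len(r[0]), r[0]))


# Hypothetical term lists standing in for retrieved web pages.
pages = [
    ["chinatown", "movie", "polanski"],
    ["chinatown", "movie", "nicholson"],
    ["chinatown", "restaurant"],
    ["chinatown", "incheon", "restaurant"],
    ["chinatown", "movie", "polanski", "nicholson"],
]
tree = build_fp_tree(pages, min_support=2)
ranked = candidate_queries(tree, "chinatown")
for terms, count in ranked:
    print(" ".join(terms), count)
```

In this toy data the path "chinatown movie" has the highest count, which illustrates how adding a frequently co-occurring term could narrow an ambiguous entity name to one meaning. How the paper actually combines such frequencies with entity relevance is not stated in the abstract.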
