A Wikipedia-based Query Expansion Method for In-depth Blog Distillation

A Wikipedia-Based Query Expansion Method for Retrieving Blog Feeds That Cover a Topic in Depth (translated from the Korean title)

  • Received : 2010.08.10
  • Accepted : 2010.10.07
  • Published : 2010.11.15

Abstract

This paper proposes a Wikipedia-based feedback method for in-depth blog distillation, the task of finding blogs that offer in-depth thought or analysis on a given query. The proposed method expands the query using Wikipedia articles that are relevant to it. Experiments used the TREC Blogs08 collection, a large-scale blog corpus, together with an English Wikipedia dump. The proposed method significantly improved retrieval performance, including MAP, over the conventional post-based feedback method.

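This paper proposes a Wikipedia-based query expansion method for retrieving blogs that cover the topic given as a query in depth. The proposed method uses Wikipedia documents related to the query for query expansion. The experiments used the TREC Blogs08 collection, a large-scale blog test collection, and English Wikipedia data. In the experiments, the proposed method substantially improved retrieval performance, including MAP, over the existing blog post-based query expansion method.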
