Query Expansion based on Word Graph using Term Proximity

Word Graph-Based Query Expansion Reflecting Proximity to Query Terms

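  • 장계훈 (Department of Computer Engineering, Chonbuk National University)
  • 이경순 (Division of Computer Engineering / Center for Advanced Image and Information Technology, Chonbuk National University)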
  • Received : 2011.05.18
  • Accepted : 2011.07.07
  • Published : 2012.02.29

Abstract

Pseudo-relevance feedback assumes that words occurring frequently in the top-ranked documents are related to the initial query. However, the main drawback of the term frequency method is that it relies on feature independence and disregards any dependencies that may exist between words in the text. In this paper, we propose query expansion based on a word graph that uses term proximity, which supplements the term frequency method. On the TREC WT10g test collection, experimental results in MAP (Mean Average Precision) show that the proposed method achieves a 6.4% improvement over the language model.

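The pseudo-relevance feedback model assumes that the documents ranked at the top of the initial retrieval results are relevant, and selects high-frequency terms from those documents as expansion terms. The drawback of frequency-based query expansion is that it treats each term independently, regardless of the proximity between terms within a document. In this paper, we propose word-graph-based query expansion that reflects term proximity as an alternative to frequency-based query expansion. Terms that occur near query terms are represented as nodes, and the proximity between terms is used as the edge weight of the word graph. Expansion terms are then selected through iterative computation over the graph, which improves retrieval performance. To validate the method, experiments on the TREC WT10g test collection, a collection of web documents, show a 6.4% improvement in MAP over the language model.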
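The graph construction and iterative selection described above can be illustrated with a minimal Python sketch. The abstract does not give the exact proximity function or update rule, so everything specific here is an assumption made for illustration: the window size, the 1/distance edge weight, the weighted PageRank-style iteration (in the spirit of TextRank), and the hypothetical function names `build_word_graph`, `rank_terms`, and `expand_query`. It is not the authors' implementation.

```python
from collections import defaultdict


def build_word_graph(docs, query_terms, window=5):
    """Build an undirected weighted graph over words near query terms.

    Assumption: each co-occurrence within `window` tokens adds
    1/distance to the edge weight (the abstract does not specify this).
    """
    query = set(query_terms)
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        # Positions lying within `window` tokens of any query term.
        q_pos = [i for i, t in enumerate(tokens) if t in query]
        near = sorted({j for i in q_pos
                       for j in range(max(0, i - window),
                                      min(len(tokens), i + window + 1))})
        for a, i in enumerate(near):
            for j in near[a + 1:]:
                dist = j - i
                if dist > window:
                    break  # `near` is sorted, so later positions are farther
                if tokens[i] != tokens[j]:
                    graph[tokens[i]][tokens[j]] += 1.0 / dist
                    graph[tokens[j]][tokens[i]] += 1.0 / dist
    return graph


def rank_terms(graph, damping=0.85, iterations=50):
    """Score nodes with a weighted PageRank-style iteration (assumed rule)."""
    nodes = list(graph)
    if not nodes:
        return {}
    score = {n: 1.0 / len(nodes) for n in nodes}
    total_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbor passes on its score in proportion to edge weight.
            incoming = sum(score[m] * w / total_weight[m]
                           for m, w in graph[n].items())
            updated[n] = (1 - damping) / len(nodes) + damping * incoming
        score = updated
    return score


def expand_query(docs, query_terms, k=3, window=5):
    """Return the k highest-scoring non-query terms as expansion terms."""
    scores = rank_terms(build_word_graph(docs, query_terms, window))
    query = set(query_terms)
    candidates = [(t, s) for t, s in scores.items() if t not in query]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [t for t, _ in candidates[:k]]


# Toy stand-ins for the top-ranked documents of an initial retrieval.
top_docs = [
    "graph based query expansion uses term proximity".split(),
    "query expansion with proximity improves retrieval".split(),
]
print(expand_query(top_docs, ["query"], k=3, window=2))
```

In this sketch a term that is frequent but never appears close to a query term does not enter the graph at all, which is the contrast with purely frequency-based selection that the abstract draws.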
