http://dx.doi.org/10.3745/KIPSTB.2012.19B.3.189

Query Expansion Based on Word Graphs Using Pseudo Non-Relevant Documents and Term Proximity  

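Jo, Seung-Hyeon (Division of Computer Engineering, Chonbuk National University)
Lee, Kyung-Soon (Division of Computer Engineering, Research Center for Advanced Image and Information Technology, Chonbuk National University)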
Abstract
In this paper, we propose a query expansion method based on word graphs that uses both pseudo-relevant and pseudo non-relevant documents to improve retrieval performance. Each initially retrieved document is assigned to a core cluster if it contains the core query terms, which are extracted using query term combinations and the degree of query term proximity; otherwise, it is assigned to a non-core cluster. Documents in the core cluster are treated as pseudo-relevant documents, and documents in the non-core cluster are treated as pseudo non-relevant documents. Each cluster is represented as a graph in which each node represents a term and each edge represents the proximity between that term and a query term. The weight of a term is calculated by subtracting its weight in the non-core cluster graph from its weight in the core cluster graph, so a term with a high weight in the non-core cluster graph is unlikely to be chosen as an expansion term. Expansion terms are selected according to these term weights. Experimental results on the TREC WT10g test collection show that the proposed method achieves a 9.4% improvement in mean average precision over the language model.
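The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the core-cluster test (`is_core`, all core terms within `max_span` tokens), the edge weight (1/distance inside a `window`), the per-document averaging, and all function and parameter names are assumptions made for illustration only.

```python
from collections import defaultdict


def is_core(tokens, core_terms, max_span=10):
    """Assumed core test: every core term occurs within max_span tokens."""
    need = set(core_terms)
    positions = [(i, t) for i, t in enumerate(tokens) if t in need]
    for start in range(len(positions)):
        seen = set()
        for i, t in positions[start:]:
            if i - positions[start][0] > max_span:
                break
            seen.add(t)
            if seen == need:
                return True
    return False


def graph_weights(docs, query_terms, window=5):
    """Assumed weighting: sum of 1/distance edges from a term to nearby
    query terms, averaged over the documents in the cluster."""
    weights = defaultdict(float)
    for tokens in docs:
        q_pos = [i for i, t in enumerate(tokens) if t in query_terms]
        for i, t in enumerate(tokens):
            if t in query_terms:
                continue  # only non-query terms are expansion candidates
            for q in q_pos:
                d = abs(i - q)
                if d <= window:
                    weights[t] += 1.0 / d
    n = max(len(docs), 1)
    return {t: w / n for t, w in weights.items()}


def expansion_terms(docs, query_terms, core_terms, k=5):
    """Rank candidates by core-graph weight minus non-core-graph weight."""
    core = [d for d in docs if is_core(d, core_terms)]
    non_core = [d for d in docs if not is_core(d, core_terms)]
    w_core = graph_weights(core, query_terms)
    w_non = graph_weights(non_core, query_terms)
    scores = {t: w - w_non.get(t, 0.0) for t, w in w_core.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

In this sketch, a term that appears close to query terms in the non-core cluster is penalized by the subtraction, so it ranks below terms that co-occur with the query only in the core cluster.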
Keywords
Query Expansion; Term Proximity; Pseudo Relevant Documents; Pseudo Non-relevant Documents; Word Graph