A Method for Precision Improvement Based on Core Query Clusters and Term Proximity

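  • 장계훈 (Department of Computer Engineering, Chonbuk National University)
  • 이경순 (Division of Computer Engineering / Center for Advanced Image and Information Technology, Chonbuk National University)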
  • Received : 2010.07.21
  • Accepted : 2010.10.25
  • Published : 2010.10.31

Abstract

In this paper, we propose a reranking method that improves the precision of top-ranked retrieval results using core query clusters and term proximity. The method consists of three steps. First, the top-ranked documents returned by an initial language-model retrieval are clustered according to the combination of query terms that occurs in each document. Second, core query clusters are selected from these query-term-combination clusters using the proximity between query terms. Third, the documents in the core query clusters are reranked using the context information of the query. Experiments on the TREC AP news collection show that the proposed method improves precision at the top 100 documents (P@100) by 11.2% over the language model baseline.
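The abstract describes the three steps only at a high level and does not give the exact proximity or context formulas. The Python sketch below shows one way such a pipeline could be wired together. It relies on several assumptions that are ours and not the paper's. Documents are pre-tokenized, and the initial (language-model) ranking is supplied as input. Proximity is scored as the sum of 1/minimum-distance over query-term pairs. A cluster's score is the mean proximity of its documents. "Query context" is approximated by the non-query terms in a small window around query-term occurrences. All function names and the `n_core` and `window` parameters are hypothetical. `precision_at_k` computes the standard P@k measure used in the evaluation.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Assumed input: the initial ranking as (doc_id, tokens) pairs, best first.


def cluster_by_query_terms(ranking, query):
    """Step 1: group documents by the set of query terms they contain."""
    clusters = defaultdict(list)
    for doc_id, tokens in ranking:
        combo = frozenset(t for t in query if t in tokens)
        if combo:  # documents with no query term stay unclustered
            clusters[combo].append(doc_id)
    return clusters


def min_distance(tokens, a, b):
    """Smallest token gap between any occurrence of a and any occurrence of b."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return min(abs(i - j) for i in pos_a for j in pos_b)


def proximity(tokens, combo):
    """Hypothetical proximity score: sum of 1/distance over query-term pairs."""
    return sum(1.0 / min_distance(tokens, a, b)
               for a, b in combinations(sorted(combo), 2))


def select_core_clusters(ranking, clusters, n_core=1):
    """Step 2: keep the n_core clusters with the highest mean proximity."""
    docs = dict(ranking)

    def mean_proximity(item):
        combo, ids = item
        return sum(proximity(docs[d], combo) for d in ids) / len(ids)

    ranked = sorted(clusters.items(), key=mean_proximity, reverse=True)
    return [combo for combo, _ in ranked[:n_core]]


def context_terms(tokens, query, window=2):
    """Non-query terms within `window` tokens of any query-term occurrence."""
    qs = set(query)
    found = []
    for i, t in enumerate(tokens):
        if t in qs:
            neighbours = tokens[max(0, i - window): i + window + 1]
            found.extend(w for w in neighbours if w not in qs)
    return found


def rerank(ranking, query, core_combos, clusters, window=2):
    """Step 3: put core-cluster documents first, ordered by a hypothetical
    context score; remaining documents keep their initial order."""
    docs = dict(ranking)
    initial_rank = {d: r for r, (d, _) in enumerate(ranking)}
    core_ids = [d for c in core_combos for d in clusters[c]]
    # Assumed context profile: context-term frequencies over all core documents.
    profile = Counter(w for d in core_ids
                      for w in context_terms(docs[d], query, window))

    def context_score(d):
        return sum(profile[w] for w in set(context_terms(docs[d], query, window)))

    # Higher context score first; ties fall back to the initial rank.
    core = sorted(core_ids, key=lambda d: (-context_score(d), initial_rank[d]))
    core_set = set(core_ids)
    rest = [d for d, _ in ranking if d not in core_set]
    return core + rest


def precision_at_k(ranked_ids, relevant, k):
    """P@k: fraction of the top k ranked documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant) / k


if __name__ == "__main__":
    query = ["oil", "price"]
    ranking = [
        ("d1", "oil company profits rose".split()),
        ("d4", "stock price fell".split()),
        ("d3", "price of oil rose again".split()),
        ("d2", "crude oil price rose sharply".split()),
    ]
    clusters = cluster_by_query_terms(ranking, query)
    core = select_core_clusters(ranking, clusters, n_core=1)
    reranked = rerank(ranking, query, core, clusters)
    relevant = {"d2", "d3"}
    print(reranked)  # ['d3', 'd2', 'd1', 'd4']
    print(precision_at_k([d for d, _ in ranking], relevant, 2))  # 0.0
    print(precision_at_k(reranked, relevant, 2))  # 1.0
```

In the toy example, the two documents that contain both query terms form the core cluster. They are promoted above the single-term documents, which raises P@2 from 0.0 to 1.0. Placing every core-cluster document ahead of all other documents is a simplifying choice made for illustration. The abstract does not say how the paper merges core and non-core documents.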
