Relevance Feedback Method of an Extended Boolean Model using Hierarchical Clustering Techniques

계층적 클러스터링 기법을 이용한 확장 불리언 모델의 적합성 피드백 방법

  • 최종필 (Information and Communication Research Institute, Ajou University)
  • 김민구 (Department of Computer Engineering, Ajou University)
  • Published : 2004.10.01

Abstract

The relevance feedback process uses information obtained from a user about an initially retrieved set of documents to improve subsequent search formulations and retrieval performance. In the extended Boolean model, relevance feedback implies not only that new query terms must be identified, but also that the terms must be connected properly with the Boolean AND/OR operators. Salton et al. proposed a relevance feedback method for the extended Boolean model, called the DNF (disjunctive normal form) method. However, this method has a critical problem in generating reformulated queries. In this study, we investigate the problem of the DNF method and propose a relevance feedback method that uses hierarchical clustering techniques to solve it. We report the results of experiments performed on two data sets: the DOE collection in TREC 1 and the Web TREC 10 collection.

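The relevance feedback method uses information obtained from the user to improve subsequent search queries and retrieval performance. In general, relevance feedback uses this information to find new terms to add to the query or to adjust the weights of terms already in the query. In the extended Boolean retrieval model, however, relevance feedback must also connect the query terms appropriately with the Boolean operators (AND/OR). Salton and his colleagues proposed a relevance feedback method for the extended Boolean model called the DNF (disjunctive normal form) method. However, this method has a serious problem when it reformulates queries. This paper examines the problem of the DNF method and proposes a relevance feedback method using hierarchical clustering techniques to overcome it. Experiments on two data sets, the DOE collection of TREC 1 and the Web TREC 10 collection, show the advantage of the proposed method.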
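
To illustrate why the choice of AND/OR operators matters in the extended Boolean model, the following is a minimal sketch of the p-norm similarity functions from Salton, Fox, and Wu's extended Boolean retrieval paper (reference 5), assuming unweighted query terms. The function names `pnorm_or` and `pnorm_and`, the default `p = 2`, and the example document weights are illustrative assumptions, not taken from this paper, and the sketch does not implement the DNF method or the proposed clustering method.

```python
def pnorm_or(weights, p=2.0):
    """p-norm score of a document for the query (t1 OR t2 OR ...).

    `weights` holds the document's term weights in [0, 1] for the
    query terms; query terms are assumed unweighted (an assumption).
    """
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def pnorm_and(weights, p=2.0):
    """p-norm score of a document for the query (t1 AND t2 AND ...)."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


# Hypothetical document that contains the first query term but not the second.
doc = [1.0, 0.0]
or_score = pnorm_or(doc)    # rewards the one matching term
and_score = pnorm_and(doc)  # penalizes the missing term
```

With the same two terms, the OR query scores this document higher than the AND query, so a feedback method that picks good terms but connects them with the wrong operator can still rank documents poorly. At `p = 1` the two functions coincide and the operator distinction disappears.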

References

  1. Ide, E. New experiments in relevance feedback. In Salton, G., ed., The Smart System - Experiments in Automatic Document Processing, pp. 337-354. Englewood Cliffs, NJ: Prentice-Hall Inc, 1971
  2. Rocchio, J. J. Jr. Relevance feedback in information retrieval. In Salton, G., ed., The Smart System - Experiments in Automatic Document Processing, pp. 313-323. Englewood Cliffs, NJ: Prentice-Hall Inc, 1971
  3. Salton, G. and Buckley, C. Improving retrieval performance by relevance feedback. J. of the American Society for Information Science, 41(4): pp. 288-297, 1990 https://doi.org/10.1002/(SICI)1097-4571(199006)41:4<288::AID-ASI8>3.0.CO;2-H
  4. Bookstein, A. Fuzzy requests: An approach to weighted Boolean searches, J. ASIS, Vol 31, No. 4, July, 1980, pp. 275-279
  5. Salton, G., Fox, E. A., and Wu, H. Extended Boolean information retrieval, Vol. 36, No. 11, December 1983, Communications of the ACM, pp. 1022-1036 https://doi.org/10.1145/182.358466
  6. Waller, W. G. and Kraft, D. H. A mathematical model for a weighted Boolean retrieval system. Information Processing and Management, Vol 15, No.5, 1979, pp. 235-245 https://doi.org/10.1016/0306-4573(79)90030-X
  7. Wong, S.K.M., Ziarko, W., Raghavan, V. V., and Wong, P. C. N. Extended Boolean query processing in the generalized vector space model, Information Systems Vol. 14, No.1, pp. 47-63, 1989 https://doi.org/10.1016/0306-4379(89)90024-0
  8. Joon Ho Lee. Properties of Extended Boolean Models in Information Retrieval. In Proceedings of ACM-SIGIR Conference, 1994, pp. 182-190
  9. Salton, G., Fox, E. A., and Voorhees, E. Advanced feedback methods in information retrieval. J. of the American Society for Information Science, 36(3): pp. 200-210, 1985 https://doi.org/10.1002/asi.4630360311
  10. Alsaffar, A. H., Deogun, J. S., Raghavan, V. V., and Sever, H. Concept-based retrieval with minimal term sets. In Z. W. Ras and A. Skowron, editors, Foundations of Intelligent Systems: Eleventh Int'l Symposium, ISMIS'99 proceedings, pp. 114-122. Springer, Warsaw, Poland, Jun, 1999
  11. Raghavan, V. V. and Wong, S. A critical analysis of the vector space model for information retrieval. Journal of the American Society for Information Science 37(5): pp. 279-287, 1986 https://doi.org/10.1002/(SICI)1097-4571(198609)37:5<279::AID-ASI1>3.0.CO;2-Q
  12. Salton, G. and McGill, M. J. Introduction to Modern Information Retrieval. McGraw Hill, New York, 1983
  13. J. T. Rickman, Design Considerations for a Boolean Search System with Automatic Relevance Feedback Processing, Proc. National Meeting, Assoc. for Computing Machinery, New York, August 1971, pp. 478-481 https://doi.org/10.1145/800193.569959
  14. M. Dillon and J. Desper, Automatic relevance feedback in Boolean retrieval system, J. Documentation 1980. 36, 197-208 https://doi.org/10.1108/eb026696
  15. M. Dillon, J. Ulmschneider, and J. Desper, A prevalence formula for automatic relevance feedback in Boolean retrieval system, Infor. Proc. Management 1983, 19(1), 27-36
  16. A.K. Jain and R.C. Dubes. Algorithms for Clustering Data, Prentice Hall, Upper Saddle River, NJ, 1988
  17. Efthimis N. Efthimiadis. Query Expansion. Annual Review of Information Science and Technology, v31, pp. 121-187, 1996
  18. Robertson, Stephen E., Sparck Jones, Karen. Relevance Weighting of Search Terms. Journal of the American Society for Information Science, 27(3), pp. 129-146, 1976 https://doi.org/10.1002/asi.4630270302
  19. Robertson, Stephen E. On Relevance Weight Estimation and Query Expansion. Journal of, 42(3), pp. 182-188, 1986
  20. Porter M.F. and Galpin V. Relevance Feedback in a Public Access Catalogue for a Research Library: Muscat at the Scott Polar Research Institute. Program, 22(1), pp. 1-20, 1988 https://doi.org/10.1108/eb046983