Mathematical Properties of the Formulas Evaluating Boolean Operators in Information Retrieval

정보검색에서 부울연산자를 연산하는 식의 수학적 특성

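  • 이준호 (Information Systems Development Office, Research and Development Information Center) ;
  • 이기호 (Information Distribution Office, Research and Development Information Center) ;
  • 조영화 (Information Systems Department, Research and Development Information Center)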
  • Published : 1995.06.01

Abstract

Boolean retrieval systems have been the most widely used systems in information retrieval because they are easy to implement and retrieve efficiently. Conventional Boolean retrieval systems, however, cannot rank retrieved documents in decreasing order of query-document similarity because they cannot compute similarity coefficients between queries and documents. Extended Boolean models such as fuzzy set, Waller-Kraft, Paice, P-Norm, and Infinite-One have been developed to provide document ranking. In extended Boolean models, the formulas that evaluate the Boolean operators AND and OR are an important component affecting the quality of document ranking. In this paper we present mathematical properties of these formulas and analyse their effect on retrieval effectiveness. Our analyses show that P-Norm is the most suitable for achieving high retrieval effectiveness.

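Boolean retrieval systems are the most widely used systems in information retrieval today because they are easy to implement and offer fast retrieval times. A pure Boolean retrieval system, however, cannot compute document values, so it cannot order retrieved documents by the degree to which they satisfy the query. Extended Boolean models such as fuzzy set, Waller-Kraft, Paice, P-Norm, and Infinite-One have been developed to give Boolean retrieval systems a ranking capability. In these models, the formulas for the Boolean operators AND and OR are an important factor in ranking performance. This paper presents the mathematical properties of these formulas and analyzes their effect on retrieval effectiveness. The analysis shows that the P-Norm model is the most suitable for achieving high retrieval effectiveness.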
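
To make the role of the AND and OR formulas concrete, the following is a minimal sketch in Python. The abstract does not state the formulas themselves, so the sketch assumes the commonly cited unweighted forms: min/max for the fuzzy set model, and the P-Norm definitions usually attributed to Salton, Fox, and Wu. The function names and the two example documents are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def fuzzy_and(weights: Sequence[float]) -> float:
    """Fuzzy set AND: the smallest term weight decides the score."""
    return min(weights)


def fuzzy_or(weights: Sequence[float]) -> float:
    """Fuzzy set OR: the largest term weight decides the score."""
    return max(weights)


def pnorm_or(weights: Sequence[float], p: float = 2.0) -> float:
    """Unweighted P-Norm OR (assumed form): normalized p-norm distance
    from the all-zero point, for term weights in [0, 1] and p >= 1."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def pnorm_and(weights: Sequence[float], p: float = 2.0) -> float:
    """Unweighted P-Norm AND (assumed form): one minus the normalized
    p-norm distance from the all-one point."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


# Hypothetical term weights of two documents for the query "t1 AND t2".
doc_a = [0.2, 0.9]
doc_b = [0.2, 0.3]

if __name__ == "__main__":
    # Fuzzy AND looks only at the minimum weight, so both documents tie.
    print("fuzzy AND :", fuzzy_and(doc_a), fuzzy_and(doc_b))
    # P-Norm AND uses every operand, so doc_a ranks above doc_b.
    print("P-Norm AND:", pnorm_and(doc_a), pnorm_and(doc_b))
```

In this sketch the fuzzy set AND cannot separate the two documents because it depends on a single operand, while the P-Norm AND takes every term weight into account. With `p = 1` both P-Norm formulas reduce to the arithmetic mean, and as `p` grows they approach the min/max behaviour of the fuzzy set model.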
