서지 데이터베이스에서의 레코드 필드 선택이 검색 성능에 미치는 영향에 관한 연구

A Study of the Influence of Choice of Record Fields on Retrieval Performance in the Bibliographic Database

  • Published: 2001.12.01

Abstract

This study examined the effect of record-field selection on retrieval performance when searching a large bibliographic database. The experiment comprised (1) the large commercial database INSPEC, (2) sets of related records (defined as target sets), (3) four different types of queries, each a set of four keywords (CT_TF, CT_IDF, UT_TF, UT_IDF), (4) an algorithm for identifying optimal queries, (5) a Boolean search-statement generator producing every possible search statement, and (6) an operational web-based retrieval system. The record-field choices adopted as the independent variable were (1) Abstract, (2) Descriptors, (3) Identifiers, (4) 'Subject' (Descriptors plus Identifiers), (5) Title, and (6) 'All fields'. Retrieval performance was evaluated with Heine's D measure, which reflects both recall and precision. The main findings were that (1) field choice has a significant effect on retrieval performance, (2) the performance rankings were sensitive to the query type, and (3) choosing the Title field gave the best results on the D measure.
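The Boolean search-statement generator described above can be sketched as follows. The study's actual generator is not specified, so this is only an illustration under a simplifying assumption: terms stay in a fixed order and every AND/OR assignment between adjacent terms is enumerated, giving 2^3 = 8 statements for a four-keyword query set.

```python
# Illustrative sketch (not the study's implementation): enumerate every
# left-to-right AND/OR combination over a fixed-order list of query terms.
from itertools import product

def boolean_statements(terms):
    """Return all search statements formed by placing AND or OR
    between consecutive terms (2**(n-1) statements for n terms)."""
    statements = []
    for ops in product(("AND", "OR"), repeat=len(terms) - 1):
        parts = [terms[0]]
        for op, term in zip(ops, terms[1:]):
            parts += [op, term]
        statements.append(" ".join(parts))
    return statements

# Example with four hypothetical keywords:
stmts = boolean_statements(["neural", "network", "retrieval", "evaluation"])
```

A full generator would also vary term order and parenthesization; the exhaustive-enumeration idea is the same.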

This empirical study investigated the effect of the choice of record field(s) upon which to search on retrieval performance for a large operational bibliographic database. The query terms used in the study were identified algorithmically from each target set in four different ways: (1) controlled terms derived from index term frequency weights, (2) uncontrolled terms derived from index term frequency weights, (3) controlled terms derived from inverse document frequency weights, and (4) uncontrolled terms derived from inverse document frequency weights. Six possible choices of record field were recognised. Using INSPEC terminology, these were the fields: (1) Abstract, (2) 'Anywhere' (i.e., all fields), (3) Descriptors, (4) Identifiers, (5) 'Subject' (i.e., 'Descriptors' plus 'Identifiers'), and (6) Title. The study was undertaken in an operational web-based IR environment using the INSPEC bibliographic database. Retrieval performance was evaluated using the D measure (bivariate in Recall and Precision). The main findings were that: (1) there exist significant differences in search performance arising from the choice of field, using 'mean performance measure' as the criterion statistic; (2) the rankings of field choices for each of these performance measures are sensitive to the choice of query; and (3) the optimal choice of field for the D measure is Title.
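The D measure used above combines recall and precision into a single distance-like score. As a minimal sketch, assuming the commonly cited form attributed to Heine (1973), D = 1 − 1/(1/P + 1/R − 1), where D = 0 is perfect retrieval and D = 1 is worst:

```python
# Hedged sketch of the D measure, assuming the form
# D = 1 - 1/(1/P + 1/R - 1) attributed to Heine (1973);
# the paper's exact formulation may differ.

def precision(retrieved, relevant):
    """Fraction of retrieved records that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of relevant records that are retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def heine_d(retrieved, relevant):
    """Distance-based effectiveness: 0 = perfect, 1 = worst."""
    p, r = precision(retrieved, relevant), recall(retrieved, relevant)
    if p == 0.0 or r == 0.0:
        return 1.0
    return 1.0 - 1.0 / (1.0 / p + 1.0 / r - 1.0)
```

Because lower D is better, the finding that Title is the optimal field choice means Title searches produced the smallest mean D over the target sets.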

References

  1. Report no.:INSPEC/4 INSPEC:Comparative Evaluation of Index Languages:Part I:Design Aitchison,T.M.;Tracy,J.M.
  2. Report no.:INSPEC/5 INSPEC:Comparative Evaluation of Index Languages:Part II:Results Aitchison,T.M.(et al.)
  3. Journal of the American Society for Information Science v.47 no.1 Evaluating Interactive Systems in TREC Beaulieu,M.;Robertson,S.E.;Rasmussen,E.M.
  4. Information Processing & Management v.26 no.2 Full-Text Information Retrieval:Further Analysis and Clarification Blair,D.;Maron,M.E.
  5. Journal of Documentation v.56 no.1 Experimental Components for the Evaluation of Interactive Information Retrieval Systems Borlund,P.;Ingwersen,P.
  6. Journal of Documentation v.53 no.3 The Development of a Method for the Evaluation of Interactive Information Retrieval Systems Borlund,P.;Ingwersen,P.
  7. Journal of the American Society for Information Science v.40 no.4 Entry Point Depth and Online Search Using a Controlled Vocabulary Boyce,B.R.;McLain,J.P.
  8. Journal of Documentation v.24 no.1 The Measure of Information Retrieval Effectiveness Proposed by Swets Brookes,B.C.
  9. Journal of Internet Cataloging v.2 no.3-4 Search Engines for the World Wide Web:An Evaluation of Recent Developments Clarke,S.J.
  10. The Methodology of Evaluation of Operational Information Retrieval Systems based on a Test of MEDLARS Cleverdon,C.W.
  11. Journal of Information Science v.8 no.1 Theory and Explanation in Information Retrieval Research Ellis,D.
  12. Journal of the American Society for Information Science v.47 no.1 The Dilemma of Measurement in Information Retrieval Research Ellis,D.
  13. Information Retrieval & Library Automation v.33 no.5 Online World: the Bumpy Ride of the Web Engine Hattery,M.
  14. paper presented in SIGIR 2000, Athens, July 2000 Describing Query Expansion using Logic-induced Vectors of Performance Measures Heine,M.H.
  15. Proceedings of the MIRA `99:Final MIRA Conference on Information Retrieval Evaluation, Glasgow, 14-16 April 1999 Reassessing and Extending the Precision and Recall Concepts. Revised version of Time to dump `P and R`? Heine,M.H.
  16. Workshop on Logical and Uncertainty Models for Information Systems of the Fifth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty Measuring the Effects of AND, OR and NOT operators in Document Retrieval Systems using Directed Line Segments Heine,M.H.
  17. Information Technology:Research and Development v.3 no.2 Information Retrieval from Classical Databases from a Signal-Detection Standpoint:A Review Heine,M.H.
  18. Journal of Informatics v.2 no.1 The Signal-Detection Model of Information Retrieval Heine,M.H.
  19. Information Storage and Retrieval v.9 no.3 Distance Between Sets as an Objective Measure of Retrieval Effectiveness Heine,M.H.
  20. Journal of Documentation v.29 no.1 The Inverse Relationship of Precision and Recall in terms of the Swets Model Heine,M.H.
  21. Online v.22 no.3 How to Do Field Searching in Web Search Engines:A Field Trip Hock,R.E.
  22. Classification: A Classification Scheme for the INSPEC Database The Institution of Electrical Engineers
  23. Thesaurus:1999. Surrey The Institution of Electrical Engineers
  24. INSPEC Database on WebSPIRS:User Notes The Institution of Electrical Engineers
  25. INSPECMATTERS:the Newsletter of the IEE Publishing and Information Services Division The Institution of Electrical Engineers
  26. Unpublished MPhil thesis, Department of Information Studies, University of Sheffield User Differences in Interactive Web-based OPAC Evaluation Kim,H.
  27. ETRI Journal v.21 no.4 Correlations between User`s Characteristics and Preferred Features of Web-based OPAC Evaluation Kim,H.(et al.)
  28. Vocabulary Control for Information Retrieval ($2^{nd}$ Edition) Lancaster,F.W.
  29. Indexing and Abstracting in Theory and Practice ($2^{nd}$ Edition) Lancaster,F.W.
  30. Journal of Information Science v.6 no.3 Methods for Evaluating the Number of Relevant Documents in a Collection Martin,W.A.
  31. Journal of the American Society for Information Science v.49 no.10 Natural Language versus Controlled Vocabulary in Information Retrieval:A Case Study in Soil Mechanics Muddamalle,M.R.
  32. Online v.21 no.4 Internet Search Techniques and Strategies Notess,G.R.
  33. Database v.19 no.3 Searching the Web with Alta Vista Notess,G.R.
  34. Journal of Documentation v.56 no.1 Information Retrieval, Experimental Models and Statistical Analysis Pors,N.O.
  35. Journal of the American Society for Information Science v.27 no.3 Relevance Weighting of Search Terms Robertson,S.E.;Sparck Jones,K.
  36. Journal of Documentation v.53 no.1 Research and Evaluation in Information Retrieval Robertson,S.E.;Beaulieu,M.
  37. Information Processing & Management v.36 no.1 Experimentation as a Way of Life:OKAPI at TREC Robertson,S.E.;Walker,S.;Beaulieu,M.
  38. Information Processing & Management v.31 no.3 Large Test Collection Experiments on an Operational, Interactive System:OKAPI at TREC Robertson,S.E.;Walker,S.;Hancock-Beaulieu,M.
  39. Journal of the American Society for Information Science v.23 no.1 The Generality Effect and the Retrieval Evaluation for Large Collections Salton,G.
  40. Information Processing & Management v.28 no.4 The State of Retrieval System Evaluation Salton,G.
  41. Journal of Documentation v.29 no.4 On the Specification of Term Values in Automatic Indexing Salton,G.;Yang,C.S.
  42. Information Processing & Management v.24 no.5 Term-Weighting Approaches in Automatic Text Retrieval Salton,G.;Buckley,C.
  43. SIGIR `95: Proceedings of the Association for Computing Machinery Special Interest Group on Information Retrieval (ACM/SIGIR) $18^{th}$ Annual International Conference on Research and Development in Information Retrieval v.18 Evaluation of Evaluation in Information Retrieval Saracevic,T.;E.A.Fox(ed.);P.Ingwersen(ed.);R.Fidel(ed.)
  44. Journal of the American Society for Information Science v.37 no.5 On the Foundation of Evaluation Shaw,Jr.,W.M.
  45. Information Processing & Management v.30 no.5 Retrieval Expectations, Cluster-based Effectiveness, and Performance Standards in the CF Database Shaw,Jr.,W.M.
  46. Organizing Information:Principles of Data Base and Retrieval Systems Soergel,D.
  47. Information Retrieval Experiment Sparck Jones,K.(ed.)
  48. Proceedings of the $12^{th}$ National Online Meeting v.12 Evaluation of Interactive Information Retrieval:Implication for Operational Systems and Practice Su,L.T.;M.E.Williams(ed.)
  49. Science v.141 Information Retrieval Systems Swets,J.A.
  50. American Documentation v.20 no.1 Effectiveness of Information Retrieval Methods Swets,J.A.
  51. Information Retrieval Experiment The Pragmatics of Information Retrieval Experimentation Tague,J.M.
  52. Information Processing & Management v.28 no.4 The Pragmatics of Information Retrieval Experimentation, Revisited Tague-Sutcliffe,J.M.
  53. Journal of the American Society for Information Science v.47 no.1 Some Perspective on the Evaluation of Information Retrieval Systems Tague-Sutcliffe,J.M.
  54. Information Retrieval ($2^{nd}$ ed.) van Rijsbergen,C.J.
  55. Computers in Libraries v.19 no.5 Darwin on the Web:the Evolution of Search Tools Vidmar,D.J.
  56. Journal of Documentation v.54 no.4 Title Keywords and Subject Descriptors:A Comparison of Subject Search Entries of Books in the Humanities and Social Sciences Voorbij,H.J.
  57. Business Information Review v.15 no.4 Search Engines and News Services:Developments on the Internet Webber,S.