The Effectiveness of Hierarchic Clustering on Query Results in OPAC

A Study on Clustering Search Results in OPAC

  • 노정순 (Department of Library and Information Science, Hannam University)
  • Published : 2004.03.01

Abstract

This study evaluated whether the static hierarchic clustering model can be applied to clustering query results in an OPAC. Two clustering methods (Between Average Linkage (BAL) and Complete Linkage (CL)) and two similarity coefficients (Dice and Jaccard) were tested on the results retrieved by 16 title-based keyword searches. The precision of the optimal clusters improved by more than 100% compared with title-word searching. In optimal-cluster effectiveness there was no difference between the similarity coefficients, but there were differences between the clustering methods. CL was better in precision and BAL was better in recall at both the optimal top-level and bottom-level clusters. However, the differences were not significant, except for the higher recall of BAL at the top-level cluster. BAL produced a small number of clusters and a long chain of hierarchy before the optimal cluster was reached, which does not appear desirable or efficient.
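
For reference, the two similarity coefficients named above have the following standard set-based forms when each record is treated as a set of index terms. The symbols A and B (the term sets of two records) are notation introduced here, not the paper's. Because Dice is a monotone function of Jaccard, both coefficients rank pairs of records in the same order, which may be one reason they performed alike.

```latex
\[
\mathrm{Dice}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad
\mathrm{Jaccard}(A,B) = \frac{|A \cap B|}{|A \cup B|}, \qquad
\mathrm{Dice}(A,B) = \frac{2\,\mathrm{Jaccard}(A,B)}{1 + \mathrm{Jaccard}(A,B)}
\]
```

For example, two records with three terms each that share two terms give Dice = 4/6 ≈ 0.67 and Jaccard = 2/4 = 0.5.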

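This study was conducted to determine whether the static hierarchic clustering model, which is suited to classifying and browsing documents in a Korean-language OPAC, is also effective for clustering the results retrieved by title-word searches. Index terms that combined the words appearing in titles with the controlled terms assigned by indexers were given binary weights, and the Dice and Jaccard coefficients and the between-groups average linkage and complete linkage clustering methods were tested. When the documents retrieved by 16 title-word searches were clustered, the precision of the optimally selected cluster improved by more than 100% over title-word searching, regardless of similarity coefficient or clustering method. At both the first-level and final-level clustering, complete linkage was more effective in precision and between-groups average linkage was more effective in recall, but the differences were not statistically significant. The one exception was the higher recall of between-groups average linkage at the first-level cluster, which was significant. There was no difference between Dice and Jaccard. Between-groups average linkage required too long a chain of hierarchic clustering levels before the final cluster was selected, so it did not appear desirable in terms of search efficiency.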
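
The clustering setup described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the sample records, the function names (`dice`, `jaccard`, `linkage_similarity`, `agglomerate`), and the reading of BAL as between-groups average linkage (the mean similarity over all cross-cluster pairs) are assumptions made for illustration. Each record is a set of index terms, which corresponds to binary term weights.

```python
from itertools import combinations


def dice(a, b):
    """Dice coefficient between two sets of index terms (binary weights)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard coefficient between two sets of index terms."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def linkage_similarity(c1, c2, docs, sim, method):
    """Similarity between two clusters, each a tuple of record ids."""
    pair_sims = [sim(docs[i], docs[j]) for i in c1 for j in c2]
    if method == "complete":
        # Complete linkage: the least similar cross-cluster pair decides.
        return min(pair_sims)
    if method == "average":
        # Between-groups average linkage: mean over all cross-cluster pairs.
        return sum(pair_sims) / len(pair_sims)
    raise ValueError(f"unknown method: {method}")


def agglomerate(docs, sim=dice, method="complete"):
    """Merge the two most similar clusters until one remains.

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples.
    """
    clusters = [(i,) for i in range(len(docs))]
    merges = []
    while len(clusters) > 1:
        a, b = max(
            combinations(clusters, 2),
            key=lambda p: linkage_similarity(p[0], p[1], docs, sim, method),
        )
        score = linkage_similarity(a, b, docs, sim, method)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, score))
    return merges


# Hypothetical records: title words plus controlled terms, as term sets.
docs = [
    {"library", "automation", "system"},
    {"library", "automation", "planning"},
    {"information", "retrieval", "system"},
    {"information", "retrieval", "evaluation"},
]

complete_merges = agglomerate(docs, sim=dice, method="complete")
average_merges = agglomerate(docs, sim=dice, method="average")
```

On this toy data both methods build the same hierarchy and differ only in the similarity assigned to the final merge, since complete linkage takes the worst cross-cluster pair and average linkage takes the mean. Passing `sim=jaccard` swaps the coefficient without changing the rest of the procedure.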
