Hierarchic Document Clustering in OPAC

A Study on Hierarchic Clustering for Automatic Classification and Browsing in OPAC

  • 노정순 (Department of Library and Information Science, Hannam University)
  • Published : 2004.03.01

Abstract

This study develops a hierarchic clustering model for document classification and browsing in OPAC systems. Two automatic indexing techniques (with and without controlled terms), two term weighting methods (term frequency and binary weight), five similarity coefficients (Dice, Jaccard, Pearson, Cosine, and Squared Euclidean), and three hierarchic clustering algorithms (Between Average Linkage, Within Average Linkage, and Complete Linkage) were tested on a collection of 175 books and theses on library and information science. The best document clusters resulted from the Between Average Linkage or Complete Linkage method with the Jaccard or Dice coefficient, applied to automatic indexing with controlled terms represented as binary vectors. The clusters from Between Average Linkage with the Jaccard coefficient more closely resembled a decimal classification structure.
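As a rough illustration of the binary-vector similarity coefficients named in the abstract, the sketch below computes the Jaccard, Dice, and cosine coefficients for two documents represented as sets of index terms. The function names and the sample terms are hypothetical and do not come from the study; Pearson and squared Euclidean are omitted for brevity.

```python
import math

def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: shared terms divided by terms in either document."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a: set, b: set) -> float:
    """Dice coefficient: twice the shared terms divided by the summed term counts."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def cosine_binary(a: set, b: set) -> float:
    """Cosine coefficient for binary vectors: shared terms over the geometric mean of sizes."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

# Hypothetical index terms for two catalog records (illustrative only).
doc_a = {"opac", "clustering", "classification"}
doc_b = {"opac", "clustering", "indexing", "retrieval"}

print(jaccard(doc_a, doc_b))        # 2 shared terms / 5 distinct terms
print(dice(doc_a, doc_b))           # 2 * 2 / (3 + 4)
print(cosine_binary(doc_a, doc_b))  # 2 / sqrt(3 * 4)
```

On binary data, Dice and Jaccard are monotonically related (Dice = 2J / (1 + J)), so they rank document pairs in the same order even though their values differ.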

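This study was conducted to find the optimal hierarchic clustering model for classifying and browsing library holdings in a hierarchical structure in OPAC. An experimental collection was built from monographs and theses in library and information science, and the following variables were tested: indexing techniques (automatic indexing of title words, and indexing that integrates controlled terms), term weighting techniques (absolute frequency and binary frequency), similarity coefficients (Dice, Jaccard, Pearson, cosine, and squared Euclidean), and clustering techniques (between-groups average linkage, within-groups average linkage, and complete linkage). The results showed that, apart from between-groups average linkage and squared Euclidean similarity, the remaining similarity coefficients and clustering techniques produced relatively good clusters, but the best clusters were produced when the controlled-term integrated index was weighted by binary frequency and clustered with the complete linkage and between-groups average linkage methods. However, between-groups average linkage with the Jaccard similarity coefficient was closer to a decimal structure.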
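The three linkage criteria compared in the abstract can be sketched as a naive agglomerative procedure over Jaccard similarities. This is a minimal sketch, not the study's implementation: it assumes the usual definitions (between-groups average over cross-cluster pairs, within-groups average over all pairs in the merged cluster, complete linkage as the least similar cross-cluster pair), and the function names and sample records are hypothetical.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient on sets of index terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_similarity(c1, c2, docs, method):
    """Similarity between two clusters (tuples of document indices)."""
    if method == "within_average":
        # Average over every pair inside the would-be merged cluster.
        pairs = list(combinations(c1 + c2, 2))
    else:
        # Only pairs that cross the two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
    sims = [jaccard(docs[i], docs[j]) for i, j in pairs]
    if method == "complete":
        return min(sims)  # least similar pair decides
    if method in ("between_average", "within_average"):
        return sum(sims) / len(sims)
    raise ValueError(f"unknown method: {method}")

def agglomerate(docs, method="between_average"):
    """Merge the most similar pair of clusters until one remains; return the merge history."""
    clusters = [(i,) for i in range(len(docs))]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(
            combinations(clusters, 2),
            key=lambda pair: cluster_similarity(pair[0], pair[1], docs, method),
        )
        sim = cluster_similarity(c1, c2, docs, method)
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
        merges.append((c1, c2, sim))
    return merges

# Hypothetical records as sets of index terms (illustrative only).
docs = [
    {"opac", "catalog", "searching", "users"},
    {"opac", "catalog", "searching", "browsing"},
    {"clustering", "classification", "indexing"},
    {"clustering", "classification", "retrieval"},
]

for step in agglomerate(docs, "between_average"):
    print(step)
```

The merge history read from last to first gives the hierarchy that a browsing interface could present from the top down. The exhaustive pair search is adequate for a small test collection but is not tuned for large catalogs.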
