• Title/Summary/Keyword: 문헌클러스터링 (document clustering)

Search Result 54, Processing Time 0.033 seconds

User Query Expansion Through Keyword Similarity Ranking Algorithm Using Clustering Methods (클러스터링 기법을 이용한 키워드 유사도 순위화 알고리즘에 따른 사용자 질의 확장)

  • 이상훈;김기태
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04c
    • /
    • pp.479-481
    • /
    • 2003
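  • This paper proposes a technique that expands a user's query by ranking keyword similarity using several clustering methods. The clustering methods used are association clustering, metric clustering, and scalar clustering, and the retrieval system is built by appropriately adjusting the weights among them. Given a user query, the keywords associated with the query keywords are ranked and shown to the user, and the query is expanded with the user's additional input. This process is repeated until the user judges the query terms adequate and runs the search with the expanded query. The document collection used in the experiments consisted of economy-related articles from the Korea Herald published in January and February 2003, and the query expansion experiments produced satisfactory results.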
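As a rough illustration of the association clustering step, the sketch below ranks expansion candidates for a query keyword by a normalized co-occurrence score. The abstract does not give the paper's formulas or weights, so the scoring used here (the common textbook form, s = c_uv / (c_uu + c_vv - c_uv)) and the names `association_scores` and `expand_candidates` are assumptions for illustration only.

```python
from collections import Counter


def association_scores(docs):
    """Normalized association score between every pair of distinct terms."""
    # Term frequencies per document (whitespace tokenization is an assumption).
    freqs = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*freqs))
    # Unnormalized correlation: c[u][v] = sum over documents of f(u, d) * f(v, d).
    c = {u: {v: sum(f[u] * f[v] for f in freqs) for v in vocab} for u in vocab}
    # Normalized score: c_uv / (c_uu + c_vv - c_uv).
    return {
        u: {v: c[u][v] / (c[u][u] + c[v][v] - c[u][v]) for v in vocab if v != u}
        for u in vocab
    }


def expand_candidates(term, scores, k=3):
    """Top-k terms associated with `term`, highest score first (ties by name)."""
    ranked = sorted(scores[term].items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, s in ranked[:k] if s > 0]
```

In an interactive loop like the one the abstract describes, the list returned by `expand_candidates` would be shown to the user, who picks terms to add before the next round.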


A Clustering Technique Using Association Rules for The Library and Information Science Terminology (연관규칙을 이용한 문헌정보학 전문용어 클러스터링 기법에 관한 연구)

  • Seung, Hyon-Woo;Park, Mi-Young
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.37 no.2
    • /
    • pp.89-105
    • /
    • 2003
  • In this paper, an effective method for clustering terminology extracted from text is proposed, in order to develop a search engine that extracts relevant information from large collections of web documents. To avoid producing many meaningless association rules among general terms, only useful association rules are produced, using database tables that consist of domain-specific terminology. These association rules are produced by applying the Apriori algorithm after forming transaction units from groups of association rules in a document. The group of association rules produced from a term forms a cluster.
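The idea of filtering to domain terms, mining association rules, and grouping the rules by their source term can be sketched as follows. This is a simplified, hypothetical version: it assumes each transaction is the set of terms from one document, stops at term pairs (a full Apriori run would continue to larger itemsets), and uses made-up thresholds `min_support` and `min_conf`.

```python
from collections import defaultdict
from itertools import combinations


def term_clusters(transactions, domain_terms, min_support=2, min_conf=0.6):
    """Group association rules x -> y by antecedent x; each group is a cluster."""
    # Drop general vocabulary so that rules only relate domain-specific terms.
    filtered = [set(t) & domain_terms for t in transactions]

    # Pass 1: count single terms and keep the frequent ones.
    count1 = defaultdict(int)
    for t in filtered:
        for term in t:
            count1[term] += 1
    frequent = {term for term, n in count1.items() if n >= min_support}

    # Pass 2: count pairs built only from frequent terms (Apriori pruning).
    count2 = defaultdict(int)
    for t in filtered:
        for a, b in combinations(sorted(t & frequent), 2):
            count2[(a, b)] += 1

    # Emit rules x -> y whose confidence support(x, y) / support(x) is high enough.
    clusters = defaultdict(set)
    for (a, b), n in count2.items():
        if n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            if n / count1[x] >= min_conf:
                clusters[x].add(y)
    return dict(clusters)
```

Passing `domain_terms` as a set is what keeps rules among general words such as "the" from ever being generated.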

Comparative Evaluation of Term Weighting Methods in Automatic Document Classification (문헌 자동분류에서 용어가중치 기법에 대한 연구)

  • 이재윤;최보영;정영미
    • Proceedings of the Korean Society for Information Management Conference
    • /
    • 2000.08a
    • /
    • pp.41-44
    • /
    • 2000
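  • Various term weighting formulas have been proposed to improve the performance of information retrieval systems. Term weights can affect performance not only in retrieval, where queries are compared with documents, but also in automatic classification, where documents are compared with one another. This paper surveys various term weighting formulas and examines their effect on automatic document classification performance through document clustering and categorization experiments.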
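To make the document-to-document comparison concrete, here is a minimal sketch of one widely used weighting formula, tf-idf, combined with cosine similarity. The abstract does not list which formulas the paper evaluated, so tf-idf is only an assumed example, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse tf-idf vectors: weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: f * math.log(n / df[t]) for t, f in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0
```

Swapping the weight expression inside `tfidf_vectors` is enough to try a different formula while leaving the similarity computation unchanged.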


The Ecology of the Scientific Literature and Information Retrieval (I)

  • Jeong, Jun-Min
    • Journal of the Korean Society for Information Management
    • /
    • v.2 no.2
    • /
    • pp.3-37
    • /
    • 1985
  • This research addresses problems encountered in designing systems for more efficient and effective information retrieval amid the proliferation of literature. It was designed to develop and test 1) the partitioning of a large bibliographic database into quality-oriented subsets (quality filtering), and 2) a system for effective and efficient information retrieval within subsets of the database (relevance). To accomplish this partitioning, the 'kernel' technique of graph theory was applied. In addition, a method of quality filtering utilizing the 'epidemic' theory and the 'obsolescence' of scientific literature was developed.


The Ecology of the Scientific Literature and Information Retrieval (II)

  • Jeong, Jun-Min
    • Journal of the Korean Society for Information Management
    • /
    • v.3 no.1
    • /
    • pp.3-16
    • /
    • 1986
  • This research addresses problems encountered in designing systems for more efficient and effective information retrieval amid the proliferation of literature. It was designed to develop and test 1) the partitioning of a large bibliographic database into quality-oriented subsets (quality filtering), and 2) a system for effective and efficient information retrieval within subsets of the database (relevance). To accomplish this partitioning, the 'kernel' technique of graph theory was applied. In addition, a method of quality filtering utilizing the 'epidemic' theory and the 'obsolescence' of scientific literature was developed.


A Theoretical Study on Indexing Methods using the Metadata for the Automatic Construction of a Thesaurus Browser (시소러스 브라우저 자동구현을 위한 Metadata를 이용한 색인어 처리방안에 대한 연구)

  • Seo, Whee
    • Journal of Korean Library and Information Science Society
    • /
    • v.35 no.4
    • /
    • pp.451-467
    • /
    • 2004
  • This paper presents a theoretical analysis of automatic indexing, which is vital to constructing a thesaurus browser, of clustering algorithms for building hierarchical relations among terms, and of methods for automatically constructing a thesaurus browser. Methods for automatically selecting index terms from web documents are studied by surveying methods for analyzing and processing metadata, which plays in web documents the bibliographic role found in traditional paper documents. The study also suggests adding metadata to web documents with an automatic metadata editor, because most web documents do not include metadata.


A Study on the Clustering Technique Associated with Statistical Term Relatedness in Information Retrieval (정보검색(情報檢索)에 있어서 용어(用語)의 통계적(統計的) 관련성(關聯性)을 응용(應用)한 클러스터링기법(技法))

  • Jeong, Jun-Min
    • Journal of Information Management
    • /
    • v.18 no.4
    • /
    • pp.98-117
    • /
    • 1985
  • At present, the role and importance of information retrieval have greatly increased for two main reasons: the coverage of searchable collections is now extensive, and collection size may exceed several million documents; furthermore, search results can now be obtained more or less instantaneously using online procedures and computer terminal devices that provide interaction and communication between the system and its users. The large collection size makes it plausible to users that relevant information will in fact be retrieved as a result of a search operation, and the probability of obtaining the search output without delay creates a substantial user demand for retrieval services.


A Study on Keyword Extraction From a Single Document Using Term Clustering (용어 클러스터링을 이용한 단일문서 키워드 추출에 관한 연구)

  • Han, Seung-Hee
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.44 no.3
    • /
    • pp.155-173
    • /
    • 2010
  • In this study, a new keyword extraction algorithm based on term clustering is applied to a single document. A single document is divided into multiple passages, and two ways of calculating the similarity between two terms are investigated: first-order similarity and second-order distributional similarity. In the first experiment, the best clustering performance is achieved with 50-term passages and second-order distributional similarity. Based on the results of the first experiment, second-order distributional similarity was also applied to various keyword extraction methods that use statistical information about terms. In the second experiment, pf (paragraph frequency) and $tf \times ipf$ (term frequency by inverse paragraph frequency) were found to improve the overall performance of keyword extraction. This shows that the algorithm satisfies the conditions that good keywords should meet.
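The $tf \times ipf$ weighting can be sketched for a single document split into passages. The abstract does not define ipf, so the sketch assumes ipf = log(N / pf) by analogy with inverse document frequency, where N is the number of passages; the function name `tf_ipf` is hypothetical.

```python
import math
from collections import Counter


def tf_ipf(passages):
    """Score each term of one document by tf * log(N / pf).

    `passages` is a list of token lists taken from a single document.
    """
    n = len(passages)
    # tf: total occurrences of the term across the whole document.
    tf = Counter(term for p in passages for term in p)
    # pf: number of passages in which the term appears at least once.
    pf = Counter(term for p in passages for term in set(p))
    return {t: tf[t] * math.log(n / pf[t]) for t in tf}
```

Under this assumed definition, a term that occurs in every passage gets a score of zero, so terms concentrated in a few passages rank higher.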

A Comparative Study of Feature Selection Methods for Korean Web Documents Clustering (한글 웹 문서 클러스터링 성능향상을 위한 자질선정 기법 비교 연구)

  • Kim, Young-Gi
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.39 no.1
    • /
    • pp.45-58
    • /
    • 2005
  • This paper is a comparative study of feature selection methods for Korean web document clustering. First, we focused on how the term features and the co-links of web documents affect clustering performance. We clustered web documents by native term feature, by co-link, and by both, and compared the results with the originally allocated categories. We then selected term features for each category from training documents using $\chi^2$, Information Gain (IG), and Mutual Information (MI), and applied these features to other experimental documents. In addition, we suggested a new method named Max Feature Selection, which selects the terms that have the maximum count for a category in each experimental document, applied $\chi^2$ (or MI or IG) values to each term instead of the term frequency of documents, and clustered the documents. In the results, $\chi^2$ performs better than IG or MI, but the difference appears to be slight. When we applied the Max Feature Selection method, however, clustering performance improved notably. Max Feature Selection is a simple but effective means of feature space reduction and shows strong performance for Korean web document clustering.
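For reference, the $\chi^2$ score of a term for a category can be computed from a 2x2 contingency table over the training documents. The sketch below uses the standard formula $\chi^2 = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))$; the data layout and function names are assumptions, and it does not attempt to reproduce the paper's Max Feature Selection method.

```python
def chi_square(docs, labels, term, category):
    """Chi-square statistic between term presence and category membership.

    `docs` is a list of token sets and `labels` the category of each document.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1  # term present, in category
        elif has_term:
            b += 1  # term present, other category
        elif in_cat:
            c += 1  # term absent, in category
        else:
            d += 1  # term absent, other category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def top_terms(docs, labels, category, k=2):
    """The k terms with the highest chi-square score (ties broken by name)."""
    vocab = sorted({t for tokens in docs for t in tokens})
    scored = [(chi_square(docs, labels, t, category), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]
```

Note that $\chi^2$ measures dependence in either direction, so a term that is strongly absent from a category scores as high as one that is strongly present in it.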

Examining the Intellectual Structure of Records Management & Archival Science in Korea with Text Mining (텍스트 마이닝을 이용한 국내 기록관리학 분야 지적구조 분석)

  • Lee, Jae-Yun;Moon, Ju-Young;Kim, Hee-Jung
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.41 no.1
    • /
    • pp.345-372
    • /
    • 2007
  • In this study, the intellectual structure of Records Management & Archival Science in Korea was analyzed using document clustering, a widely used method of text mining, and document similarity network analysis. The data used in this study were 145 articles on the subject of Records Management & Archival Science, selected from five major representative journals in the field of Library & Information Science in Korea and published from 2001 to 2006. The results of cluster analysis show that the core subject areas are "electronic records management and digital preservation," "records management policy and institution," "records description and catalogues," and "records management domain and education." The results of document analysis, which is more detailed than cluster analysis, show that "digital archiving," a specialized subject in digital preservation, plays a central role. The results of serial analysis, which proceeds according to a timeline, show the emergence of "archival services" as a new subject area.