• Title/Summary/Keyword: 문헌 군집화 (document clustering)


Medical Document Clustering using the Growing Hierarchical SOM (신경망 GHSOM을 이용한 의료 문헌 정보의 군집화)

  • Heo, Jin-Seok;Kim, In-Cheol
    • Proceedings of the Korea Information Processing Society Conference / 2002.04a / pp.519-522 / 2002
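  • In large-scale Internet-based medical literature retrieval systems such as PubMed, a search with a broad or terse keyword often returns a long list of documents covering many different subtopics. To find the documents that match the subtopic they actually wanted, users must then read the title or abstract of every document in the list and check its content. This is tedious and costs considerable time and effort. As one way to reduce this effort, this paper introduces the design and implementation of MedCluster, a clustering system that automatically divides the documents returned by a PubMed keyword search into several groups according to the similarity and dissimilarity of their content. The main feature of MedCluster is that, unlike existing document clustering methods, it clusters documents with the GHSOM neural network. GHSOM has two advantages: the number of document groups need not be specified in advance, and it produces a hierarchical clustering that yields document groups at various levels. This paper briefly reviews the structure and characteristics of the GHSOM neural network and describes the design and implementation of MedCluster, the medical document clustering system that adopts it.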


Visualization Method of Document Retrieval Result based on Centers of Clusters (군집 중심 기반 문헌 검색 결과의 시각화)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Lee, Yill-Byung
    • The Journal of the Korea Contents Association / v.7 no.5 / pp.16-26 / 2007
  • Because existing document retrieval systems have difficulty visualizing search results, they show only document titles and short summaries of the passages that contain the search keywords. If the result list is long, it is difficult to examine all the documents at once and to find relations among them. This study uses clustering to group similar documents so that the relations among the retrieved documents are easy to grasp. It also proposes a two-level visualization algorithm: first, the cluster centers are projected into a low-dimensional space by multi-dimensional scaling so that searchers can grasp the relations among clusters at a glance; second, individual documents are drawn in the low-dimensional space around the cluster centers using the orbital model, so that similarities among individual documents are easy to confirm. The method was tested on benchmark data and real data, and the results show that search results can be visualized in real time.

A Study on the Main Classes of DDC (DDC 주류구분법에 관한 연구)

  • Nam, Tae-Woo
    • Journal of the Korean Society for Library and Information Science / v.43 no.1 / pp.27-56 / 2009
  • The purpose of this study is to analyze the main classes of the DDC. The DDC is a general classification system that aims to classify documents of all kinds in any knowledge domain. At best, the order of the main classes represents a mix of Baconian and Hegelian philosophy adulterated by the practical exigencies of organizing a collection of books. Each of the main classes has been subdivided further into what are technically known as divisions. This division of knowledge into the nine main classes mirrors the educational consensus of the late nineteenth-century Western academic world. The DDC thus scatters subjects by discipline, and the subjects are subordinated to discipline. The DDC has been criticised for its rigid division by ten at every step. Division by the decimal classification has been likened to the Procrustean bed.

A Keyword analysis on the 'user' related research papers : In Library and Information Science (이용자 관련 연구논문에 대한 주제어 분석)

  • Park, Seonmi;Oh, Kyung-mook
    • Proceedings of the Korean Society for Information Management Conference / 2013.08a / pp.43-46 / 2013
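  • This study analyzed the links among the keywords assigned to 125 user-related research papers in the field of library and information science in Korea. The links among the 226 keywords that remained after preprocessing were analyzed and visualized using network analysis. The graph was used to examine the link strength between keywords, and the 20 keywords with the highest connectivity to other keywords were presented. Clustering keywords that are close to one another produced 14 clusters: 8 isolated clusters with no links to other keywords and 6 connected clusters.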

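
The keyword network analysis summarized in the abstract above can be illustrated with a small sketch. It is not the authors' procedure: the toy keyword lists and the function names `build_network`, `degree`, and `components` are hypothetical, and connected components are used here only as a simple stand-in for the proximity-based clustering the abstract mentions.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: each inner list holds the keywords assigned to one paper.
papers = [
    ["user study", "public library", "information needs"],
    ["user study", "information needs", "satisfaction"],
    ["public library", "satisfaction"],
    ["archives", "records management"],
    ["digital preservation"],
]

def build_network(keyword_lists):
    """Count how often each unordered keyword pair appears in the same paper."""
    edges = Counter()
    for keywords in keyword_lists:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges

def degree(edges):
    """Number of distinct neighbours of each keyword that has at least one link."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {k: len(v) for k, v in neighbours.items()}

def components(nodes, edges):
    """Group keywords into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Largest groups first; ties broken alphabetically for a stable order.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))

edges = build_network(papers)                      # link strength per keyword pair
nodes = {k for kws in papers for k in kws}
top = sorted(degree(edges).items(), key=lambda kv: (-kv[1], kv[0]))
clusters = components(nodes, edges)
```

In this toy data, `edges` gives the link strength of each keyword pair, `top` ranks keywords by connectivity, and a keyword that never co-occurs with another one ends up in a cluster of its own.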

Extraction of higher yeast protein-protein interaction with hierarchical clustering from textual data (계층적 군집화를 통한 이스트(Yeast) 단백질의 고차 상호작용 추출)

  • 엄재홍;장병탁
    • Proceedings of the Korean Information Science Society Conference / 2002.10d / pp.364-366 / 2002
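  • This paper presents a method for extracting the key relations among proteins that exhibit a particular phenomenon: binary relations between the major proteins of an organism are extracted from textual literature data about that organism and then hierarchically clustered by their features. The binary relations between proteins are extracted from the text in the form of association rules using basic data mining techniques. For the experiments, we used MEDLINE abstracts retrieved from PUBMED, that is, papers containing relations among the major proteins of yeast, together with several public databases. The experiments extracted known single relations between proteins, such as SH3, and clustering these relations confirmed that relations among key proteins acting on a common phenomenon are grouped together. In addition, by using clustering to examine the relations among simple rules at a higher level, rather than looking at simple binary relations alone, we were able to consider relations of first order or higher that do not appear in the literature data used to extract the binary relations. The paper describes the overall rule extraction process, the components of the extraction system, and the data used.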

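
The association-rule step mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' extraction system: the protein names `P1` to `P4`, the thresholds, and the function `mine_pair_rules` are hypothetical, and the later hierarchical clustering of the rules is not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: the set of protein names mentioned in each abstract.
abstracts = [
    {"P1", "P2"},
    {"P1", "P2", "P3"},
    {"P1", "P3"},
    {"P2", "P4"},
]

def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (antecedent, consequent, support, confidence) over item pairs.

    support(a, b)    = fraction of transactions containing both a and b
    confidence(a->b) = count(a and b) / count(a)
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for items in transactions:
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

rules = mine_pair_rules(abstracts)
```

Each resulting tuple is a candidate binary relation between two proteins; lowering `min_support` admits rarer pairs at the cost of more spurious rules.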

A Study on Principle and Theory of Main Classes in the Library Classification (문헌분류법에서의 주류설정의 원리)

  • Nam, Tae-Woo
    • Journal of the Korean Society for Library and Information Science / v.40 no.4 / pp.333-366 / 2006
  • The purpose of this study is to examine the principles and theory of main classes in library classification. According to Sayers, 'The foundation of the library is the book; the foundation of librarianship is classification.' We looked at the difference between scientific and bibliographic classification, and at the fact that a bibliographic scheme is usually an aspect classification. That is to say, the organization of topics is based on areas of activity, and the first division of the scheme is into disciplines or subject domains. This first division creates what are called main classes. The sequence of main classes is also important. A rough definition of a main class is that it corresponds to a single notational character. Main classes are usually equivalent to traditional disciplines. What constitutes a main class will vary from one classification to another. The order in which the main classes are listed is often discussed at the theoretical level, and some orders are considered to be better than others.

Exploration of Hierarchical Techniques for Clustering Korean Author Names (한글 저자명 군집화를 위한 계층적 기법 비교)

  • Kang, In-Su
    • Journal of Information Management / v.40 no.2 / pp.95-115 / 2009
  • Author resolution disambiguates occurrences of the same author name into real individuals. For this, pair-wise author similarities are computed for author name entities, and then clustering is performed. So far, many studies have employed hierarchical clustering techniques for author disambiguation. However, the various hierarchical clustering methods have not been sufficiently investigated. This study presents an empirical evaluation and analysis of hierarchical clustering applied to Korean author resolution, using multiple distance functions such as the Dice coefficient, cosine similarity, Euclidean distance, the Jaccard coefficient, and the Pearson correlation coefficient.
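
The two steps named in the abstract above (pair-wise similarity, then hierarchical clustering) can be sketched as follows. The feature sets, the 0.5 threshold, and the choice of single linkage are assumptions made for illustration; the abstract does not say which features or linkage methods the study compared.

```python
import math

def dice(a, b):
    """Dice coefficient of two feature sets."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def jaccard(a, b):
    """Jaccard coefficient of two feature sets."""
    return len(a & b) / len(a | b) if a or b else 0.0

def cosine(a, b):
    """Cosine similarity of two sets treated as binary feature vectors."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def single_link(records, sim, threshold):
    """Agglomerative clustering with single linkage.

    Repeatedly merges the two most similar clusters and stops when the best
    remaining similarity falls below the threshold.
    """
    clusters = [[i] for i in range(len(records))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = max(sim(records[x], records[y])
                        for x in clusters[i] for y in clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical records that share one author name; features are co-authors and venue.
records = [
    {"coauthor:Lee", "coauthor:Park", "venue:JIM"},
    {"coauthor:Lee", "venue:JIM"},
    {"coauthor:Choi", "venue:KSLIS"},
    {"coauthor:Choi", "coauthor:Jung", "venue:KSLIS"},
]
groups = single_link(records, jaccard, threshold=0.5)
```

Passing `dice` or `cosine` instead of `jaccard` changes only the similarity function, which makes it easy to compare how each measure affects the resulting author groups.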

A Study on the Reading Program Improvement Plan of a Public Library Based on the Reading Culture Promotion Policy (독서문화진흥 정책에 기반한 공공도서관의 독서 프로그램 개선 방안 연구)

  • Miah Cho;Seung-Jin Kwak
    • Journal of the Korean Society for Library and Information Science / v.57 no.3 / pp.191-210 / 2023
  • The purpose of this study is to draw implications from domestic and international best-practice case studies of library programs and, in line with the changing role of future libraries, to suggest ways to improve a public library's reading programs through analysis based on the 3rd Reading Culture Promotion Basic Plan. To this end, prior studies were first analyzed from various angles to derive clustering criteria for library programs. The programs of various domestic and foreign libraries were then analyzed against these criteria. Next, the reading programs of a specific public library were analyzed using the clustering criteria together with the 13 key tasks under the 4 strategies of the 3rd Reading Culture Promotion Basic Plan. On this basis, a development plan was presented that expands participatory reading promotion programs in consideration of user demand in the era of the 4th Industrial Revolution and, in response to the post-COVID-19 era, considers non-face-to-face and non-contact library services in addition to face-to-face services. This analysis and its application are expected to reach beyond the individual library and help turn public library services in Korea into library programs closely tied to the lives of users.

Research of Topic Analysis for Extracting the Relationship between Science Data (과학기술용어 간 관계 도출을 위한 토픽 분석 연구)

  • Kim, Mucheol
    • The Journal of Society for e-Business Studies / v.21 no.1 / pp.119-129 / 2016
  • With the development of the web, large amounts of information are generated on the social web, and many researchers have focused on extracting and analyzing social issues from various social data. The proposed approach gathers science data and analyzes it with the LDA algorithm. It generates clusters that represent social topics related to 'health'. As a result, we could deduce the relationship between science data and social issues.

A Study on Intellectual Structure of Records Management and Archives in Korea: Based on Syntactic and Semantic Structure of Article Titles (우리나라 기록관리학 분야의 연구영역 분석 - 논문제목의 구문 및 의미 구조를 중심으로 -)

  • Kim, Gyu-Hwan;Jang, Bo-Seong;Yi, Hyun-Jung
    • Journal of the Korean Society for Library and Information Science / v.43 no.3 / pp.417-439 / 2009
  • In this study, the intellectual structure of Records Management and Archival Science in Korea was analyzed based on the syntactic and semantic structure analysis of article titles. The data used in this study were 344 articles from three major representative journals in the field of Records Management and Archival Science, published from 1999 to 2008. The results of the syntactic and semantic structure analysis of article titles show that the three role concepts of keywords are 'research domain', 'research object', and 'research focus'. Keywords in article titles were clustered into the core subject areas after they were assigned three concepts. Based on the results of cluster analysis, the intellectual structure of Records Management and Archival Science in Korea was proposed.