• Title/Summary/Keyword: cluster representative term (클러스터 대표어)

Search Result 11, Processing Time 0.028 seconds

A Study on Cluster Topic Selection in Hierarchical Clustering (계층적 클러스터링에서 분류 대표어 선정에 관한 연구)

  • Yi, Sang-Seon;Lee, Shin-Won;An, Dong-Un;Chung, Sung-Jong
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.669-672 / 2004
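  • As the volume of information grows, there are increasing attempts to apply hierarchical clustering, which automatically structures search results, to information retrieval systems. Hierarchical clustering organizes clusters into a hierarchy based on the similarity between documents, improving retrieval performance and presenting the results to users in an easily understood form. Because the hierarchy summarizes the search results, it is important to select representative terms that effectively capture the content of each cluster. To select the representative terms of each cluster, we extracted only nouns as candidate terms and did not reuse in a lower-level cluster any word already used as a representative term of its upper-level cluster, which improved the quality of the representative terms.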

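The selection rule described in the abstract above (noun candidates only, and no reuse of an upper-level cluster's representative terms in its lower-level clusters) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dict-based cluster tree, the `select_labels` name, the `top_k` parameter, and frequency-based ranking are all assumptions, and the noun lists are assumed to come from an upstream part-of-speech tagger.

```python
from collections import Counter

def select_labels(cluster, inherited=frozenset(), top_k=2):
    """Label a cluster tree top-down, skipping terms already used by ancestors.

    `cluster` is a hypothetical dict: {"nouns": [...], "children": [...]}.
    Ranking by raw frequency is an assumption made for this sketch.
    """
    counts = Counter(n for n in cluster["nouns"] if n not in inherited)
    # Sort by descending frequency, then alphabetically for a deterministic result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [term for term, _ in ranked[:top_k]]
    used = set(inherited) | set(labels)
    return {
        "labels": labels,
        "children": [select_labels(c, used, top_k) for c in cluster.get("children", [])],
    }

tree = {
    "nouns": ["search", "search", "search", "cluster", "cluster", "engine"],
    "children": [
        {"nouns": ["search", "search", "index", "index", "query"], "children": []},
        {"nouns": ["cluster", "label", "label", "label", "search"], "children": []},
    ],
}
labeled = select_labels(tree)
print(labeled["labels"])                  # labels of the root cluster
print(labeled["children"][0]["labels"])   # "search" is excluded here
```

In this toy tree, "search" is frequent in the first child but is skipped there because the root already uses it, so the child is labeled with its next most frequent nouns.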

Automatic Generation of the Local Level Knowledge Structure of a Single Document Using Clustering Methods (클러스터링 기법을 이용한 개별문서의 지식구조 자동 생성에 관한 연구)

  • Han, Seung-Hee;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.21 no.3 / pp.251-267 / 2004
  • The purpose of this study is to generate the local level knowledge structure of a single document, similar to the back-of-the-book indexes and tables of contents of printed material, through term clustering and cluster representative term selection. It further aims to analyze the functionalities of the knowledge structure and to confirm the applicability of these methods in user-friendly information services. The term clustering experiment showed that Ward's method outperformed the fuzzy K-means clustering method. In the cluster representative term selection experiment, using the term with the highest passage frequency as the representative yielded the best performance. Finally, the results of user task-based functionality tests illustrate that the automatically generated knowledge structure in this study functions similarly to the local level knowledge structure presented in printed material.
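
The best-performing selection rule reported above, choosing the term with the highest passage frequency, might look like the sketch below. It assumes passage frequency means the number of passages in which a term occurs and that each passage is already reduced to a set of terms; the function name and the alphabetical tie-break are illustrative choices, not details from the study.

```python
def representative_term(cluster_terms, passages):
    """Return the cluster term that occurs in the largest number of passages.

    `passages` is a list of sets of terms (an assumed pre-processing result).
    Ties are broken alphabetically, which is an arbitrary choice for this sketch.
    """
    def passage_frequency(term):
        return sum(1 for passage in passages if term in passage)

    return min(cluster_terms, key=lambda t: (-passage_frequency(t), t))

passages = [
    {"index", "retrieval", "query"},
    {"index", "cluster"},
    {"cluster", "index", "term"},
    {"query", "term"},
]
rep = representative_term(["query", "index", "term"], passages)
print(rep)
```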

Representative Labels Selection Technique for Document Cluster using WordNet (문서 클러스터를 위한 워드넷기반의 대표 레이블 선정 방법)

  • Kim, Tae-Hoon;Sohn, Mye
    • Journal of Internet Computing and Services / v.18 no.2 / pp.61-73 / 2017
  • In this paper, we propose a document cluster labeling method that uses the information content of the words in a cluster to help understand what the cluster implies. To do so, we calculate the weight and frequency of the words. These two measures are used to determine the relative weight of the words in the cluster. As a next step, we identify the candidate labels using WordNet. Here, the candidate labels are matched to the least common hypernym of the words in the cluster. Finally, the representative labels are determined with respect to the information content and the weight of the words. To demonstrate the superiority of our method, we perform a heuristic experiment using two measures: the suitability of the candidate label ($Suitability_{cl}$) and the appropriacy of the representative label ($Appropriacy_{rl}$). With the proposed method, the suitability of the candidate label decreases slightly compared with existing methods, but the computational cost is about 20% of that of the conventional methods. We also confirmed that the appropriacy of the representative label is better than with the existing methods. As a result, the method is expected to help data analysts interpret document clusters more easily.

Designing Hierarchical User Interface Model for Browsing the Knowledge Structure of a Single Document Using MDS (MDS를 이용한 개별문서의 계층적 지식구조 브라우징 인터페이스 설계)

  • Han, Seung-Hee;Lee, Jae-Yun
    • Journal of Information Management / v.35 no.3 / pp.125-138 / 2004
  • The purpose of this study is to propose hierarchical user interfaces for browsing the knowledge structure of a single document. To generate the hierarchical knowledge structure, hierarchical term clustering and cluster representative term selection were performed on a single thesis in the field of information science, and the result was used to design interfaces that browse a single document hierarchically using multidimensional scaling. The interfaces can be applied to the development of user-friendly information retrieval systems.

A Text Summarization Model Based on Sentence Clustering (문장 클러스터링에 기반한 자동요약 모형)

  • Chung, Young-Mee;Choi, Sang-Hee
    • Journal of the Korean Society for Information Management / v.18 no.3 / pp.159-178 / 2001
  • This paper presents an automatic text summarization model that selects representative sentences from sentence clusters to create a summary. Summary generation experiments were performed on two sets of test documents after learning the optimum environment from a training set. The centroid clustering method turned out to be the most effective for clustering sentences, and sentence weight was found to be more effective than the similarity between the sentence and cluster centroid vectors for selecting a representative sentence from each cluster. The experimental results also show that inverse sentence weight and title word weight for terms, together with location weight for sentences, are effective in improving summarization performance.

Visit: Printing and Imaging Major, Graduate School of Industry, Dongguk University (탐방-동국대학교 산업대학원 인쇄화상전공)

  • Yu, Chang-Jun;Jo, Gap-Jun
    • Printing Korea (프린팅코리아) / s.33 / pp.108-113 / 2005
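  • It was February, when even last winter's cold snaps, which once they set in would hold sway for a week or even ten days, were giving way to the unfailing arrival of spring. We visited the Printing and Imaging major of the Dongguk University Graduate School of Industry, which has taken on the weighty role of forming an industry-academia cooperation cluster at the heart of the printing industry. With Ipchun, the seasonal onset of spring, just passing, the Dongguk University campus was already full of life. The visit to the Printing and Imaging major, which opened in the second semester of 2004 and is pioneering a new path for industry-academia cooperation, centered on an in-depth interview about the printing industry and industry-academia cooperation among editor-in-chief Yu Chang-Jun, Lee Eui-Su, the professor in charge of the Printing and Imaging major, and Kim Seong-Uk, student representative of the major's first class.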

Designing User Interface Model for Browsing the Knowledge Structure of a Single Document (개별문서의 지식구조 브라우징 인터페이스에 관한 연구)

  • Han, Seung-Hee;Lee, Jae-Yun
    • Proceedings of the Korean Society for Information Management Conference / 2004.08a / pp.11-16 / 2004
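  • This study proposes an interface for browsing the knowledge structure of a single document as one way to develop user-friendly retrieval systems in the current information retrieval environment. To generate the knowledge structure of a single document automatically, term clustering and cluster representative term selection were performed using the terms that appear in the document. Multidimensional scaling was then applied to the result to represent the document's knowledge structure in two-dimensional space, providing a browsing interface through which users can access a single document more easily.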

Document Clustering Using Reference Titles (인용문헌 표제를 이용한 문헌 클러스터링에 관한 연구)

  • Choi, Sang-Hee
    • Journal of the Korean Society for Information Management / v.27 no.2 / pp.241-252 / 2010
  • Titles have been regarded as effective clustering features, but they sometimes fail to represent the topic of a document and result in poorly generated document clusters. This study aims to improve the performance of title-based document clustering by proposing the titles in the citation bibliography as a clustering feature. Titles of the original literature, titles in the citation bibliography, and an aggregation of both were used to measure clustering performance. In the clustering experiment, each feature was combined with three hierarchical clustering methods: within-group average linkage, complete linkage, and Ward's method. The best-performing case in this experiment was clustering documents with features from both kinds of titles using the within-group average linkage method.

A Single-End-Point DTW Algorithm for Keyword Spotting (핵심어 검출을 위한 단일 끝점 DTW알고리즘)

  • Choi, Yong-Sun;Oh, Sang-Hoon;Lee, Soo-Young
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.3 / pp.209-219 / 2004
  • To implement real-time hardware for keyword spotting, we propose a Single-End-Point DTW (SEP-DTW) algorithm that is simple and computationally inexpensive. The SEP-DTW algorithm needs only a single end point, which enables efficient applications, and it requires a small amount of computation because the global search area is divided into successive local search areas. We also adopt new local constraints and a new distance measure to improve the performance of the SEP-DTW algorithm. In addition, we normalize the feature vectors so that they have the same variance in each frequency bin and each frame has the same energy level. To construct several reference patterns for each keyword, we apply a clustering algorithm to all training patterns and take the mean vector of each cluster as a reference pattern. To detect a keyword in an input speech stream, we measure the distances between the reference patterns and the input pattern and decide whether the distances are smaller than a pre-defined threshold value. With isolated speech recognition and keyword spotting experiments, we verify that the proposed algorithm performs better than other methods.
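
The detection step described above (compare the input with each reference pattern and test the distance against a threshold) can be illustrated with textbook dynamic time warping. Note that this sketch is the classic full-matrix DTW, not the SEP-DTW algorithm proposed in the paper: it uses neither the single end point, the local search areas, nor the paper's local constraints and distance measure. The 1-D sequences, the absolute-difference cost, and the function names are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW between two 1-D sequences with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stay on b[j-1], advance in a
                cost[i][j - 1],      # stay on a[i-1], advance in b
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]

def is_keyword(pattern, references, threshold):
    """Accept the input if its nearest reference pattern is closer than the threshold."""
    return min(dtw_distance(pattern, ref) for ref in references) < threshold

references = [[1, 2, 2, 3], [5, 5, 5]]  # e.g. cluster mean patterns (toy values)
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(is_keyword([1, 2, 3], references, threshold=0.5))
```

The full cost matrix makes this quadratic in the sequence lengths, which is the kind of cost the abstract says SEP-DTW reduces by restricting the search area.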

Latent Semantic Indexing Analysis of K-Means Document Clustering for Changing Index Terms Weighting (색인어 가중치 부여 방법에 따른 K-Means 문서 클러스터링의 LSI 분석)

  • Oh, Hyung-Jin;Go, Ji-Hyun;An, Dong-Un;Park, Soon-Chul
    • The KIPS Transactions: Part B / v.10B no.7 / pp.735-742 / 2003
  • In information retrieval systems, document clustering provides user convenience and visual effects by rearranging the retrieved documents according to specific topics. In this paper, we clustered documents using the K-Means algorithm and examined the effect of the index term weighting scheme on document clustering. To verify the experiment, we applied the Latent Semantic Indexing approach to illustrate the clustering results and analyzed them in 2-dimensional space. The experimental results showed that when local weighting, global weighting, and a normalization factor are all applied, the density of the clusters in 2-dimensional space is higher than with similar or identical weighting schemes. In particular, the logarithmic form of local and global weighting stood out.
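
The three weighting components named above (local weight, global weight, normalization factor) can be combined as in the sketch below. The abstract does not give the exact formulas, so the specific choices here are assumptions: log(1 + tf) as the logarithmic local weight, log(N / df) as the logarithmic global weight, and cosine (unit-length) normalization. The function name is hypothetical.

```python
import math
from collections import Counter

def weight_documents(docs):
    """Return one {term: weight} dict per document.

    Assumed scheme: local = log(1 + tf), global = log(N / df),
    followed by cosine normalization of each document vector.
    """
    n = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: math.log(1 + f) * math.log(n / df[t]) for t, f in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        # Guard against an all-zero vector (every term occurs in every document).
        weighted.append({t: (w / norm if norm else 0.0) for t, w in raw.items()})
    return weighted

docs = [
    ["cluster", "cluster", "term"],
    ["term", "index"],
    ["term", "weight", "weight"],
]
weights = weight_documents(docs)
print(weights[0])
```

Because "term" occurs in every document, its global weight is log(1) = 0, so it contributes nothing to any document vector; the resulting vectors could then be passed to a K-Means implementation.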