• Title/Summary/Keyword: 지도 군집화


Visualizing Cluster Hierarchy Using Hierarchy Generation Framework (계층 발생 프레임워크를 이용한 군집 계층 시각화)

  • Shin, DongHwa; L'Yi, Sehi; Seo, Jinwook
    • KIISE Transactions on Computing Practices / v.21 no.6 / pp.436-441 / 2015
  • There are many types of clustering algorithms, such as centroid-based, hierarchical, and density-based methods. Each algorithm has its own data grouping principles, which produce different kinds of clusters. Ordering Points To Identify the Clustering Structure (OPTICS) is a well-known density-based algorithm for analyzing arbitrarily shaped clusters of varying density, but the obtained clusters only correlate loosely. Hierarchical agglomerative clustering (HAC) reveals a hierarchical structure of clusters, but cannot clearly find non-convex clusters. In this paper, we provide a novel hierarchy generation framework and an application that aid users by combining the advantages of the two clustering methods.

A Study on Feature Extraction Performance of Naive Convolutional Auto Encoder to Natural Images (자연 영상에 대한 Naive Convolutional Auto Encoder의 특징 추출 성능에 관한 연구)

  • Lee, Sung Ju; Cho, Nam Ik
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.1286-1289 / 2022
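  • Recent research on image clustering has focused on providing self-supervision to deep learning models or assigning pseudo-labels to unlabeled images. In addition, extracting well-compressed feature vectors from high-dimensional color natural images is an important criterion for clustering. This study introduces an experimental method designed to evaluate the feature extraction performance of a Convolutional Auto Encoder on natural images. In particular, to examine the model's feature extraction ability in isolation, we analyze the results of a naive model that is given neither self-supervision nor pseudo-labels. First, we check how well each model has been trained by examining the reconstruction results of the four unsupervised models designed for the experiment. We then confirm, through clustering and classification experiments on the feature vectors, that an unsupervised model trained on a large number of unlabeled images still falls short of the feature extraction performance of a supervised model trained on a smaller amount of labeled data. We also examine the clustering and classification performance of the feature vectors produced by supervised models trained across datasets.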

Comparison of journal clustering methods based on citation structure (논문 인용에 따른 학술지 군집화 방법의 비교)

  • Kim, Jinkwang; Kim, Sohyung; Oh, Changhyuck
    • Journal of the Korean Data and Information Science Society / v.26 no.4 / pp.827-839 / 2015
  • Extracting communities from a journal citation database based on its citation structure is a useful way to identify closely related groups of journals. SCI of Thomson Reuters and SCOPUS of Elsevier have tried to capture the community structure of the journals in their indices according to citation relationships, but no such attempt has yet been made with the Korean Citation Index (KCI). Therefore, in this study, we extracted communities of journals in the natural science area of KCI by applying various clustering algorithms to a social network built from citations among the journals, and compared the resulting groups with the KCI classification. The infomap algorithm, one of the clustering methods applied in this article, gave the best grouping result in the sense that its groups are closer to the KCI classification than those of the other algorithms considered and reflect the citation structure of the journals well. The classification results obtained in this study might be taken into consideration when the KCI journals are reclassified in the future.

Hierarchical Browsing Interface for Geo-Referenced Photo Database (위치 정보를 갖는 사진집합의 계층적 탐색 인터페이스)

  • Lee, Seung-Hoon; Lee, Kang-Hoon
    • Journal of the Korea Computer Graphics Society / v.16 no.4 / pp.25-33 / 2010
  • With the popularization of digital photography, people are now capturing and storing far more photos than ever before. However, the enormous number of photos often makes it hard for users to find the photos they want. In this paper, we present a novel method for fast and intuitive browsing through large collections of geo-referenced photographs. Given a set of photos, we construct a hierarchical structure of clusters such that each cluster includes a set of spatially adjacent photos and its sub-clusters divide the photo set disjointly. For each cluster, we pre-compute its convex hull and the corresponding polygon area. At run-time, this pre-computed data allows us to efficiently visualize only the fraction of the clusters that are inside the current view and have easily recognizable sizes with respect to the current zoom level. Each cluster is displayed as a single polygon representing its convex hull instead of every photo location included in the cluster. Users can quickly move from cluster to cluster by simply selecting any cluster of interest. Our system automatically pans and zooms the view until the currently selected cluster fits precisely into the view at a moderate size. Our user study demonstrates that these new visualization and interaction techniques can significantly improve the capability of navigating large collections of geo-referenced photos.
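The pre-computation described in this abstract (a convex hull and its polygon area per cluster, then a run-time visibility test) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: it treats photo coordinates as planar points, and the `is_visible` helper, its bounding-box overlap test, and the `min_frac` / `max_frac` thresholds are hypothetical choices made here for illustration.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _half_hull(seq: Iterable[Point]) -> List[Point]:
    """Build one chain of the hull, dropping its last point (shared with the other chain)."""
    chain: List[Point] = []
    for p in seq:
        while len(chain) >= 2 and _cross(chain[-2], chain[-1], p) <= 0:
            chain.pop()
        chain.append(p)
    return chain[:-1]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    return _half_hull(pts) + _half_hull(reversed(pts))


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    n = len(poly)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def is_visible(hull: List[Point], area: float, view: Box,
               min_frac: float = 0.01, max_frac: float = 0.5) -> bool:
    """Hypothetical test: the hull's bounding box overlaps the view and the
    cluster covers a 'recognizable' fraction of the view area.
    Assumes a non-empty hull and a view with positive area."""
    xs = [p[0] for p in hull]
    ys = [p[1] for p in hull]
    overlaps = (min(xs) <= view[2] and max(xs) >= view[0]
                and min(ys) <= view[3] and max(ys) >= view[1])
    view_area = (view[2] - view[0]) * (view[3] - view[1])
    return overlaps and min_frac <= area / view_area <= max_frac
```

Because the hull and area are computed once per cluster, the run-time check only needs a few comparisons per cluster, which matches the efficiency argument in the abstract.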

A Study on Optimizing the Number of Clusters using External Cluster Relationship Criterion (외부 군집 연관 기준 정보를 이용한 군집수 최적화)

  • Lee, Hyun-Jin; Jee, Tae-Chang
    • Journal of Digital Contents Society / v.12 no.3 / pp.339-345 / 2011
  • k-means is one of the most popular, simple, and fast clustering algorithms, but the right value of k is not known in advance. The value of k (the number of clusters) is a very important parameter because the clustering result depends on it. In this paper, we present a novel algorithm that determines the number of clusters dynamically, based on an external cluster relationship criterion, which is an evaluation metric for clustering results. Experimental results show that our algorithm is superior to other methods in terms of the accuracy of the number of clusters.

Clustering Optimization Cluster Count Determination for Tourist Destination Recommendation (관광지 추천을 위한 클러스터링 최적화 군집수 결정)

  • Hae-Jin Yeo; In-Whee Joe
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.371-373 / 2023
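  • Clustering data with many factors is difficult. When K-means clustering is used and the number of factors differs from one data point to another, the clustering can fail to group data points properly even when they have similar tendencies. To address this problem, we propose a silhouette-based method for determining the optimal number of clusters and evaluate the performance of the proposed technique.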
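As a rough illustration of choosing the number of clusters with a silhouette criterion, the sketch below runs k-means for several candidate values of k and keeps the one with the highest mean silhouette coefficient. It is not the authors' method, whose details are not given in the abstract: it uses the standard silhouette definition, assumes every data point has the same number of factors, and uses a deterministic farthest-first initialization chosen here only to keep the example reproducible. The names `kmeans`, `silhouette_score`, and `choose_k` are defined for this example.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Vec = Tuple[float, ...]


def kmeans(points: Sequence[Vec], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vec]]:
    """Lloyd's algorithm with farthest-first seeding (assumes >= k distinct points)."""
    centers: List[Vec] = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


def silhouette_score(points: Sequence[Vec], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters: Dict[int, List[Vec]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # convention for this sketch: undefined case scored as 0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def choose_k(points: Sequence[Vec], candidates: Iterable[int]) -> Tuple[int, Dict[int, float]]:
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette_score(points, kmeans(points, k)[0]) for k in candidates}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]
best_k, scores = choose_k(points, range(2, 6))
```

On this toy data the three-cluster solution scores highest, since merging two groups or splitting one lowers the silhouette of the affected points.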

Diabetes Predictive Analytics using FCM Clustering based Supervised Learning Algorithm (FCM 클러스터링 기반 지도 학습 알고리즘을 이용한 당뇨병 예측 분석)

  • Park, Tae-eun; Kim, Kwang-baek
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.580-582 / 2022
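  • In this paper, we propose a fuzzy clustering based supervised learning method for quantifying data and classifying its features. The proposed method first performs clustering by applying the FCM clustering technique. Because some of the clustered data are not classified correctly, a supervised learning method is then applied to the data that were not classified. In this paper, the presence or absence of diabetes is set as the target, and the FCM-based supervised learning method is applied to the remaining eight attributes to predict the presence or absence of diabetes. Prediction performance was evaluated using 30 rounds of K-fold cross validation: the multilayer perceptron achieved 77.88% on the training data and 62.78% on the test data, whereas the proposed method achieved 79.96% on the training data and 74.16% on the test data.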

Document Clustering with Relational Graph Of Common Phrase and Suffix Tree Document Model (공통 Phrase의 관계 그래프와 Suffix Tree 문서 모델을 이용한 문서 군집화 기법)

  • Cho, Yoon-Ho; Lee, Sang-Keun
    • The Journal of the Korea Contents Association / v.9 no.2 / pp.142-151 / 2009
  • NSTC, a previous document clustering method, measures the similarity between pairs of documents using TF-IDF during web document clustering. In this paper, we propose a new similarity measure that uses a common-phrase-based relational graph instead of TF-IDF. The method weights common phrases using a relational graph that represents the relationships among the common phrases in a document collection. Experimental results indicate that the proposed method clusters document collections more effectively than NSTC.

A Study On Predicting Stock Prices Of Hallyu Content Companies Using Two-Stage k-Means Clustering (2단계 k-평균 군집화를 활용한 한류컨텐츠 기업 주가 예측 연구)

  • Kim, Jeong-Woo
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.169-179 / 2021
  • This study shows that the two-stage k-means clustering method can improve performance in stock price prediction. To this end, it introduces the two-stage k-means clustering algorithm and compares its prediction performance with that of various machine learning techniques. The method selects the cluster closest to the prediction target from an initial k-means clustering, then reapplies k-means to that cluster to find a cluster even closer to the actual value. As a result, the value predicted by this method is shown to be closer to the actual stock price than the values predicted by other machine learning techniques. Furthermore, the predicted value is relatively stable despite the use of a relatively small cluster. Accordingly, this method can improve the accuracy and stability of prediction at the same time, and it can be considered a new clustering method that is useful for small data. In the future, the two-stage k-means clustering method needs to be developed further for application to large-scale data.
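The two-stage procedure in this abstract (cluster, pick the cluster nearest the prediction target, then cluster again inside it) might look roughly like the sketch below. Several details are assumptions made here because the abstract does not specify them: the prediction is taken as the mean target value of the finally selected sub-cluster, "nearest" means Euclidean distance to the cluster center, and the initialization is a deterministic farthest-first scheme. `kmeans` and `two_stage_predict` are names introduced for this example, and the data are made up.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def kmeans(points: Sequence[Vec], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vec]]:
    """Lloyd's algorithm with farthest-first seeding (assumes >= k distinct points)."""
    centers: List[Vec] = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


def _nearest_cluster(query: Vec, labels: Sequence[int], centers: Sequence[Vec]) -> int:
    """Index of the non-empty cluster whose center is closest to the query."""
    return min(set(labels), key=lambda j: math.dist(query, centers[j]))


def two_stage_predict(features: Sequence[Vec], targets: Sequence[float],
                      query: Vec, k1: int = 2, k2: int = 2) -> float:
    """Predict a target for `query` from the sub-cluster closest to it."""
    # Stage 1: cluster all samples and keep the cluster nearest the query.
    labels, centers = kmeans(features, k1)
    chosen = _nearest_cluster(query, labels, centers)
    idx = [i for i, lab in enumerate(labels) if lab == chosen]
    sub_features = [features[i] for i in idx]

    # Stage 2: re-cluster inside that cluster, if it has enough distinct points.
    if len(set(sub_features)) >= k2:
        sub_labels, sub_centers = kmeans(sub_features, k2)
        sub_chosen = _nearest_cluster(query, sub_labels, sub_centers)
        idx = [idx[i] for i, lab in enumerate(sub_labels) if lab == sub_chosen]

    # Assumed prediction rule: average the targets of the selected samples.
    return sum(targets[i] for i in idx) / len(idx)


# Made-up one-dimensional example: feature -> target.
features = [(1.0,), (1.2,), (2.0,), (2.2,), (10.0,), (10.5,)]
targets = [10.0, 12.0, 20.0, 22.0, 100.0, 105.0]
prediction = two_stage_predict(features, targets, query=(2.1,))
```

In this toy run the first stage discards the two distant samples and the second stage narrows the remaining four down to the two nearest the query, which shows how a small final cluster can still give a stable estimate.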

Word Cluster-based Mobile Application Categorization (단어 군집 기반 모바일 애플리케이션 범주화)

  • Heo, Jeongman; Park, So-Young
    • Journal of the Korea Society of Computer and Information / v.19 no.3 / pp.17-24 / 2014
  • In this paper, we propose a mobile application categorization method that uses word cluster information. Because mobile application descriptions can be short, the proposed method uses word cluster seeds as well as the words in the description as categorization features. To handle the fragmented categories of mobile applications, the proposed method generates word clusters by applying the per-category frequency of word occurrence to the K-means clustering algorithm. Since a mobile application description can include paragraphs unrelated to categorization, such as installation specifications, the proposed method uses only the word clusters that are useful for categorization. Experiments show that the proposed method improves recall (5.65%) by using the word cluster information.