• Title/Abstract/Keywords: K-means++ algorithm

Search results: 1,367 items (processing time: 0.028 seconds)

Approximate k values using Repulsive Force without Domain Knowledge in k-means

  • Kim, Jung-Jae;Ryu, Minwoo;Cha, Si-Ho
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 14, No. 3
    • /
    • pp.976-990
    • /
    • 2020
  • The k-means algorithm is widely used in academia and industry because it is simple to implement and learns quickly, even on complex datasets. However, k-means struggles to classify datasets without prior domain knowledge. In a previous study we proposed the repulsive k-means (RK-means) algorithm, which improves on k-means by using a repulsive force concept that allows unnecessary cluster centroids to be deleted. RK-means can therefore classify a dataset without domain knowledge. However, three main problems remain: the RK-means algorithm includes a cluster repulsive force offset for clusters confined within other clusters, which can cause cluster locking; the previous study did not prove that RK-means converges optimally; and RK-means showed better performance only under specific normalization term and weight settings. This paper therefore proposes the advanced RK-means (ARK-means) algorithm to resolve these problems. We establish an initialization strategy for deploying cluster centroids and define a metric for the ARK-means algorithm. Finally, we redefine the mass and normalization terms so that they better fit general datasets. We demonstrate the feasibility of ARK-means experimentally using blob and iris datasets. The experimental results show that the proposed ARK-means algorithm outperforms k-means, k'-means, and RK-means.

An Implementation of K-Means Algorithm Improving Cluster Centroids Decision Methodologies

  • 이신원;오형진;안동언;정성종
    • The KIPS Transactions: Part B (정보처리학회논문지B)
    • /
    • Vol. 11B, No. 7
    • /
    • pp.867-874
    • /
    • 2004
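  • The K-Means algorithm is a relocation technique that repeats clustering around K initial centroids until K clusters are formed. By its nature, K-Means can produce different clustering results depending on how the initial cluster centroids and the subsequent cluster centroids are determined. This paper proposes a modified K-Means algorithm that improves the methods for determining the initial cluster centroids and the cluster centroids. The proposed algorithm was evaluated using the 16 weighting schemes of the SMART system. The modified K-Means algorithm improved recall and F-measure by more than 20% over the K-Means algorithm and showed good clustering performance in assigning related documents under a specific topic.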
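
The dependence on initial centroids described in this abstract can be illustrated with a minimal sketch of the standard K-means loop. This is not the modified algorithm from the paper, whose details are not given here; the function names `kmeans` and `sse`, the four sample points, and both sets of initial centroids are assumptions chosen for illustration.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Standard K-means loop starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


def sse(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(dist(p, c) ** 2
               for c, cluster in zip(centroids, clusters)
               for p in cluster)


# Hypothetical data: the four corners of a 4 x 1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Two different (assumed) initializations for K = 2.
cent_a, clus_a = kmeans(points, [(0, 0.5), (4, 0.5)])  # left/right split
cent_b, clus_b = kmeans(points, [(2, 0), (2, 1)])      # bottom/top split

print(sse(cent_a, clus_a), sse(cent_b, clus_b))
```

Both runs converge immediately, but to different partitions: the first groups the points into left and right pairs, the second into bottom and top pairs with a much larger squared error. This is the sensitivity that improved centroid selection methods aim to reduce.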

The Enhancement of Learning Time in Fuzzy c-means Algorithm

  • 김형철;조제황
    • Proceedings of the Korea Institute of Convergence Signal Processing Conference
    • /
    • Proceedings of the Korea Institute of Signal Processing and Systems 2001 Summer Conference (KISPS SUMMER CONFERENCE 2001)
    • /
    • pp.113-116
    • /
    • 2001
  • The conventional K-means algorithm is widely used in vector quantizer design and clustering analysis. Recently, a modified K-means algorithm has been proposed in which the codevector updating step is as follows: new codevector = current codevector + scale factor × (new centroid - current codevector). This algorithm uses a fixed value for the scale factor. In this paper, we propose a new algorithm that improves the learning time of the fuzzy c-means algorithm. Experimental results show that the proposed method produces codebooks about 5 to 6 times faster than the conventional K-means algorithm with almost the same performance.
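
The codevector updating rule quoted in this abstract can be written out as a short sketch. The function name `update_codevector` and the sample values are assumptions for illustration; the abstract does not say which fixed scale factor the modified algorithm uses, nor how the proposed fuzzy c-means variant differs.

```python
def update_codevector(codevector, centroid, scale):
    """new codevector = current codevector + scale * (new centroid - current codevector)."""
    return [c + scale * (m - c) for c, m in zip(codevector, centroid)]


current = [0.0, 0.0]        # hypothetical current codevector
new_centroid = [1.0, 2.0]   # hypothetical centroid of the training vectors assigned to it

# scale = 1 moves the codevector exactly onto the new centroid,
# which is the conventional K-means update.
conventional = update_codevector(current, new_centroid, 1.0)

# scale > 1 steps past the centroid along the same direction
# (1.5 is an arbitrary value chosen for this example).
overshoot = update_codevector(current, new_centroid, 1.5)

print(conventional, overshoot)
```

With a scale factor of 1 the rule reduces to the conventional update, so the scale factor is the only quantity that distinguishes the modified step from the original one.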

Path based K-means Clustering for RFID Data Sets

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering
    • /
    • Vol. 6, No. 4
    • /
    • pp.434-438
    • /
    • 2008
  • Massive data are continuously produced at a rate of several terabytes or more every day. Applications that handle such data need effective clustering algorithms to achieve high overall computational performance. In this paper, we propose a clustering approach that uses an ancestor as the cluster center: the K-means algorithm using ancestors, a modification of the K-means algorithm. We present a clustering architecture and a clustering algorithm that minimize I/Os and show excellent performance. In our experimental performance evaluation, we show that our algorithm can improve the I/O speed and the query processing time.

Pattern Analysis and Performance Comparison of Lottery Winning Numbers

  • Jung, Yong Gyu;Han, Soo Ji;Kim, Jae Hee
    • International Journal of Internet, Broadcasting and Communication
    • /
    • Vol. 6, No. 1
    • /
    • pp.16-22
    • /
    • 2014
  • Clustering methods such as k-means and EM belong to the family of classification and pattern recognition techniques and are widely used in management science and literature search. In this paper, the performance of the k-means and EM algorithms is compared using Weka. The experiments use 567 cases of winning lottery numbers. The k-means algorithm is faster than the EM algorithm by about 0.08 seconds. In terms of accuracy, precision, and recall, however, the EM algorithm performs better than the K-means algorithm. While K-means is known to be sensitive to the distribution of the data, the EM algorithm is sensitive to probability in clustering.

On hierarchical clustering in sufficient dimension reduction

  • Yoo, Chaeyeon;Yoo, Younju;Um, Hye Yeon;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 27, No. 4
    • /
    • pp.431-443
    • /
    • 2020
  • The K-means clustering algorithm has been applied successfully in sufficient dimension reduction. Unfortunately, the algorithm does not have reproducibility and nestness, which will be discussed in this paper. These are clear deficits of the K-means clustering algorithm. The hierarchical clustering algorithm has both reproducibility and nestness, but an intensive comparison between K-means and hierarchical clustering has not yet been done in a sufficient dimension reduction context. In this paper, we rigorously study the two clustering algorithms for two popular sufficient dimension reduction methodologies, the inverse mean and clustering mean methods, through intensive numerical studies. Simulation studies and two real data examples confirm that the hierarchical clustering algorithm has a potential advantage over the K-means algorithm.

Cloudy Area Detection in Satellite Image using K-Means & GHA

  • 서석배;김종우;최해진
    • The Institute of Electronics Engineers of Korea: Conference Proceedings
    • /
    • Proceedings of the Institute of Electronics Engineers of Korea 2003 Signal Processing Society Fall Conference
    • /
    • pp.405-408
    • /
    • 2003
  • This paper proposes a new algorithm for cloudy area detection using K-Means and GHA (Generalized Hebbian Algorithm). K-Means is a simple classification algorithm, and GHA is an unsupervised neural network for data compression and pattern classification. The proposed algorithm is based on block-based image processing with a block size of $16 \times 16$. Experimental results show good cloudy area detection performance, except for blurred cloudy areas.

Initial Mode Decision Method for Clustering in Categorical Data

  • Yang, Soon-Cheol;Kang, Hyung-Chang;Kim, Chul-Soo
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 18, No. 2
    • /
    • pp.481-488
    • /
    • 2007
  • The k-means algorithm is well known for its efficiency in clustering large data sets. However, because it works only on numeric values, it cannot be used to cluster real-world data containing categorical values. The k-modes algorithm extends the k-means paradigm to categorical domains. The algorithm requires the initial points (modes) of the clusters to be preset or randomly selected. This paper addresses this problem of the k-modes algorithm by using the Max-Min method, one of the methods for deciding initial values in the k-means algorithm. We also introduce new similarity measures for clustering categorical data. Tests on the mushroom and soybean data sets show that the proposed algorithm performs well in terms of both accuracy and run time.
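
As a rough illustration of how a Max-Min rule can choose initial modes for categorical data, the sketch below repeatedly picks the record that is farthest from its closest already-chosen mode. It is not the paper's method: the abstract does not define its new similarity measures, so the simple matching dissimilarity used here (`mismatch`), the choice of the first record as the starting mode, and the sample records are all assumptions.

```python
def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def max_min_init(records, k):
    """Pick k initial modes using a max-min (farthest-first) rule."""
    modes = [records[0]]  # assumption: seed with the first record
    while len(modes) < k:
        # For each record, measure the distance to its closest chosen mode,
        # then select the record for which that distance is largest.
        candidate = max(records,
                        key=lambda r: min(mismatch(r, m) for m in modes))
        modes.append(candidate)
    return modes


# Hypothetical categorical records: (color, shape, size).
records = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("green", "long", "small"),
    ("blue", "flat", "large"),
]

initial_modes = max_min_init(records, 3)
print(initial_modes)
```

Because each new mode is chosen to be as dissimilar as possible from the modes already selected, the initial modes are spread across the data rather than depending on a random draw.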

A Codebook Generation Algorithm Using a New Updating Condition

  • 김형철;조제황
    • Journal of the Institute of Convergence Signal Processing
    • /
    • Vol. 5, No. 3
    • /
    • pp.205-209
    • /
    • 2004
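  • Among the codebook generation algorithms used in vector quantization, the K-means algorithm is the most widely used. This paper proposes a codebook generation algorithm that applies a new updating condition to improve codebook performance. The conventional K-means algorithm fixes the distance weight used when updating codevectors throughout all training iterations, whereas the proposed method obtains the codebook by applying different weights according to the updating condition of each new codevector during training. Different weights can therefore be applied to codevectors depending on the updating condition, which has the effect of applying a weight that varies from one training iteration to the next. Experimental results confirm that codebook performance improves over the K-means algorithm.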

An Implementation of K-Means Algorithm Improving Cluster Centroids Decision Methodologies

  • 조시성;김호영;오형진;이신원;안동언;정성종
    • Korea Information Processing Society: Conference Proceedings
    • /
    • Proceedings of the Korea Information Processing Society 2002 Fall Conference (Part 1)
    • /
    • pp.373-376
    • /
    • 2002
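  • The K-Means algorithm is a relocation technique that repeats clustering around K initial cluster centroids until K clusters are formed. By its nature, the clustering result of K-Means depends on the initial cluster centroids and on the newly generated cluster centroids. This paper proposes a modified K-Means algorithm that improves both the method for selecting the initial cluster centroids and the method for determining new cluster centroids. When the two algorithms were evaluated with the 16 weighting schemes proposed in the SMART system, the proposed modified algorithm improved recall and F-measure by more than 20% and showed good clustering performance in assigning documents under a specific topic.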
