• Title/Summary/Keyword: K-means++ algorithm

Search Result 1,367, Processing Time 0.023 seconds

Approximate k values using Repulsive Force without Domain Knowledge in k-means

  • Kim, Jung-Jae;Ryu, Minwoo;Cha, Si-Ho
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.3
    • /
    • pp.976-990
    • /
    • 2020
  • The k-means algorithm is widely used in academia and industry due to easy and simple implementation, enabling fast learning for complex datasets. However, k-means struggles to classify datasets without prior knowledge of specific domains. We proposed the repulsive k-means (RK-means) algorithm in a previous study to improve the k-means algorithm, using the repulsive force concept, which allows deleting unnecessary cluster centroids. Accordingly, the RK-means enables to classifying of a dataset without domain knowledge. However, three main problems remain. The RK-means algorithm includes a cluster repulsive force offset, for clusters confined in other clusters, which can cause cluster locking; we were unable to prove RK-means provided optimal convergence in the previous study; and RK-means shown better performance only normalize term and weight. Therefore, this paper proposes the advanced RK-means (ARK-means) algorithm to resolve the RK-means problems. We establish an initialization strategy for deploying cluster centroids and define a metric for the ARK-means algorithm. Finally, we redefine the mass and normalize terms to close to the general dataset. We show ARK-means feasibility experimentally using blob and iris datasets. Experiment results verify the proposed ARK-means algorithm provides better performance than k-means, k'-means, and RK-means.

An Implementation of K-Means Algorithm Improving Cluster Centroids Decision Methodologies (클러스터 중심 결정 방법을 개선한 K-Means 알고리즘의 구현)

  • Lee Shin-Won;Oh HyungJin;An Dong-Un;Jeong Seong-Jong
    • The KIPS Transactions:PartB
    • /
    • v.11B no.7 s.96
    • /
    • pp.867-874
    • /
    • 2004
  • K-Means algorithm is a non-hierarchical (plat) and reassignment techniques and iterates algorithm steps on the basis of K cluster centroids until the clustering results converge into K clusters. In its nature, K-Means algorithm has characteristics which make different results depending on the initial and new centroids. In this paper, we propose the modified K-Means algorithm which improves the initial and new centroids decision methodologies. By evaluating the performance of two algorithms using the 16 weighting scheme of SMART system, the modified algorithm showed $20{\%}$ better results on recall and F-measure than those of K-Means algorithm, and the document clustering results are quite improved.

The Enhancement of Learning Time in Fuzzy c-means algorithm (학습시간을 개선한 Fuzzy c-means 알고리즘)

  • 김형철;조제황
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2001.06a
    • /
    • pp.113-116
    • /
    • 2001
  • The conventional K-means algorithm is widely used in vector quantizer design and clustering analysis. Recently modified K-means algorithm has been proposed where the codevector updating step is as fallows: new codevector = current codevector + scale factor (new centroid - current codevector). This algorithm uses a fixed value for the scale factor. In this paper, we propose a new algorithm for the enhancement of learning time in fuzzy c-means a1gorithm. Experimental results show that the proposed method produces codebooks about 5 to 6 times faster than the conventional K-means algorithm with almost the same Performance.

  • PDF

Path based K-means Clustering for RFID Data Sets

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering
    • /
    • v.6 no.4
    • /
    • pp.434-438
    • /
    • 2008
  • Massive data are continuously produced with a data rate of over several terabytes every day. These applications need effective clustering algorithms to achieve an overall high performance computation. In this paper, we propose ancestor as cluster center based approach to clustering, the K-means algorithm using ancestor. We modify the K-means algorithm. We present a clustering architecture and a clustering algorithm that minimize of I/Os and show a performance with excellent. In our experimental performance evaluation, we present that our algorithm can improve the I/O speed and the query processing time.

Pattern Analysis and Performance Comparison of Lottery Winning Numbers

  • Jung, Yong Gyu;Han, Soo Ji;kim, Jae Hee
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.6 no.1
    • /
    • pp.16-22
    • /
    • 2014
  • Clustering methods such as k-means and EM are the group of classification and pattern recognition, which are used in management science and literature search widely. In this paper, k-means and EM algorithm are compared the performance using by Weka. The winning Lottery numbers of 567 cases are experimented for our study and presentation. Processing speed of the k-means algorithm is superior to the EM algorithm, which is about 0.08 seconds faster than the other. As the result it is summerized that EM algorithm is better than K-means algorithm with comparison of accuracy, precision and recall. While K-means is known to be sensitive to the distribution of data, EM algorithm is probability sensitive for clustering.

On hierarchical clustering in sufficient dimension reduction

  • Yoo, Chaeyeon;Yoo, Younju;Um, Hye Yeon;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • v.27 no.4
    • /
    • pp.431-443
    • /
    • 2020
  • The K-means clustering algorithm has had successful application in sufficient dimension reduction. Unfortunately, the algorithm does have reproducibility and nestness, which will be discussed in this paper. These are clear deficits for the K-means clustering algorithm; however, the hierarchical clustering algorithm has both reproducibility and nestness, but intensive comparison between K-means and hierarchical clustering algorithm has not yet been done in a sufficient dimension reduction context. In this paper, we rigorously study the two clustering algorithms for two popular sufficient dimension reduction methodology of inverse mean and clustering mean methods throughout intensive numerical studies. Simulation studies and two real data examples confirm that the use of hierarchical clustering algorithm has a potential advantage over the K-means algorithm.

Cloudy Area Detection in Satellite Image using K-Means & GHA (K-Means 와 GHA를 이용한 위성영상 구름영역 검출)

  • 서석배;김종우;최해진
    • Proceedings of the IEEK Conference
    • /
    • 2003.11a
    • /
    • pp.405-408
    • /
    • 2003
  • This paper proposes a new algorithm for cloudy area detection using K-Means and GHA (Generalized Hebbian Algorithm). K-Means is one of simple classification algorithm, and GHA is unsupervised neural network for data compression and pattern classification. Proposed algorithm is based on block based image processing that size is l6$\times$l6. Experimental results shows good performance of cloudy area detection except blur cloudy areas.

  • PDF

Initial Mode Decision Method for Clustering in Categorical Data

  • Yang, Soon-Cheol;Kang, Hyung-Chang;Kim, Chul-Soo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.18 no.2
    • /
    • pp.481-488
    • /
    • 2007
  • The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. The k-modes algorithm is to extend the k-means paradigm to categorical domains. The algorithm requires a pre-setting or random selection of initial points (modes) of the clusters. This paper improved the problem of k-modes algorithm, using the Max-Min method that is a kind of methods to decide initial values in k-means algorithm. we introduce new similarity measures to deal with using the categorical data for clustering. We show that the mushroom data sets and soybean data sets tested with the proposed algorithm has shown a good performance for the two aspects(accuracy, run time).

  • PDF

A Codebook Generation Algorithm Using a New Updating Condition (새로운 갱신조건을 적용한 부호책 생성 알고리즘)

  • 김형철;조제황
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.5 no.3
    • /
    • pp.205-209
    • /
    • 2004
  • The K-means algorithm is the most widely used method among the codebook generation algorithms in vector quantization. In this paper, we propose a codebook generation algorithm using a new updating condition to enhance the codebook performance. The conventional K-means algorithm uses a fixed weight of the distance for all training iterations, but the proposed method uses different weights according to the updating condition from the new codevectors for training iterations. Then, different weights can be applied to generate codevectors at each iteration according to this condition, and it can have a similar effect to variable weights. Experimental results show that the proposed algorithm has the better codebook performance than that of K-means algorithm.

  • PDF

An Implementation of K-Means Algorithm improving cluster centroids decision methodologies (클러스터 중심 결정 방법을 개선한 K-Means Algorithm의 구현)

  • Cho, Si-Sung;Kim, Ho-Young;Oh, Hyung-Jin;Lee, Shin-Won;An, Dong-Un;Chung, Sung-Jong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2002.11a
    • /
    • pp.373-376
    • /
    • 2002
  • K-Means 알고리즘은 재배치 기법의 일종으로 K 개의 초기 클러스터중심(centroid)를 중심으로 K 개의 클러스터가 될 때까지 클러스터링을 반복하는 것이다. K-Means 알고리즘은 특성상 초기 클러스터 중심과 새롭게 생성된 클러스터 중심에 따라 클러스터링 결과가 달라진다. 본 논문에서는 K-Means Algorithm 의 초기 클러스터중심 선택 방법과 새로운 클러스터 중심 결정 방법을 개선한 변형 K-Means Algorithm을 제안한다. SMART 시스템에서 제안한 16가지 가중치 계산 방식에 의하여 두 알고리즘의 성능을 평가한 결과 제안한 변형 알고리즘이 재현률과 F-Measure 에서 20%이상 향상된 결과를 얻을 수 있었으며 특정 주제 아래 문서가 할당되는 클러스터링 성능이 우수하였다.

  • PDF