• Title/Summary/Keyword: k-means clustering Algorithm

Search Result 545, Processing Time 0.029 seconds

Path based K-means Clustering for RFID Data Sets

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering
    • /
    • v.6 no.4
    • /
    • pp.434-438
    • /
    • 2008
  • Massive data are continuously produced with a data rate of over several terabytes every day. These applications need effective clustering algorithms to achieve an overall high performance computation. In this paper, we propose ancestor as cluster center based approach to clustering, the K-means algorithm using ancestor. We modify the K-means algorithm. We present a clustering architecture and a clustering algorithm that minimize of I/Os and show a performance with excellent. In our experimental performance evaluation, we present that our algorithm can improve the I/O speed and the query processing time.

On hierarchical clustering in sufficient dimension reduction

  • Yoo, Chaeyeon;Yoo, Younju;Um, Hye Yeon;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • v.27 no.4
    • /
    • pp.431-443
    • /
    • 2020
  • The K-means clustering algorithm has had successful application in sufficient dimension reduction. Unfortunately, the algorithm does have reproducibility and nestness, which will be discussed in this paper. These are clear deficits for the K-means clustering algorithm; however, the hierarchical clustering algorithm has both reproducibility and nestness, but intensive comparison between K-means and hierarchical clustering algorithm has not yet been done in a sufficient dimension reduction context. In this paper, we rigorously study the two clustering algorithms for two popular sufficient dimension reduction methodology of inverse mean and clustering mean methods throughout intensive numerical studies. Simulation studies and two real data examples confirm that the use of hierarchical clustering algorithm has a potential advantage over the K-means algorithm.

Geodesic Clustering for Covariance Matrices

  • Lee, Haesung;Ahn, Hyun-Jung;Kim, Kwang-Rae;Kim, Peter T.;Koo, Ja-Yong
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.4
    • /
    • pp.321-331
    • /
    • 2015
  • The K-means clustering algorithm is a popular and widely used method for clustering. For covariance matrices, we consider a geodesic clustering algorithm based on the K-means clustering framework in consideration of symmetric positive definite matrices as a Riemannian (non-Euclidean) manifold. This paper considers a geodesic clustering algorithm for data consisting of symmetric positive definite (SPD) matrices, utilizing the Riemannian geometric structure for SPD matrices and the idea of a K-means clustering algorithm. A K-means clustering algorithm is divided into two main steps for which we need a dissimilarity measure between two matrix data points and a way of computing centroids for observations in clusters. In order to use the Riemannian structure, we adopt the geodesic distance and the intrinsic mean for symmetric positive definite matrices. We demonstrate our proposed method through simulations as well as application to real financial data.

The Document Clustering using Multi-Objective Genetic Algorithms (다목적 유전자 알고리즘을 이용한문서 클러스터링)

  • Lee, Jung-Song;Park, Soon-Cheol
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.17 no.2
    • /
    • pp.57-64
    • /
    • 2012
  • In this paper, the multi-objective genetic algorithm is proposed for the document clustering which is important in the text mining field. The most important function in the document clustering algorithm is to group the similar documents in a corpus. So far, the k-means clustering and genetic algorithms are much in progress in this field. However, the k-means clustering depends too much on the initial centroid, the genetic algorithm has the disadvantage of coming off in the local optimal value easily according to the fitness function. In this paper, the multi-objective genetic algorithm is applied to the document clustering in order to complement these disadvantages while its accuracy is analyzed and compared to the existing algorithms. In our experimental results, the multi-objective genetic algorithm introduced in this paper shows the accuracy improvement which is superior to the k-means clustering(about 20 %) and the general genetic algorithm (about 17 %) for the document clustering.

Exponential Probability Clustering

  • Yuxi, Hou;Park, Cheol-Hoon
    • Proceedings of the IEEK Conference
    • /
    • 2008.06a
    • /
    • pp.671-672
    • /
    • 2008
  • K-means is a popular one in clustering algorithms, and it minimizes the mutual euclidean distance among the sample points. But K-means has some demerits, such as depending on initial condition, unsupervised learning and local optimum. However mahalanobis distancecan deal this case well. In this paper, the author proposed a new clustering algorithm, named exponential probability clustering, which applied Mahalanobis distance into K-means clustering. This new clustering does possess not only the probability interpretation, but also clustering merits. Finally, the simulation results also demonstrate its good performance compared to K-means algorithm.

  • PDF

Clustering load patterns recorded from advanced metering infrastructure (AMI로부터 측정된 전력사용데이터에 대한 군집 분석)

  • Ann, Hyojung;Lim, Yaeji
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.6
    • /
    • pp.969-977
    • /
    • 2021
  • We cluster the electricity consumption of households in A-apartment in Seoul, Korea using Hierarchical K-means clustering algorithm. The data is recorded from the advanced metering infrastructure (AMI), and we focus on the electricity consumption during evening weekdays in summer. Compare to the conventional clustering algorithms, Hierarchical K-means clustering algorithm is recently applied to the electricity usage data, and it can identify usage patterns while reducing dimension. We apply Hierarchical K-means algorithm to the AMI data, and compare the results based on the various clustering validity indexes. The results show that the electricity usage patterns are well-identified, and it is expected to be utilized as a major basis for future applications in various fields.

Inverted Index based Modified Version of K-Means Algorithm for Text Clustering

  • Jo, Tae-Ho
    • Journal of Information Processing Systems
    • /
    • v.4 no.2
    • /
    • pp.67-76
    • /
    • 2008
  • This research proposes a new strategy where documents are encoded into string vectors and modified version of k means algorithm to be adaptable to string vectors for text clustering. Traditionally, when k means algorithm is used for pattern classification, raw data should be encoded into numerical vectors. This encoding may be difficult, depending on a given application area of pattern classification. For example, in text clustering, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors, and modify the k means algorithm adaptable to string vectors for text clustering.

A Study of Similar Blog Recommendation System Using Termite Colony Algorithm (흰개미 군집 알고리즘을 이용한 유사 블로그 추천 시스템에 관한 연구)

  • Jeong, Gi Sung;Jo, I-Seok;Lee, Malrey
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.1
    • /
    • pp.83-88
    • /
    • 2013
  • This paper proposes a recommending system of the similar blogs gathered with similarities between blogs according to the similarity, dividing words, for each frequency, that individual blogs have. It improved the algorithm of k-means, using the model of the habits of white ants for better performance of clustering, and showed better performance of clustering as a result of evaluating and comparing with the existing algorithm of k-means as the improved algorithm. The recommending system of similar blog was designed and embodied, using the improved algorithm. TCA can reduce clustering time and the number of moving time for clustering compare with K-means algorithm.

An Improved Automated Spectral Clustering Algorithm

  • Xiaodan Lv
    • Journal of Information Processing Systems
    • /
    • v.20 no.2
    • /
    • pp.185-199
    • /
    • 2024
  • In this paper, an improved automated spectral clustering (IASC) algorithm is proposed to address the limitations of the traditional spectral clustering (TSC) algorithm, particularly its inability to automatically determine the number of clusters. Firstly, a cluster number evaluation factor based on the optimal clustering principle is proposed. By iterating through different k values, the value corresponding to the largest evaluation factor was selected as the first-rank number of clusters. Secondly, the IASC algorithm adopts a density-sensitive distance to measure the similarity between the sample points. This rendered a high similarity to the data distributed in the same high-density area. Thirdly, to improve clustering accuracy, the IASC algorithm uses the cosine angle classification method instead of K-means to classify the eigenvectors. Six algorithms-K-means, fuzzy C-means, TSC, EIGENGAP, DBSCAN, and density peak-were compared with the proposed algorithm on six datasets. The results show that the IASC algorithm not only automatically determines the number of clusters but also obtains better clustering accuracy on both synthetic and UCI datasets.

Fuzzy k-Means Local Centers of the Social Networks

  • Woo, Won-Seok;Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.2
    • /
    • pp.213-217
    • /
    • 2012
  • Fuzzy k-means clustering is an attractive alternative to the ordinary k-means clustering in analyzing multivariate data. Fuzzy versions yield more natural output by allowing overlapped k groups. In this study, we modify a fuzzy k-means clustering algorithm to be used for undirected social networks, apply the algorithm to both real and simulated cases, and report the results.