• Title/Summary/Keyword: K-means Algorithm

Search Result 1,363, Processing Time 0.023 seconds

A Study on Performance Evaluation of Clustering Algorithms using Neural and Statistical Method (클러스터링 성능평가: 신경망 및 통계적 방법)

  • 윤석환;신용백
    • Journal of the Korean Professional Engineers Association
    • /
    • v.29 no.2
    • /
    • pp.71-79
    • /
    • 1996
  • This paper evaluates the clustering performance of a neural network and a statistical method. Algorithms which are used in this paper are the GLVQ(Generalized Loaming vector Quantization) for a neural method and the k -means algorithm for a statistical clustering method. For comparison of two methods, we calculate the Rand's c statistics. As a result, the mean of c value obtained with the GLVQ is higher than that obtained with the k -means algorithm, while standard deviation of c value is lower. Experimental data sets were the Fisher's IRIS data and patterns extracted from handwritten numerals.

  • PDF

Analysis of Document Clustering Varing Cluster Centroid Decisions (클러스터 중심 결정 방법에 따른 문서 클러스터링 성능 분석)

  • 오형진;변동률;이신원;박순철;정성종;안동언
    • Proceedings of the IEEK Conference
    • /
    • 2002.06c
    • /
    • pp.99-102
    • /
    • 2002
  • K-means clustering algorithm is a very popular clustering technique, which is used in the field of information retrieval. In this paper, We deal with the problem of K-means Algorithm from the view of creating the centroids and suggest a method reflecting document feature and considering the context of each document to determine the new centroids during the process of forming new centroids. For experiment, We used the automatic document summarizer to summarize the Reuter21578 newslire test dataset and achieved 20% improved results to the recall metrics.

  • PDF

The Evaluation Measure of Text Clustering for the Variable Number of Clusters (가변적 클러스터 개수에 대한 문서군집화 평가방법)

  • Jo, Tae-Ho
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.10b
    • /
    • pp.233-237
    • /
    • 2006
  • This study proposes an innovative measure for evaluating the performance of text clustering. In using K-means algorithm and Kohonen Networks for text clustering, the number clusters is fixed initially by configuring it as their parameter, while in using single pass algorithm for text clustering, the number of clusters is not predictable. Using labeled documents, the result of text clustering using K-means algorithm or Kohonen Network is able to be evaluated by setting the number of clusters as the number of the given target categories, mapping each cluster to a target category, and using the evaluation measures of text. But in using single pass algorithm, if the number of clusters is different from the number of target categories, such measures are useless for evaluating the result of text clustering. This study proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, what is called CI (Clustering Index) in this article.

  • PDF

Context-awareness User Analysis based on Clustering Algorithm (클러스터링 알고리즘기반의 상황인식 사용자 분석)

  • Lee, Kang-whan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.7
    • /
    • pp.942-948
    • /
    • 2020
  • In this paper, we propose a clustered algorithm that possible more efficient user distinction within clustering using context-aware attribute information. In typically, the data provided to classify interrelationships within cluster information in the process of clustering data will be as a degrade factor if new or newly processing information is treated as contaminated information in comparative information. In this paper, we have developed a clustering algorithm that can extract user's recognition information to solve this problem in using K-means algorithm. The proposed algorithm analyzes the user's clustering attributed parameters from user clusters using accumulated information and clustering according to their attributes. The results of the simulation with the proposed algorithm showed that the user management system was more adaptable in terms of classifying and maintaining multiple users in clusters.

Analysis of COVID-19 Context-awareness based on Clustering Algorithm (클러스터링 알고리즘기반의 COVID-19 상황인식 분석)

  • Lee, Kangwhan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.5
    • /
    • pp.755-762
    • /
    • 2022
  • This paper propose a clustered algorithm that possible more efficient COVID-19 disease learning prediction within clustering using context-aware attribute information. In typically, clustering of COVID-19 diseases provides to classify interrelationships within disease cluster information in the clustering process. The clustering data will be as a degrade factor if new or newly processing information during treated as contaminated factors in comparative interrelationships information. In this paper, we have shown the solving the problems and developed a clustering algorithm that can extracting disease correlation information in using K-means algorithm. According to their attributes from disease clusters using accumulated information and interrelationships clustering, the proposed algorithm analyzes the disease correlation clustering possible and centering points. The proposed algorithm showed improved adaptability to prediction accuracy of the classification management system in terms of learning as a group of multiple disease attribute information of COVID-19 through the applied simulation results.

Repeated Clustering to Improve the Discrimination of Typical Daily Load Profile

  • Kim, Young-Il;Ko, Jong-Min;Song, Jae-Ju;Choi, Hoon
    • Journal of Electrical Engineering and Technology
    • /
    • v.7 no.3
    • /
    • pp.281-287
    • /
    • 2012
  • The customer load profile clustering method is used to make the TDLP (Typical Daily Load Profile) to estimate the quarter hourly load profile of non-AMR (Automatic Meter Reading) customers. This study examines how the repeated clustering method improves the ability to discriminate among the TDLPs of each cluster. The k-means algorithm is a well-known clustering technology in data mining. Repeated clustering groups the cluster into sub-clusters with the k-means algorithm and chooses the sub-cluster that has the maximum average error and repeats clustering until the final cluster count is satisfied.

Combined Artificial Bee Colony for Data Clustering (융합 인공벌군집 데이터 클러스터링 방법)

  • Kang, Bum-Su;Kim, Sung-Soo
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.40 no.4
    • /
    • pp.203-210
    • /
    • 2017
  • Data clustering is one of the most difficult and challenging problems and can be formally considered as a particular kind of NP-hard grouping problems. The K-means algorithm is one of the most popular and widely used clustering method because it is easy to implement and very efficient. However, it has high possibility to trap in local optimum and high variation of solutions with different initials for the large data set. Therefore, we need study efficient computational intelligence method to find the global optimal solution in data clustering problem within limited computational time. The objective of this paper is to propose a combined artificial bee colony (CABC) with K-means for initialization and finalization to find optimal solution that is effective on data clustering optimization problem. The artificial bee colony (ABC) is an algorithm motivated by the intelligent behavior exhibited by honeybees when searching for food. The performance of ABC is better than or similar to other population-based algorithms with the added advantage of employing fewer control parameters. Our proposed CABC method is able to provide near optimal solution within reasonable time to balance the converged and diversified searches. In this paper, the experiment and analysis of clustering problems demonstrate that CABC is a competitive approach comparing to previous partitioning approaches in satisfactory results with respect to solution quality. We validate the performance of CABC using Iris, Wine, Glass, Vowel, and Cloud UCI machine learning repository datasets comparing to previous studies by experiment and analysis. Our proposed KABCK (K-means+ABC+K-means) is better than ABCK (ABC+K-means), KABC (K-means+ABC), ABC, and K-means in our simulations.

A Simple Tandem Method for Clustering of Multimodal Dataset

  • Cho C.;Lee J.W.;Lee J.W.
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2003.05a
    • /
    • pp.729-733
    • /
    • 2003
  • The presence of local features within clusters incurred by multi-modal nature of data prohibits many conventional clustering techniques from working properly. Especially, the clustering of datasets with non-Gaussian distributions within a cluster can be problematic when the technique with implicit assumption of Gaussian distribution is used. Current study proposes a simple tandem clustering method composed of k-means type algorithm and hierarchical method to solve such problems. The multi-modal dataset is first divided into many small pre-clusters by k-means or fuzzy k-means algorithm. The pre-clusters found from the first step are to be clustered again using agglomerative hierarchical clustering method with Kullback- Leibler divergence as the measure of dissimilarity. This method is not only effective at extracting the multi-modal clusters but also fast and easy in terms of computation complexity and relatively robust at the presence of outliers. The performance of the proposed method was evaluated on three generated datasets and six sets of publicly known real world data.

  • PDF

Image Contrast Enhancement Technique Using Clustering Algorithm (클러스터링 알고리듬을 이용한 영상 대비 향상 기법)

  • Kim, Nam-Jin;Kim, Yong-Soo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.3
    • /
    • pp.310-315
    • /
    • 2004
  • Image taken in the night can be low-contrast images because of poor environment and image transmission. We propose an algorithm that improves the acquired low-contrast image. MPEG-2 separates chrominance and illuminance, and compresses respectively because human vision is more sensitive to luminance. We extracted illumination and used K-means algorithm to find a proper crossover point automatically. We used K-means algorithm in the viewpoint that the problem of crossover point selection can be considered as the two-category classification problem. We divided an image into two subimages using the crossover point, and applied the histogram equalization method respectively. We used the index of fuzziness to evaluate the degree of improvement. We compare the results of the proposed method with those of other methods.

Improved Classification Algorithm using Extended Fuzzy Clustering and Maximum Likelihood Method

  • Jeon Young-Joon;Kim Jin-Il
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.447-450
    • /
    • 2004
  • This paper proposes remotely sensed image classification method by fuzzy c-means clustering algorithm using average intra-cluster distance. The average intra-cluster distance acquires an average of the vector set belong to each cluster and proportionates to its size and density. We perform classification according to pixel's membership grade by cluster center of fuzzy c-means clustering using the mean-values of training data about each class. Fuzzy c-means algorithm considered membership degree for inter-cluster of each class. And then, we validate degree of overlap between clusters. A pixel which has a high degree of overlap applies to the maximum likelihood classification method. Finally, we decide category by comparing with fuzzy membership degree and likelihood rate. The proposed method is applied to IKONOS remote sensing satellite image for the verifying test.

  • PDF