• Title/Summary/Keyword: K-means clustering algorithm

The Document Clustering using Multi-Objective Genetic Algorithms (다목적 유전자 알고리즘을 이용한문서 클러스터링)

  • Lee, Jung-Song;Park, Soon-Cheol
    • Journal of Korea Society of Industrial Information Systems / v.17 no.2 / pp.57-64 / 2012
  • This paper proposes a multi-objective genetic algorithm for document clustering, an important task in text mining. The central function of a document clustering algorithm is to group similar documents in a corpus. K-means clustering and genetic algorithms have been the most actively studied approaches in this field. However, k-means clustering depends heavily on the initial centroids, and the genetic algorithm tends to fall into a local optimum depending on the fitness function. To compensate for these disadvantages, the multi-objective genetic algorithm is applied to document clustering, and its accuracy is analyzed and compared with that of the existing algorithms. In the experiments, the proposed multi-objective genetic algorithm improves document clustering accuracy by about 20% over k-means clustering and by about 17% over the general genetic algorithm.

An Implementation of K-Means Algorithm Improving Cluster Centroids Decision Methodologies (클러스터 중심 결정 방법을 개선한 K-Means 알고리즘의 구현)

  • Lee Shin-Won;Oh HyungJin;An Dong-Un;Jeong Seong-Jong
    • The KIPS Transactions:PartB / v.11B no.7 s.96 / pp.867-874 / 2004
  • The K-Means algorithm is a non-hierarchical (flat) reassignment technique that iterates its steps on the basis of K cluster centroids until the clustering results converge into K clusters. By its nature, the K-Means algorithm produces different results depending on the initial and new centroids. In this paper, we propose a modified K-Means algorithm that improves the methods for deciding the initial and new centroids. When the two algorithms were evaluated using the 16 weighting schemes of the SMART system, the modified algorithm showed 20% better recall and F-measure than the K-Means algorithm, and the document clustering results improved considerably.
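
The dependence on initial centroids described in this abstract can be illustrated with a minimal sketch of the standard (unmodified) K-Means loop. This is not the modified algorithm the paper proposes; the function name `kmeans`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd-style K-Means on lists of coordinate tuples.

    `centroids` is the caller-supplied initial guess (hypothetical API);
    the final clusters can differ depending on it.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                new_centroids.append(old)  # leave an empty cluster where it is
                continue
            new_centroids.append(
                tuple(sum(m[d] for m in members) / len(members)
                      for d in range(len(old)))
            )
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


# Toy data: four corners of a wide rectangle (illustrative only).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
good_centroids, good_labels = kmeans(points, [(0, 0), (4, 0)])
poor_centroids, poor_labels = kmeans(points, [(2, 0), (2, 1)])
```

With the first initialization the points split into the left and right pairs; with the second, the loop converges immediately to a top/bottom split, which is a worse local optimum for the same data.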

An Efficient Clustering Method based on Multi Centroid Set using MapReduce (맵리듀스를 이용한 다중 중심점 집합 기반의 효율적인 클러스터링 방법)

  • Kang, Sungmin;Lee, Seokjoo;Min, Jun-ki
    • KIISE Transactions on Computing Practices / v.21 no.7 / pp.494-499 / 2015
  • As the size of data increases, it becomes important to identify properties by analyzing big data. In this paper, we propose an efficient k-Means based clustering technique, called MCSK-Means (Multi Centroid Set k-Means), using the distributed parallel processing framework MapReduce. A problem with the k-Means algorithm is that clustering accuracy depends on randomly created initial centroids. To alleviate this problem, the MCSK-Means algorithm reduces the dependency on initial centroids by using multiple sets, each consisting of k centroids. In addition, we apply agglomerative hierarchical clustering to create k centroids from the centroids in the m centroid sets that result from the clustering phase. We implemented MCSK-Means on the MapReduce framework to process big data efficiently.

K-means Clustering Method according to Documentation Numbers (문서 수에 따른 가중치를 적용한 K-means 문서 클러스터링)

  • 조시성;안동언;정성종;이신원
    • Proceedings of the IEEK Conference / 2003.07d / pp.1557-1560 / 2003
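  • In this paper, we cluster documents using the K-means clustering algorithm, a non-hierarchical document clustering method. The conventional K-means clustering algorithm has the problem that, when the number of documents is large, too many documents are assigned to a single cluster. To reduce this imbalance, we propose a method that assigns weights to documents according to the number of documents allocated to each cluster and then clusters again. The experimental results were evaluated using the F-measure, the harmonic mean of precision and recall, and showed a performance improvement of more than 9% over the existing algorithm.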
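
For reference, the F-measure used for evaluation here is the harmonic mean of precision and recall. A minimal sketch follows; the function name `f_measure` and the `beta` parameter (the usual weighting knob, with `beta = 1` giving the plain harmonic mean) are assumptions for illustration and are not taken from the paper.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Example: perfect precision but only half the relevant documents recovered.
score = f_measure(1.0, 0.5)
```

Because the harmonic mean is pulled toward the smaller of the two values, a clustering cannot score well by maximizing only precision or only recall.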

Selection of Cluster Hierarchy Depth and Initial Centroids in Hierarchical Clustering using K-Means Algorithm (K-Means 알고리즘을 이용한 계층적 클러스터링에서 클러스터 계층 깊이와 초기값 선정)

  • Lee, Shin-Won;An, Dong-Un;Chong, Sung-Jong
    • Journal of the Korean Society for information Management / v.21 no.4 s.54 / pp.173-185 / 2004
  • Fast, high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, with a large number of variables, K-means has a time complexity that is linear in the number of documents, but it is thought to produce inferior clusters. In this paper, the Condor system, which uses the K-Means algorithm, is compared with the regular method in which the initial centroids are established in advance, and the performance of our method is considerably improved.

An Introduction of Two-Step K-means Clustering Applied to Microarray Data (마이크로 어레이 데이터에 적용된 2단계 K-means 클러스터링의 소개)

  • Park, Dae-Hun;Kim, Yeon-Tae;Kim, Seong-Sin;Lee, Chun-Hwan
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.11a / pp.83-86 / 2006
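  • Large amounts of genetic information and its by-products have been studied by many methods. The use of DNA microarray technology has produced large volumes of data, which are difficult to analyze with existing research methods. To make it possible to process such large amounts of data, this paper proposes division clustering using the K-means clustering algorithm. The usefulness of the proposed clustering method was verified by applying it to microarray data from rice genes, and its superiority was confirmed by comparing the results with those obtained using the conventional K-means clustering algorithm.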

An Introduction of Two-Step K-means Clustering Applied to Microarray Data (마이크로 어레이 데이터에 적용된 2단계 K-means 클러스터링의 소개)

  • Park, Dae-Hoon;Kim, Youn-Tae;Kim, Sung-Shin;Lee, Choon-Hwan
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.2 / pp.167-172 / 2007
  • Long gene sequences and their products have been studied by many methods. The use of DNA (deoxyribonucleic acid) microarray technology has produced an enormous amount of data, which has been difficult to analyze using typical research methods. This paper proposes analyzing such mass data using division clustering with the K-means clustering algorithm. To demonstrate the superiority of the proposed method, it was used to analyze microarray data from rice DNA. The results were compared with those of the existing K-means method, establishing that the proposed method is more useful and also effectively reduces processing time.

Efficient K-means Clustering for High-dimensional Large Data (고차원 대규모 데이터를 위한 효율적인 K-means 클러스터링)

  • Yoon, Tae-Sik;Shim, Kyu-Seok
    • Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.33-36 / 2011
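  • Clustering is useful for analyzing data by grouping data points. K-means in particular is the most widely used clustering algorithm and finds k clusters. In this paper, we propose a K-means algorithm that runs more efficiently than the conventional K-means algorithm on high-dimensional, large-scale data. Like existing algorithms, the proposed algorithm uses distance information to reduce unnecessary computation, and it additionally shortens running time by excluding clusters that do not move from the computation. The proposed algorithm uses less space and is also faster than the algorithms proposed in previous related work. Experiments on real high-dimensional data demonstrate the efficiency of the proposed algorithm.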

Clustering-based Collaborative Filtering Using Genetic Algorithms (유전자 알고리즘을 이용한 클러스터링 기반 협력필터링)

  • Lee, Soojung
    • Journal of Creative Information Culture / v.4 no.3 / pp.221-230 / 2018
  • Collaborative filtering is a major technique for recommender systems and has been successfully implemented and deployed in real commercial online systems. However, it has several inherent drawbacks, such as data sparsity, cold start, and scalability problems. Clustering-based collaborative filtering has been studied as a way to handle the scalability problem. This study proposes a collaborative filtering system that uses genetic algorithms to address the shortcomings of the K-means algorithm, one of the most widely used clustering techniques. Moreover, unlike previous studies that aimed at optimized clustering results, the proposed method optimizes the performance of the collaborative filtering system that uses the clustering results, which can enhance system performance in practice.

An Enhanced Spatial Fuzzy C-Means Algorithm for Image Segmentation (영상 분할을 위한 개선된 공간적 퍼지 클러스터링 알고리즘)

  • Truong, Tung X.;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.17 no.2 / pp.49-57 / 2012
  • Conventional fuzzy c-means (FCM) algorithms have achieved good clustering performance. However, they do not fully utilize the spatial information in the image, which results in lower clustering performance for images with low contrast, vague boundaries, and noise. To overcome this issue, we propose an enhanced spatial fuzzy c-means (ESFCM) algorithm that takes into account the influence of neighboring pixels on the center pixel by assigning weights to the neighbors in a $3 \times 3$ square window. To compare the proposed ESFCM with various FCM-based segmentation algorithms, we used clustering validity functions such as the partition coefficient ($V_{pc}$), partition entropy ($V_{pe}$), and Xie-Beni function ($V_{xb}$). Experimental results show that the proposed ESFCM outperforms the other FCM-based algorithms in terms of these clustering validity functions.
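
The abstract names three validity functions without giving formulas. The sketch below uses the definitions commonly cited in the fuzzy clustering literature (Bezdek's partition coefficient and partition entropy, and the Xie-Beni index with fuzzifier 2); whether the paper uses exactly these forms is an assumption. The function names and the membership layout `u[i][j]` (membership of point `j` in cluster `i`) are hypothetical.

```python
import math


def partition_coefficient(u):
    """V_pc: mean squared membership; 1.0 for a crisp partition."""
    n = len(u[0])
    return sum(x * x for row in u for x in row) / n


def partition_entropy(u):
    """V_pe: negative mean of u * log(u); zero memberships contribute 0."""
    n = len(u[0])
    return -sum(x * math.log(x) for row in u for x in row if x > 0) / n


def xie_beni(u, points, centers):
    """V_xb: fuzzy compactness divided by n times the minimum center separation."""
    n = len(points)
    compactness = sum(
        u[i][j] ** 2 * math.dist(points[j], centers[i]) ** 2
        for i in range(len(centers))
        for j in range(n)
    )
    separation = min(
        math.dist(a, b) ** 2
        for i, a in enumerate(centers)
        for b in centers[i + 1:]
    )
    return compactness / (n * separation)


# Toy example: two 1-D points, two centers, maximally fuzzy memberships.
u = [[0.5, 0.5], [0.5, 0.5]]
points = [(0.0,), (1.0,)]
centers = [(0.0,), (1.0,)]
v_pc = partition_coefficient(u)
v_pe = partition_entropy(u)
v_xb = xie_beni(u, points, centers)
```

Under these definitions, a higher $V_{pc}$ and lower $V_{pe}$ and $V_{xb}$ are usually read as indicating a better partition.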