• Title/Summary/Keyword: index clustering


VS-FCM: Validity-guided Spatial Fuzzy c-Means Clustering for Image Segmentation

  • Kang, Bo-Yeong;Kim, Dae-Won
    • International Journal of Fuzzy Logic and Intelligent Systems, v.10 no.1, pp.89-93, 2010
  • This paper proposes a new fuzzy clustering approach to the color clustering problem. To address the limitations of the traditional fuzzy c-means (FCM) algorithm, we propose a spatial homogeneity-based FCM algorithm. In addition, a cluster validity index is used to determine the number of clusters for a given image automatically. We refer to this method as the VS-FCM algorithm. The effectiveness of the proposed method is demonstrated through various clustering examples.
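The abstract does not define the spatial homogeneity term or name the validity index, so the sketch below shows only the plain FCM iteration that VS-FCM builds on, plus one common validity index (Bezdek's partition coefficient) for choosing the number of clusters c. The 1-D gray-level input, the fuzzifier m = 2, the evenly spaced initialisation, and the choice of the partition coefficient are all assumptions, not details from the paper.

```python
"""Plain fuzzy c-means on 1-D values (e.g. gray levels), with the partition
coefficient used to pick c. This is NOT VS-FCM: the paper's spatial
homogeneity term and its specific validity index are omitted (assumption)."""


def fcm(values, c, m=2.0, iters=100, eps=1e-9):
    """Return (centers, u) where u[k][i] is the membership of values[k] in cluster i."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: centers spread evenly over the data range.
    centers = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    u = []
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1)).
        u = []
        for x in values:
            d = [abs(x - v) + eps for v in centers]  # eps avoids division by zero
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1)) for j in range(c))
                      for i in range(c)])
        # Center update: mean of the data weighted by u_ki^m.
        centers = [
            sum((u[k][i] ** m) * x for k, x in enumerate(values))
            / sum(u[k][i] ** m for k in range(len(values)))
            for i in range(c)
        ]
    return centers, u


def partition_coefficient(u):
    """PC = (1/n) * sum_k sum_i u_ki^2; values closer to 1 mean crisper clusters."""
    return sum(uki ** 2 for row in u for uki in row) / len(u)


def choose_c(values, c_range=range(2, 6)):
    """Pick the number of clusters with the highest partition coefficient."""
    return max(c_range, key=lambda c: partition_coefficient(fcm(values, c)[1]))


if __name__ == "__main__":
    pixels = [10, 12, 11, 14, 120, 118, 122, 121, 240, 238, 241, 242]
    centers, _ = fcm(pixels, 3)
    print([round(v) for v in centers], choose_c(pixels))
```

The partition coefficient is used here only because it needs nothing beyond the membership matrix; other indices (for example Xie-Beni) could be swapped into `choose_c` the same way.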

A Performance Comparison of Cluster Validity Indices based on K-means Algorithm (K-means 알고리즘 기반 클러스터링 인덱스 비교 연구)

  • Shim, Yo-Sung;Chung, Ji-Won;Choi, In-Chan
    • Asia Pacific Journal of Information Systems, v.16 no.1, pp.127-144, 2006
  • The K-means algorithm is widely used at the initial stage of data analysis in the data mining process, partly because of its low time complexity and ease of implementation. Cluster validity indices are used alongside the algorithm to determine the number of clusters and to assess the clustering results for a dataset. In this paper, we compare the performance of sixteen indices, selected from forty in the literature based on their applicability to nonhierarchical clustering algorithms. The datasets used in the experiments are generated from multivariate normal distributions. In particular, four error types are considered in the comparison: standardization, outlier generation, error perturbation, and addition of noise dimensions. The experiments also analyze how varying the number of points, attributes, and clusters affects performance. The simulation results show that the Calinski-Harabasz index performs best across all datasets, and that the Davies-Bouldin index becomes a strong competitor as the number of points in a dataset increases.
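A minimal pure-Python sketch of the two indices the study found strongest, written from their standard definitions: Calinski-Harabasz is between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom (higher is better); Davies-Bouldin averages, over clusters, the worst ratio of summed cluster scatter to centroid separation (lower is better). The functions score any labelled partition; the K-means step that would produce the labels is not shown, and the helper names are illustrative.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group(points, labels):
    """Split points into clusters according to their labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return list(clusters.values())


def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)]; higher is better."""
    clusters = group(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(c) * sq_dist(centroid(c), overall) for c in clusters)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """DB = mean over clusters i of max_{j != i} (s_i + s_j) / d(c_i, c_j); lower is better."""
    clusters = group(points, labels)
    cents = [centroid(c) for c in clusters]
    # s_i: average Euclidean distance of the members of cluster i to its centroid.
    scatter = [sum(math.sqrt(sq_dist(p, cen)) for p in c) / len(c)
               for c, cen in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.sqrt(sq_dist(cents[i], cents[j]))
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k
```

To choose the number of clusters, one would run K-means once per candidate k and keep the k with the highest CH (or the lowest DB).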

A Study of Expanded Severity Index of Voltage Sag Using Fuzzy Clustering (Fuzzy Clustering을 이용한 순간전압강하(Voltage Sag)의 확장된 심각도 지수(Expanded Severity Index) 연구)

  • Oh, Won-Wook;Kim, Yong-Su
    • Proceedings of the Korean Society of Computer Information Conference, 2011.01a, pp.81-84, 2011
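  • This paper focuses on voltage sags among voltage events. The voltage sag severity index, which expresses how severe a sag is, is defined as the ratio to the threshold at the same duration. The proposed Expanded Severity index additionally captures whether sags are transient or recurring, based on how sag events are distributed. Starting from the conventional severity based on the ITIC curve, which represents the threshold, fuzzy clustering is applied to the distribution of sag events on the duration-voltage plane to find medoids; the severity of each medoid is then computed and compared with the severity of the sag points closest to the actual threshold. The Expanded Severity index expresses the association with high-severity events: in addition to the numerical degree of severity, it indicates with a value between 0 and 1 whether a phenomenon is transient or persistently recurring. This was verified experimentally.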

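The abstract defines severity as a ratio against the ITIC threshold at the same duration but gives no formula. A severity of this form is described in IEEE Std 1564: Se = (1 - U) / (1 - U_curve(T)), with residual voltage U in per unit and U_curve(T) the curve's lower limit at duration T, so Se > 1 means the sag falls below the curve. The sketch assumes that form and the usual piecewise lower envelope of the ITIC curve (0 pu up to 20 ms, 0.7 pu up to 0.5 s, 0.8 pu up to 10 s, 0.9 pu beyond); the function names are hypothetical, and the paper's fuzzy-clustering-based Expanded Severity index is not reproduced.

```python
def itic_lower_limit(duration_s):
    """Lower voltage limit (pu) of the ITIC tolerance region (assumed breakpoints)."""
    if duration_s <= 0.02:
        return 0.0
    if duration_s <= 0.5:
        return 0.7
    if duration_s <= 10.0:
        return 0.8
    return 0.9


def sag_severity(residual_pu, duration_s):
    """Se = (1 - U) / (1 - U_curve(T)).

    Se < 1: the sag stays inside the tolerance region; Se > 1: it falls below the curve.
    """
    return (1.0 - residual_pu) / (1.0 - itic_lower_limit(duration_s))


if __name__ == "__main__":
    # A sag to 0.5 pu lasting 0.1 s, and a sag to 0.85 pu lasting 1 s.
    print(sag_severity(0.5, 0.1), sag_severity(0.85, 1.0))
```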

Group Search Optimization Data Clustering Using Silhouette (실루엣을 적용한 그룹탐색 최적화 데이터클러스터링)

  • Kim, Sung-Soo;Baek, Jun-Young;Kang, Bum-Soo
    • Journal of the Korean Operations Research and Management Science Society, v.42 no.3, pp.25-34, 2017
  • K-means is a popular and efficient data clustering method, but it uses only intra-cluster distance as its validity criterion and requires the number of clusters to be fixed in advance. Without a suitable number of clusters, K-means cannot be applied effectively to unsupervised data. This paper proposes Group Search Optimization (GSO) using the silhouette to find the optimal clustering solution, together with the number of clusters, for unsupervised data. The silhouette can serve as a validity index for deciding both the number of clusters and the optimal solution because it considers intra- and inter-cluster distances simultaneously. The performance of GSO using the silhouette is validated through experiments on, and analysis of, several data sets.
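The abstract uses the silhouette as the validity index that the GSO search optimizes. The GSO search itself is not reproduced; the sketch below only computes the mean silhouette of a labelled partition, s(i) = (b - a) / max(a, b), where a is a point's mean distance to the rest of its own cluster and b its smallest mean distance to any other cluster. Scoring singleton clusters as s(i) = 0 follows a common convention and is an assumption here.

```python
import math


def silhouette(points, labels):
    """Mean silhouette over all points; requires at least two clusters."""
    n = len(points)
    total = 0.0
    for i in range(n):
        same = [math.dist(points[i], points[j]) for j in range(n)
                if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(same) / len(same)
        # b: smallest mean distance from point i to the members of another cluster.
        others = {}
        for j in range(n):
            if labels[j] != labels[i]:
                others.setdefault(labels[j], []).append(math.dist(points[i], points[j]))
        b = min(sum(d) / len(d) for d in others.values())
        total += (b - a) / max(a, b)
    return total / n
```

Because s(i) rewards both compactness (small a) and separation (large b), a search over candidate partitions can compare solutions with different numbers of clusters directly, which is why the abstract can use it to pick both at once.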

Development of an unsupervised learning-based ESG evaluation process for Korean public institutions without label annotation

  • Do Hyeok Yoo;SuJin Bak
    • Journal of the Korea Society of Computer and Information, v.29 no.5, pp.155-164, 2024
  • This study proposes an unsupervised learning-based clustering model to estimate the ESG ratings of domestic public institutions. To this end, spectral clustering and k-means clustering were compared to determine the optimal number of clusters. The results were validated by calculating the Davies-Bouldin Index (DBI), a model performance index for which lower values indicate better performance. The DBI was 0.734 for spectral clustering and 1.715 for k-means clustering, confirming the superiority of spectral clustering. Furthermore, t-tests and ANOVA were used to reveal statistically significant differences in ESG non-financial data, and correlation coefficients were used to confirm the relationships between ESG indicators. Based on these results, this study suggests that the ESG performance ranking of each public institution can be estimated without existing ESG ratings, by determining the optimal number of clusters and then computing the sum of the averages of the ESG data within each cluster. The proposed model can therefore be used to evaluate the ESG ratings of various domestic public institutions and is expected to be useful for domestic sustainable management and performance management.
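The ranking step is described only as summing the averages of the ESG data within each cluster. One possible reading, sketched under stated assumptions: every institution already has a cluster label (for example from spectral clustering), all indicators are numeric and normalised so that higher means better, and clusters are ordered by the sum of their per-indicator means. The function name `rank_clusters` and the example data are hypothetical.

```python
from collections import defaultdict


def rank_clusters(rows, labels):
    """Order clusters by the sum of their per-indicator means, highest first.

    rows:   equal-length lists of ESG indicator values (assumed normalised so
            that higher is better).
    labels: cluster label of each row.
    Returns (ordered labels, {label: score}).
    """
    members = defaultdict(list)
    for row, lab in zip(rows, labels):
        members[lab].append(row)
    scores = {}
    for lab, grp in members.items():
        n_ind = len(grp[0])
        means = [sum(r[d] for r in grp) / len(grp) for d in range(n_ind)]
        scores[lab] = sum(means)
    return sorted(scores, key=scores.get, reverse=True), scores


if __name__ == "__main__":
    data = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.8], [0.2, 0.3, 0.1], [0.1, 0.2, 0.2]]
    print(rank_clusters(data, ["A", "A", "B", "B"]))
```

Each institution would then inherit the rank of its cluster; if indicators were on different scales, they would need normalising first, or one indicator would dominate the sum.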

The Evaluation Measure of Text Clustering for the Variable Number of Clusters (가변적 클러스터 개수에 대한 문서군집화 평가방법)

  • Jo, Tae-Ho
    • Proceedings of the Korean Information Science Society Conference, 2006.10b, pp.233-237, 2006
  • This study proposes a new measure for evaluating the performance of text clustering. When the K-means algorithm or Kohonen Networks are used for text clustering, the number of clusters is fixed in advance as a parameter, whereas with the single pass algorithm the number of clusters cannot be predicted. With labeled documents, the result of K-means or a Kohonen Network can be evaluated by setting the number of clusters to the number of target categories, mapping each cluster to a target category, and applying the evaluation measures used for text categorization. With the single pass algorithm, however, if the number of clusters differs from the number of target categories, such measures cannot be used to evaluate the result. This study therefore proposes an evaluation measure for text clustering based on intra-cluster and inter-cluster similarity, called CI (Clustering Index) in this article.

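The abstract does not give the CI formula. As an illustration only, the sketch below computes a hypothetical ratio of mean intra-cluster cosine similarity to mean inter-cluster cosine similarity over bag-of-words term-frequency vectors; the name `ci_like_score`, the whitespace tokenisation, the cosine measure, and the ratio form are all assumptions, not the paper's CI. Like the measure described in the abstract, it needs no mapping between clusters and target categories, so it can score a single-pass result whose cluster count differs from the number of categories.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ci_like_score(docs, labels):
    """Hypothetical index: mean intra-cluster similarity / mean inter-cluster similarity."""
    vecs = [Counter(d.lower().split()) for d in docs]
    intra, inter = [], []
    for i, j in combinations(range(len(docs)), 2):
        (intra if labels[i] == labels[j] else inter).append(cosine(vecs[i], vecs[j]))
    mean_intra = sum(intra) / len(intra) if intra else 0.0
    mean_inter = sum(inter) / len(inter) if inter else 0.0
    # Higher is better; infinite when clusters share no terms at all.
    return mean_intra / mean_inter if mean_inter else float("inf")
```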

Gene Screening and Clustering of Yeast Microarray Gene Expression Data (효모 마이크로어레이 유전자 발현 데이터에 대한 유전자 선별 및 군집분석)

  • Lee, Kyung-A;Kim, Tae-Houn;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics, v.24 no.6, pp.1077-1094, 2011
  • We perform clustering analyses of yeast cell cycle microarray expression data. To reflect the characteristics of time-course data, we screen genes using test statistics based on Fourier coefficients, applying an FDR procedure. We compare the results of model-based clustering, K-means, PAM, SOM, Ward's hierarchical method, and a fuzzy method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared. A biological interpretation based on GO analysis is also included.
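The Dunn index is one of the three validity measures compared in the study. Several Dunn variants exist; the sketch assumes the original form on Euclidean distance: the smallest distance between points in different clusters divided by the largest within-cluster diameter, so higher values mean compact, well-separated clusters. For expression profiles, each point would be one gene's vector of time-course measurements (an assumption about the data layout).

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn = min inter-cluster point distance / max intra-cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    # Largest distance between two members of the same cluster.
    diameter = max(
        (math.dist(p, q) for g in groups for p, q in combinations(g, 2)),
        default=0.0,
    )
    # Smallest distance between members of two different clusters.
    separation = min(
        math.dist(p, q)
        for g1, g2 in combinations(groups, 2)
        for p in g1
        for q in g2
    )
    return separation / diameter if diameter else float("inf")
```

Because it uses a single worst-case distance in both numerator and denominator, the Dunn index is sensitive to outliers, which is one reason studies like this one report it alongside connectivity and silhouette.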

Variable Selection and Outlier Detection for Automated K-means Clustering

  • Kim, Sung-Soo
    • Communications for Statistical Applications and Methods, v.22 no.1, pp.55-67, 2015
  • An important problem in cluster analysis is selecting the variables that define the cluster structure while eliminating noisy variables that mask it; in addition, outlier detection is a fundamental task in cluster analysis. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the number of clusters and the initial cluster centers whenever a new variable is added, (ii) identifying outliers in each cluster depending on the variables used, and (iii) selecting variables that define the cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and Cradit, 2001). To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach. Simulation results indicate that the proposed automated K-means clustering procedure is effective at selecting variables and identifying outliers. The implemented R program can be obtained at http://www.knou.ac.kr/~sskim/SVOKmeans.r.
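The abstract names a hybrid of clustering-based and distance-based outlier detection without specifying either rule. As a simple stand-in for a distance-based step only, the sketch flags a point when its distance to its own cluster centroid exceeds that cluster's mean distance plus `z` population standard deviations. The threshold rule, the default `z = 2.0`, and skipping clusters with fewer than three points are assumptions; this is not the paper's procedure and is written in Python rather than the paper's R.

```python
import math
import statistics


def flag_outliers(points, labels, z=2.0):
    """Return sorted indices of points unusually far from their own cluster centroid."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    outliers = []
    for members in clusters.values():
        if len(members) < 3:
            continue  # too few points to estimate a spread (assumed rule)
        dim = len(points[members[0]])
        cen = [statistics.fmean(points[i][d] for i in members) for d in range(dim)]
        dists = [math.dist(points[i], cen) for i in members]
        # Assumed cutoff: mean distance plus z population standard deviations.
        cutoff = statistics.fmean(dists) + z * statistics.pstdev(dists)
        outliers.extend(i for i, d in zip(members, dists) if d > cutoff)
    return sorted(outliers)
```

Note that the outlier itself pulls the centroid and inflates the spread, so a rule like this can miss outliers in small clusters; a hybrid scheme such as the one the abstract mentions is one way to compensate.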

Bulk Insertion Method for R-tree using Seeded Clustering (R-tree에서 Seeded 클러스터링을 이용한 다량 삽입)

  • 이태원;문봉기;이석호
    • Journal of KIISE:Databases, v.31 no.1, pp.30-38, 2004
  • In many scientific and commercial applications, such as the Earth Observation System (EOSDIS) and mobile phone services that track a large number of clients, archiving and indexing the ever-increasing volume of complex data continuously added to databases is a daunting task. R-tree based index structures have been widely used to manage multidimensional data efficiently in scientific and data warehousing environments. In this paper, we propose a scalable technique called seeded clustering that maintains R-tree indexes by bulk insertion while keeping pace with high data arrival rates. Our approach uses a seed tree, copied from the top k levels of a target R-tree, to classify input data objects into clusters. We then build an R-tree for each cluster and insert these input R-trees into the target R-tree in bulk, one at a time. We present detailed algorithms for seeded clustering and bulk insertion, as well as results from an extensive experimental study. The experimental results show that bulk insertion by seeded clustering outperforms previously known methods in both insertion cost and the quality of the target R-trees, measured by their query performance.
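The abstract says input objects are classified into clusters using a seed tree copied from the top k levels of the target R-tree, but does not give the routing rule. A minimal sketch of that classification step for k = 1, assuming 2-D points and Guttman's least-area-enlargement criterion (ties broken by smaller area) for choosing a seed entry; the `Rect` type and `route` function are hypothetical, and building the per-cluster R-trees and bulk-inserting them are not shown.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned minimum bounding rectangle (MBR)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def enlarged(self, x, y):
        """Smallest rectangle covering this one and the point (x, y)."""
        return Rect(min(self.xmin, x), min(self.ymin, y),
                    max(self.xmax, x), max(self.ymax, y))


def route(seed_entries, points):
    """Group points by the seed entry that needs the least area enlargement.

    seed_entries: MBRs copied from one level of the target R-tree (k = 1 here).
    Returns {entry index: [points]}; each group would become one small input R-tree.
    """
    clusters = {i: [] for i in range(len(seed_entries))}
    for x, y in points:
        best = min(
            range(len(seed_entries)),
            key=lambda i: (seed_entries[i].enlarged(x, y).area() - seed_entries[i].area(),
                           seed_entries[i].area()),
        )
        clusters[best].append((x, y))
    return clusters


if __name__ == "__main__":
    seeds = [Rect(0, 0, 10, 10), Rect(20, 20, 30, 30)]
    print(route(seeds, [(1, 1), (25, 25), (12, 12), (29, 31)]))
```

Routing by the target tree's own upper-level MBRs means each resulting cluster already lines up with one subtree, so its small R-tree can later be attached near that subtree rather than inserted object by object.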

A Novel Cluster Validation Index (새로운 클러스터 평가 지표)

  • Seo Suk. T.;Son Seo. H.;Lee In. G.;Jeong Hye. C.;Kwon Soon. H.
    • Proceedings of the Korean Institute of Intelligent Systems Conference, 2005.11a, pp.171-174, 2005
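  • Existing cluster validation indices tend to decrease monotonically as the number of clusters increases. A new cluster validation index addressing this shortcoming was recently proposed by one of the authors of this paper, but it suffers from over-clustering. This paper proposes a new cluster validation index that prevents both the monotonic decrease and over-clustering, and demonstrates the validity of the proposed index through several examples.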
