• Title/Summary/Keyword: Cluster validity

A Cluster validity Index for Fuzzy Clustering

  • Lee, Haiyoung
    • Journal of the Korean Institute of Intelligent Systems / v.9 no.6 / pp.621-626 / 1999
  • This paper proposes a new cluster validity index that is heuristic but eliminates the monotonically decreasing tendency that occurs as the number of clusters c becomes very large and approaches the number of data points n. We review the FCM algorithm and some conventional cluster validity criteria, discuss the limiting behavior of the proposed validity index, and provide numerical examples showing its effectiveness.
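
For orientation, the FCM algorithm that the abstract reviews alternates between a membership update and a center update. The following is a minimal sketch of one such iteration in its standard textbook form, not the paper's code; the function name `fcm_step`, the fuzzifier default `m=2.0`, and the handling of points that coincide with a center are assumptions made for illustration.

```python
import math

def fcm_step(points, centers, m=2.0):
    """One fuzzy c-means iteration (illustrative sketch).

    points, centers: lists of equal-length tuples; m > 1 is the fuzzifier.
    Returns (u, new_centers), where u[i][k] is the membership of point k
    in cluster i.
    """
    c, n = len(centers), len(points)
    d = [[math.dist(v, x) for x in points] for v in centers]
    u = [[0.0] * n for _ in range(c)]
    exponent = 2.0 / (m - 1.0)
    for k in range(n):
        zero = [i for i in range(c) if d[i][k] == 0.0]
        if zero:
            # Assumption: a point lying exactly on one or more centers
            # shares its full membership equally among those centers.
            for i in zero:
                u[i][k] = 1.0 / len(zero)
            continue
        for i in range(c):
            u[i][k] = 1.0 / sum((d[i][k] / d[j][k]) ** exponent
                                for j in range(c))
    # Center update: mean of the points weighted by u^m.
    dim = len(points[0])
    new_centers = []
    for i in range(c):
        w = [u[i][k] ** m for k in range(n)]
        total = sum(w)
        if total == 0.0:
            new_centers.append(tuple(centers[i]))  # keep an empty cluster's center
            continue
        new_centers.append(tuple(
            sum(w[k] * points[k][a] for k in range(n)) / total
            for a in range(dim)))
    return u, new_centers

points = [(0.0,), (1.0,), (9.0,), (10.0,)]
u, centers = fcm_step(points, [(0.0,), (10.0,)])
```

Each column of `u` sums to 1, which is the constraint that fuzzy validity indices such as the one in this entry operate on.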

Fast Search Algorithm for Determining the Optimal Number of Clusters using Cluster Validity Index (클러스터 타당성 평가기준을 이용한 최적의 클러스터 수 결정을 위한 고속 탐색 알고리즘)

  • Lee, Sang-Wook
    • The Journal of the Korea Contents Association / v.9 no.9 / pp.80-89 / 2009
  • A fast and efficient search algorithm for determining the optimal number of clusters in clustering algorithms is presented. The method is based on a cluster validity index, which is a measure of clustering optimality. As the clustering procedure progresses and reaches an optimal cluster configuration, the cluster validity index is expected to be minimized or maximized. In this paper, a fast non-exhaustive search method for finding the optimal number of clusters is designed and shown to work well in clustering. The proposed algorithm is implemented with the k-means++ algorithm as the underlying clustering technique, using CB and PBM as cluster validity indices. Experimental results show that the proposed method improves computation time without loss of accuracy on several artificial and real-life data sets.
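
The abstract names k-means++ as the underlying clustering technique. As a rough illustration of what distinguishes it from plain k-means, here is a sketch of the standard k-means++ seeding step only. The function name `kmeanspp_seeds` and the `rng` parameter are hypothetical, and the paper's search procedure and its CB and PBM indices are not reproduced here.

```python
import math
import random

def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial centers with k-means++ seeding (illustrative sketch).

    Assumes points contains at least k distinct tuples.
    """
    rng = rng or random.Random()
    # First center: uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Next center: sampled with probability proportional to d2, so
        # already-chosen points (weight 0) are effectively never picked again.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (9.0, 9.0), (10.0, 10.0)]
seeds = kmeanspp_seeds(pts, 3, random.Random(0))
```

The seeds would then be passed to ordinary k-means iterations; a validity index is evaluated on the resulting partition for each candidate number of clusters.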

Nearest neighbor and validity-based clustering

  • Son, Seo H.;Seo, Suk T.;Kwon, Soon H.
    • International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.3 / pp.337-340 / 2004
  • The clustering problem can be formulated as the problem of finding the number of clusters and a partition matrix from a given data set using iterative or non-iterative algorithms. The authors propose a nearest neighbor and validity-based clustering algorithm in which each data point in the data set is linked with its nearest neighbor data point to form initial clusters, and then each initial cluster is linked with its nearest neighbor cluster to form a new cluster. The linking between clusters continues until no more linking is possible. An optimal set of clusters is identified using a conventional cluster validity index. Experimental results on well-known data sets are provided to show the effectiveness of the proposed clustering algorithm.
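
The first stage described in the abstract, linking every data point to its nearest neighbor to form initial clusters, can be sketched with a union-find structure. This is one possible reading of that stage under the assumption of Euclidean distance. The function name is hypothetical, and the later cluster-to-cluster linking and the validity-based selection are not shown.

```python
import math

def nearest_neighbor_clusters(points):
    """Link each point to its nearest neighbor and return one cluster label
    per point (connected components of the links). Needs >= 2 points."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Nearest other point; ties go to the lowest index.
        j = min((k for k in range(n) if k != i),
                key=lambda k: math.dist(points[i], points[k]))
        parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

pts = [(0, 0), (0, 1), (10, 10), (10, 11), (10, 12.5)]
labels = nearest_neighbor_clusters(pts)  # [0, 0, 1, 1, 1]
```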

A new cluster validity index based on connectivity in self-organizing map (자기조직화지도에서 연결강도에 기반한 새로운 군집타당성지수)

  • Kim, Sangmin;Kim, Jaejik
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.591-601 / 2020
  • The self-organizing map (SOM) is an unsupervised learning method that projects high-dimensional data onto low-dimensional nodes. It can visualize data in 2- or 3-dimensional space using the nodes, and the nodes can be used to explore characteristics of the data. To understand the structure of data, cluster analysis is often applied to the nodes obtained from SOM. In cluster analysis, the optimal number of clusters is an important issue. To help determine it, various cluster validity indexes have been developed, and they can be applied to clustering outcomes for nodes from SOM. However, while SOM has the advantage of reflecting the topological properties of the original data in the low-dimensional space, these indexes do not take this into account. Thus, we propose a new cluster validity index for SOM based on connectivity between nodes, which considers the topological properties of data. The performance of the proposed index is evaluated through simulations and compared with various existing cluster validity indexes.

A Performance Comparison of Cluster Validity Indices based on K-means Algorithm (K-means 알고리즘 기반 클러스터링 인덱스 비교 연구)

  • Shim, Yo-Sung;Chung, Ji-Won;Choi, In-Chan
    • Asia pacific journal of information systems / v.16 no.1 / pp.127-144 / 2006
  • The K-means algorithm is widely used at the initial stage of data analysis in the data mining process, partly because of its low time complexity and the simplicity of practical implementation. Cluster validity indices are used along with the algorithm to determine the number of clusters and to evaluate the clustering results of datasets. In this paper, we present a performance comparison of sixteen indices, selected from forty indices in the literature for their applicability to nonhierarchical clustering algorithms. The data sets used in the experiment are generated from multivariate normal distributions. In particular, four error types are considered in the comparison: standardization, outlier generation, error perturbation, and noise dimension addition. The experiment analyzes how varying the number of points, attributes, and clusters affects performance. The results of the simulation experiment show that the Calinski and Harabasz index performs best across all datasets and that the Davies and Bouldin index becomes a strong competitor as the number of points in the dataset increases.
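
To make the two best-performing indices concrete, here is a small sketch of the Calinski and Harabasz index (larger is better) and the Davies and Bouldin index (smaller is better) in their usual textbook definitions with Euclidean distance. The helper names are hypothetical, and the code assumes at least two clusters, fewer clusters than points, non-zero within-cluster scatter, and distinct centroids. It is not the experimental code of the paper.

```python
import math
from collections import defaultdict

def _groups(points, labels):
    """Group points by label; return a list of (centroid, members)."""
    members = defaultdict(list)
    for p, lab in zip(points, labels):
        members[lab].append(p)
    out = []
    for pts in members.values():
        centroid = tuple(sum(col) / len(pts) for col in zip(*pts))
        out.append((centroid, pts))
    return out

def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    overall = tuple(sum(col) / n for col in zip(*points))
    between = sum(len(pts) * math.dist(c, overall) ** 2 for c, pts in groups)
    within = sum(math.dist(p, c) ** 2 for c, pts in groups for p in pts)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / distance_ij."""
    groups = _groups(points, labels)
    k = len(groups)
    scatter = [sum(math.dist(p, c) for p in pts) / len(pts)
               for c, pts in groups]
    ratios = []
    for i in range(k):
        ratios.append(max(
            (scatter[i] + scatter[j]) / math.dist(groups[i][0], groups[j][0])
            for j in range(k) if j != i))
    return sum(ratios) / k

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [0, 0, 1, 1]   # splits along the wide gap
bad = [0, 1, 0, 1]    # splits along the narrow gap
ch_good, db_good = calinski_harabasz(points, good), davies_bouldin(points, good)
ch_bad, db_bad = calinski_harabasz(points, bad), davies_bouldin(points, bad)
```

On this toy data the good labeling gives a Calinski and Harabasz value of 50 and a Davies and Bouldin value of 0.2, against 0.08 and 5 for the bad labeling, so both indices prefer the same partition.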

An Empirical Study on the Korean Photonics Industrial Cluster Effects : Focusing on Absorptive Capacity and Corporate Performance (광주 광산업 클러스터 효과에 관한 연구 : 조직의 흡수역량과 기업성과에 미치는 영향에 관한 실증연구)

  • Bae, Jae-Kwon;Koo, Chul-Mo
    • Journal of Information Technology Applications and Management / v.19 no.2 / pp.117-134 / 2012
  • Cluster industries are geographically concentrated and interconnected by flows of goods and services that are stronger than the flows linking them to the rest of the economy. Photonics is one of the fastest growing high-tech industries in the world today. In particular, the industrial cluster in the city of Gwangju (South Korea), a complex specializing in the photonics industry, has produced remarkable results in developing high-quality technologies since it launched its cluster program in 2005. The Gwangju photonics industrial cluster is expected to rank among the top level of the world photonics industry. This study therefore proposes a research model in which the factors that influence corporate performance in a photonics industrial cluster (i.e., business environment, cooperative relationship, and industry-university-research institute partnership) positively affect absorptive capacity, which in turn leads to corporate performance. To explain the effects of the Korean photonics industrial cluster, 91 survey responses were collected from managers of photonics companies in the industrial cluster complex. To test the validity of the proposed research model, PLS analysis was applied to the 91 valid questionnaires: the measurement reliability and validity of the research variables were tested, and path analysis was conducted for hypothesis testing. In brief, the findings suggest that these influence factors positively affect absorptive capacity as well as corporate performance.

An efficient heuristics for determining the optimal number of cluster using clustering balance (클러스터링 균형을 사용하여 최적의 클러스터 개수를 결정하기 위한 효율적인 휴리스틱)

  • Lee, Sangwook
    • Proceedings of the Korea Contents Association Conference / 2009.05a / pp.792-796 / 2009
  • Determining the optimal number of clusters is an important issue in data clustering research. It involves choosing a cluster validity method and finding the number of clusters that optimizes the cluster validity. In this paper, an efficient heuristic for determining the optimal number of clusters using clustering balance is proposed. Experimental results using k-means on artificial and real-life data sets show that the proposed algorithm is excellent in terms of time efficiency.
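
The baseline that such a heuristic tries to speed up is an exhaustive scan: cluster the data for every candidate number of clusters and keep the one with the best validity score. A minimal sketch of that baseline follows. All names are hypothetical, `cluster_fn` and `index_fn` stand in for any clustering routine and any validity index, and the clustering balance measure itself is not implemented here.

```python
def select_number_of_clusters(candidates, cluster_fn, index_fn, maximize=True):
    """Exhaustively score every candidate k and return (best_k, best_score).

    cluster_fn(k) -> a partition for k clusters (any representation)
    index_fn(partition) -> validity score of that partition
    maximize: True if larger scores are better, False if smaller are better.
    """
    best_k, best_score = None, None
    for k in candidates:
        score = index_fn(cluster_fn(k))
        if best_score is None:
            better = True
        elif maximize:
            better = score > best_score
        else:
            better = score < best_score
        if better:
            best_k, best_score = k, score
    return best_k, best_score

# Stub usage: the "partition" is just k, and the score peaks at k = 4.
result = select_number_of_clusters(range(2, 9), lambda k: k,
                                   lambda part: -(part - 4) ** 2)
```

Each call to `cluster_fn` is a full clustering run, which is why reducing the number of candidates that must be evaluated pays off in running time.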

An Optimal Cluster Analysis Method with Fuzzy Performance Measures (퍼지 성능 측정자를 결합한 최적 클러스터 분석방법)

  • 이현숙;오경환
    • Journal of the Korean Institute of Intelligent Systems / v.6 no.3 / pp.81-88 / 1996
  • Cluster analysis partitions a collection of data points into a number of clusters such that the data points inside a cluster have a certain degree of similarity, and it is a fundamental process of data analysis. It has therefore played an important role in solving many problems in pattern recognition and image processing. For these problems, many clustering algorithms based on distance criteria have been developed, and fuzzy set theory has been introduced to reflect the description of real data, where boundaries might be fuzzy. If fuzzy cluster analysis is to make a significant contribution to engineering applications, much more attention must be paid to the fundamental question of cluster validity, that is, how well the analysis has identified the structure present in the data. Several validity functionals, such as the partition coefficient, classification entropy, and proportion exponent, have been used to measure validity mathematically. However, because cluster validity involves complex aspects, it is difficult to measure with a single function as in conventional studies. In this paper, we propose four performance indices and a way to measure the quality of the clustering formed by a given learning strategy.
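
Two of the conventional functionals named in the abstract have short closed forms. The sketch below computes the partition coefficient and the classification entropy from a fuzzy membership matrix in their usual definitions. The function names and the c-by-n list-of-lists layout of `u` are assumptions, and the four indices proposed in the paper are not shown.

```python
import math

def partition_coefficient(u):
    """Mean of squared memberships; u[i][k] is the membership of point k
    in cluster i. Ranges from 1/c (fully fuzzy) to 1 (crisp)."""
    n = len(u[0])
    return sum(x * x for row in u for x in row) / n

def classification_entropy(u):
    """Mean entropy of the memberships, with 0 * log(0) taken as 0.
    Ranges from 0 (crisp) to log(c) (fully fuzzy)."""
    n = len(u[0])
    return -sum(x * math.log(x) for row in u for x in row if x > 0) / n

crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
pc_crisp, ce_crisp = partition_coefficient(crisp), classification_entropy(crisp)
pc_fuzzy, ce_fuzzy = partition_coefficient(fuzzy), classification_entropy(fuzzy)
```

For two clusters, the crisp partition gives a partition coefficient of 1 and an entropy of 0, while the fully fuzzy partition gives 0.5 and log 2. Both functionals look only at memberships and ignore the geometry of the data, which is one reason a single functional may be insufficient.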

Phonetically Based Consonant Cluster Acquisition Model (음성학을 토대로 한 자음군 습득 모형)

  • Kwon, Bo-Young
    • Proceedings of the KSPS conference / 2007.05a / pp.109-113 / 2007
  • The varying degree of difficulty that second language learners have in producing different cluster types has previously been accounted for in terms of the sonority distance between adjacent segments. As an alternative to this model, I propose a Phonetically Based Consonant Cluster Acquisition Model (PCCAM), in which consonant cluster markedness is defined by the articulatory and perceptual factors associated with each consonant sequence. The validity of PCCAM has been tested through Korean speakers' production of English consonant clusters.

A Cluster Validity Index Using Overlap and Separation Measures Between Fuzzy Clusters (클러스터간 중첩성과 분리성을 이용한 퍼지 분할의 평가 기법)

  • Kim, Dae-Won;Lee, Kwang-H.
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.4 / pp.455-460 / 2003
  • A new cluster validity index is proposed that determines the optimal partition and the optimal number of clusters for fuzzy partitions obtained from the fuzzy c-means algorithm. The proposed validity index exploits an overlap measure and a separation measure between clusters. The overlap measure is obtained by computing an inter-cluster overlap. The separation measure is obtained by computing a distance between fuzzy clusters. A good fuzzy partition is expected to have a low degree of overlap and a large separation distance. Tests of the proposed index and nine previously formulated indexes on well-known data sets showed that the proposed index is more effective and reliable than the others.