• Title/Summary/Keyword: k-Means Clustering

A Non-linear Variant of Global Clustering Using Kernel Methods (커널을 이용한 전역 클러스터링의 비선형화)

  • Heo, Gyeong-Yong;Kim, Seong-Hoon;Woo, Young-Woon
    • Journal of the Korea Society of Computer and Information / v.15 no.4 / pp.11-18 / 2010
  • Fuzzy c-means (FCM) is a simple but efficient clustering algorithm based on the concept of a fuzzy set, and it has proved useful in many areas. There are, however, several well-known problems with FCM, such as sensitivity to initialization, sensitivity to outliers, and limitation to convex clusters. In this paper, global fuzzy c-means (G-FCM) and kernel fuzzy c-means (K-FCM) are combined to form a non-linear variant of G-FCM, called kernel global fuzzy c-means (KG-FCM). G-FCM is a variant of FCM that uses an incremental seed selection method and is effective in alleviating sensitivity to initialization. There are several approaches to reducing the influence of noise and accommodating non-convex clusters, and K-FCM is one of them. K-FCM is used in this paper because it can easily be extended with different kernels. By combining G-FCM and K-FCM, KG-FCM can resolve the shortcomings mentioned above. The usefulness of the proposed method is demonstrated by experiments using artificial and real-world data sets.
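
To make the kernel idea concrete, here is a minimal sketch of one common kernel FCM formulation: a Gaussian kernel, the kernel-induced distance 1 - K(x, v), and centers kept in input space. This is an assumption for illustration, not necessarily the K-FCM variant used by the authors, and it omits G-FCM's incremental seed selection: initial centers (`init_centers`) are simply supplied by the caller. The names `kernel_fcm`, `gaussian_kernel`, `sigma`, and `m` are hypothetical.

```python
import math


def gaussian_kernel(x, v, sigma):
    """K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-d2 / sigma ** 2)


def kernel_fcm(points, init_centers, m=2.0, sigma=1.0, iters=100, tol=1e-9):
    """Sketch of Gaussian-kernel fuzzy c-means (centers stay in input space).

    Returns (centers, memberships); memberships[k][i] is the degree to which
    point k belongs to cluster i, computed in the last iteration.
    """
    centers = [list(c) for c in init_centers]
    u = []
    for _ in range(iters):
        # 1) Membership update from the kernel-induced distance 1 - K(x, v).
        u = []
        for x in points:
            d = [max(1.0 - gaussian_kernel(x, v, sigma), 1e-12) for v in centers]
            inv = [di ** (-1.0 / (m - 1.0)) for di in d]
            s = sum(inv)
            u.append([w / s for w in inv])
        # 2) Center update: weighted mean with weights u^m * K(x, v).
        new_centers = []
        for i, v in enumerate(centers):
            w = [u[k][i] ** m * gaussian_kernel(x, v, sigma) for k, x in enumerate(points)]
            total = sum(w)
            new_centers.append(
                [sum(wk * x[j] for wk, x in zip(w, points)) / total for j in range(len(v))]
            )
        shift = max(abs(a - b) for old, new in zip(centers, new_centers) for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Because far-away points receive a kernel weight close to zero in the center update, outliers pull the centers much less than in plain FCM, which is one reason kernelized variants are used against noise.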

Tree-structured Clustering for Continuous Data (연속형 자료에 대한 나무형 군집화)

  • Huh Myung-Hoe;Yang Kyung-Sook
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.661-671 / 2005
  • The aim of this study is to propose a clustering method, called tree-structured clustering, that recursively partitions continuous multivariate data based on an overall $R^2$ criterion with a practical node-splitting decision rule. The method produces easily interpretable tree-type clustering rules with a built-in variable selection function. In numerical examples (Fisher's iris data and a Telecom case), we note several differences between tree-structured clustering and K-means clustering.
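
A minimal sketch of the splitting criterion, assuming $R^2 = 1 - \mathrm{SSW}/\mathrm{SST}$, where SSW is the within-node sum of squares summed over the resulting nodes and SST is the total sum of squares. The search below tries every single-variable threshold and keeps the split with the highest $R^2$; the authors' "practical node-splitting decision rule" (when to stop) is not reproduced, and `best_split` is a hypothetical name. Applying it recursively to each child, with leaves' SSW summed, yields a tree whose overall $R^2$ uses the same formula.

```python
def sse(rows):
    """Sum of squared deviations from the node mean, over all variables."""
    if not rows:
        return 0.0
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    return sum((r[j] - means[j]) ** 2 for r in rows for j in range(p))


def best_split(rows):
    """Return (r2, variable_index, threshold) of the best binary split.

    Each candidate threshold is a midpoint between consecutive distinct
    values of one variable, so the resulting rule reads 'x_j <= t'.
    """
    total = sse(rows)
    best = (0.0, None, None)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for r in rows if r[j] <= t]
            right = [r for r in rows if r[j] > t]
            r2 = 1 - (sse(left) + sse(right)) / total if total > 0 else 0.0
            if r2 > best[0]:
                best = (r2, j, t)
    return best
```

Because each split is a threshold on one variable, the fitted tree doubles as a set of readable rules, and variables never chosen for a split are effectively deselected.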

Agglomerative Hierarchical Clustering Analysis with Deep Convolutional Autoencoders (합성곱 오토인코더 기반의 응집형 계층적 군집 분석)

  • Park, Nojin;Ko, Hanseok
    • Journal of Korea Multimedia Society / v.23 no.1 / pp.1-7 / 2020
  • Clustering methods essentially take a two-step approach: extracting feature vectors for dimensionality reduction, then applying a clustering algorithm to the extracted feature vectors. However, for clustering images, traditional methods such as stacked auto-encoder based k-means are not effective because they tend to ignore local information. In this paper, we propose a method that first reduces data dimensionality effectively using a convolutional auto-encoder, which captures and reflects local information, and then accurately clusters similar data samples using a hierarchical clustering approach. The experimental results confirm that the proposed model improves clustering results in terms of clustering accuracy and normalized mutual information.

Clustering Approaches to Identifying Gene Expression Patterns from DNA Microarray Data

  • Do, Jin Hwan;Choi, Dong-Kug
    • Molecules and Cells / v.25 no.2 / pp.279-288 / 2008
  • Analysis of microarray data is essential for handling large amounts of gene expression data. In this review we focus on clustering techniques. The biological rationale for this approach is that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in the functional annotation of novel genes, de novo identification of transcription factor binding sites, and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same dataset may vary considerably depending on the algorithm and dissimilarity metric used, as well as on user-selectable parameters such as the desired number of clusters and the initial values. Therefore, biologists who want to interpret microarray data should be aware of the strengths and weaknesses of the clustering methods used. In this review, we survey the basic principles of clustering DNA microarray data, from crisp clustering algorithms such as hierarchical clustering, K-means, and self-organizing maps to more complex algorithms such as fuzzy clustering.

Nonparametric analysis of income distributions among different regions based on energy distance with applications to China Health and Nutrition Survey data

  • Ma, Zhihua;Xue, Yishu;Hu, Guanyu
    • Communications for Statistical Applications and Methods / v.26 no.1 / pp.57-67 / 2019
  • Income distribution is a major concern in economic theory. In regional economics, it is often of interest to compare income distributions across regions. Traditional methods often compare the income inequality of different regions by assuming parametric forms for the income distributions or by using summary statistics such as the Gini coefficient. In this paper, we propose a nonparametric procedure to test for heterogeneity in income distributions among different regions, and a K-means clustering procedure for clustering income distributions based on energy distance. Simulation studies show that the energy distance based method is competitive with other common methods in hypothesis testing, and that the energy distance based clustering method performs well in the clustering problem. The proposed approaches are applied to data from the China Health and Nutrition Survey 2011. The results indicate significant differences among the income distributions of the 12 provinces in the dataset. Applying a 4-means clustering algorithm, we obtained clustering results for the income distributions of the 12 provinces.
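
For reference, the two-sample energy statistic between univariate samples $x$ (size $n$) and $y$ (size $m$) can be computed as $\frac{2}{nm}\sum_{i,j}|x_i - y_j| - \frac{1}{n^2}\sum_{i,j}|x_i - x_j| - \frac{1}{m^2}\sum_{i,j}|y_i - y_j|$. The sketch below computes this V-statistic form; it is non-negative and zero for identical samples. Some authors (and SciPy's `scipy.stats.energy_distance`) report its square root instead, and the paper's exact normalization, permutation test, and clustering loop are not reproduced here. `energy_distance` is a hypothetical name for this standalone function.

```python
def energy_distance(x, y):
    """Squared energy distance between two univariate samples (V-statistic).

    2 * mean|x_i - y_j| - mean|x_i - x_j| - mean|y_i - y_j|
    """
    n, m = len(x), len(y)
    between = sum(abs(a - b) for a in x for b in y) / (n * m)
    within_x = sum(abs(a - b) for a in x for b in x) / (n * n)
    within_y = sum(abs(a - b) for a in y for b in y) / (m * m)
    return 2 * between - within_x - within_y
```

In a clustering setting, each region's income sample is one object, and a pairwise matrix of these values can serve as the dissimilarity between objects.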

Design of Incremental K-means Clustering-based Radial Basis Function Neural Networks Model (증분형 K-means 클러스터링 기반 방사형 기저함수 신경회로망 모델 설계)

  • Park, Sang-Beom;Lee, Seung-Cheol;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.5 / pp.833-842 / 2017
  • In this study, a design methodology for radial basis function neural networks (RBFNNs) based on incremental K-means clustering is introduced for learning from and processing big data. When the training dataset is very large, general clustering may fail to learn it due to a lack of memory capacity. However, on-line processing of big data can be realized effectively through recursive least squares estimation of the parameters together with the sequential operation of an incremental clustering algorithm. Radial basis function neural networks consist of a condition part, a conclusion part, and an aggregation part. In the condition part, the incremental K-means clustering algorithm is used to obtain the center points of the data, and the fitness is found using a Gaussian function as the activation function. The connection weights of the conclusion part are given as linear functions, and their parameters are calculated using recursive least squares estimation. In the aggregation part, the final output is obtained by the center of gravity method. Using machine learning datasets, performance indices are reported and compared with those of other models. In addition, PSO (particle swarm optimization) is used to improve the performance of the incremental K-means clustering-based RBFNNs. This study demonstrates the superiority of the proposed model's algorithmic design from the viewpoint of on-line processing of big data.
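
The memory argument rests on updating centers one sample at a time. A minimal sketch, assuming the classic sequential (MacQueen-style) running-mean update rather than the authors' exact incremental scheme: each incoming point moves only its nearest center by $1/n_i$ of the gap, so only the centers and their counts are stored. The condition-part Gaussian activation is shown as well; `IncrementalKMeans`, `partial_fit`, `gaussian_activation`, and `width` are hypothetical names, and the recursive least squares and PSO steps are not included.

```python
import math


class IncrementalKMeans:
    """Sequential k-means: one sample at a time, O(k) memory."""

    def __init__(self, initial_centers):
        self.centers = [list(c) for c in initial_centers]
        # Each initial center counts as one observed sample.
        self.counts = [1] * len(self.centers)

    def partial_fit(self, x):
        """Assign x to its nearest center, move that center, return its index."""
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in self.centers]
        i = dists.index(min(dists))
        self.counts[i] += 1
        eta = 1.0 / self.counts[i]  # running-mean step size
        self.centers[i] = [c + eta * (a - c) for a, c in zip(x, self.centers[i])]
        return i


def gaussian_activation(x, center, width):
    """Condition-part fitness of x for one RBF unit."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-d2 / (2 * width ** 2))
```

With this step size each center always equals the mean of the samples assigned to it so far, so streaming data through `partial_fit` gives the same centers as a batch mean over those assignments.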

An Implementation of K-Means Algorithm Improving Cluster Centroids Decision Methodologies (클러스터 중심 결정 방법을 개선한 K-Means 알고리즘의 구현)

  • Lee Shin-Won;Oh HyungJin;An Dong-Un;Jeong Seong-Jong
    • The KIPS Transactions:PartB / v.11B no.7 s.96 / pp.867-874 / 2004
  • The K-Means algorithm is a non-hierarchical (flat) reassignment technique that iterates its steps on the basis of K cluster centroids until the clustering results converge into K clusters. By nature, the K-Means algorithm produces different results depending on the initial and new centroids. In this paper, we propose a modified K-Means algorithm that improves the methodologies for deciding the initial and new centroids. When the two algorithms were evaluated using the 16 weighting schemes of the SMART system, the modified algorithm showed 20% better recall and F-measure than the K-Means algorithm, and the document clustering results improved considerably.
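
The dependence on initial centroids is easy to reproduce with plain Lloyd's K-Means. The sketch below is the standard algorithm, not the authors' modified centroid-decision method: it alternates assignment and mean-update steps until the labels stop changing. `kmeans` and `total_sse` are hypothetical names, and empty clusters simply keep their previous centroid (an assumption).

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's K-Means starting from the given centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def total_sse(points, labels, centroids):
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


pts = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
good = kmeans(pts, [(0.0,), (10.0,), (20.0,)])
bad = kmeans(pts, [(0.0,), (1.0,), (10.0,)])
```

With one seed per natural group the run ends with labels `[0, 0, 1, 1, 2, 2]` and a total SSE of 1.5; seeding two centroids inside the first group converges instead to `[0, 1, 2, 2, 2, 2]` with an SSE of 101.0, a local optimum that better initial-centroid rules aim to avoid.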

Design of Radial Basis Function with the Aid of Fuzzy KNN and Conditional FCM (퍼지 kNN과 Conditional FCM을 이용한 퍼지 RBF의 설계)

  • Roh, Seok-Beon;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.6 / pp.1223-1229 / 2009
  • The performance of Radial Basis Function Neural Networks depends on how the Radial Basis Functions are set up over the input space, which is an important step in their design. The existing method for initializing the locations of the radial basis functions over the input space uses conditional fuzzy C-means clustering. However, researchers using conditional fuzzy C-means clustering cannot obtain modeling performance as good as they expect, because conditional fuzzy C-means clustering cannot project the information extracted over the output space into the input space. To compensate for this drawback, we apply a fuzzy K-nearest neighbors approach to project the auxiliary information defined over the output space into the input space without loss of information.
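
As a generic illustration of fuzzy K-nearest neighbors, the sketch below uses the well-known Keller-style estimate: a query's membership in each group is the distance-weighted average of its k nearest neighbors' memberships, with weights $1/d^{2/(m-1)}$. How the authors couple this with conditional FCM is not reproduced; here `memberships` is assumed to hold, for each training sample, membership values defined over the output space (hypothetical), and `fuzzy_knn`, `k`, and `m` are hypothetical names.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_knn(query, samples, memberships, k=3, m=2.0, eps=1e-12):
    """Keller-style fuzzy k-NN membership estimate for one query point.

    memberships[j][c] is sample j's membership in group c; the result is a
    list of memberships for the query that sums to 1 when each row does.
    """
    nearest = sorted(range(len(samples)), key=lambda j: sq_dist(query, samples[j]))[:k]
    weights = []
    for j in nearest:
        d = math.sqrt(sq_dist(query, samples[j]))
        weights.append(1.0 / max(d, eps) ** (2.0 / (m - 1.0)))  # eps avoids 1/0
    total = sum(weights)
    n_groups = len(memberships[0])
    return [
        sum(w * memberships[j][c] for w, j in zip(weights, nearest)) / total
        for c in range(n_groups)
    ]
```

Because the result is a soft blend rather than a majority vote, graded information attached to the training samples carries over to new input-space points instead of being rounded away.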

An Edge Extraction Method Using K-means Clustering In Image (영상에서 K-means 군집화를 이용한 윤곽선 검출 기법)

  • Kim, Ga-On;Lee, Gang-Seong;Lee, Sang-Hun
    • Journal of Digital Convergence / v.12 no.11 / pp.281-288 / 2014
  • In this paper, a method for edge detection using K-means clustering is proposed. The method consists of three steps. First, histogram equalization is applied to the image to obtain a uniform intensity distribution. Next, pixels are clustered using the K-means clustering technique. Then a Sobel mask is applied to detect edges. Experiments showed that this method detected edges better than the conventional method.
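
The three steps can be sketched in pure Python on a small grayscale image stored as a list of rows of integer intensities. Assumptions not stated in the abstract: K-means runs on intensity values only, starts from evenly spaced centers, each pixel is replaced by its cluster center before filtering, and an edge is declared where the Sobel gradient magnitude exceeds a hypothetical `threshold`. All function names are hypothetical.

```python
def equalize(img, levels=256):
    """Step 1: histogram equalization of integer intensities in [0, levels)."""
    flat = [v for row in img for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to spread out
        return [row[:] for row in img]
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) if c > 0 else 0 for c in cdf]
    return [[lut[v] for v in row] for row in img]


def quantize(img, k, iters=20):
    """Step 2: 1-D K-means on intensities; each pixel becomes its center."""
    values = [v for row in img for v in row]
    lo, hi = min(values), max(values)
    if k > 1:
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centers = [sum(values) / len(values)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            groups[min(range(len(centers)), key=lambda i: abs(v - centers[i]))].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return [[min(centers, key=lambda c: abs(v - c)) for v in row] for row in img]


def sobel_edges(img, threshold):
    """Step 3: 3x3 Sobel gradient magnitude, thresholded (border pixels = 0)."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            def p(dy, dx):
                return img[y + dy][x + dx]
            gx = (p(-1, 1) + 2 * p(0, 1) + p(1, 1)) - (p(-1, -1) + 2 * p(0, -1) + p(1, -1))
            gy = (p(1, -1) + 2 * p(1, 0) + p(1, 1)) - (p(-1, -1) + 2 * p(-1, 0) + p(-1, 1))
            edges[y][x] = 1 if (gx * gx + gy * gy) ** 0.5 > threshold else 0
    return edges


def detect_edges(img, k=2, threshold=100):
    return sobel_edges(quantize(equalize(img), k), threshold)
```

Clustering before filtering flattens small intensity variations inside each region, so the Sobel mask responds mainly at boundaries between clusters.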

Customer Clustering Method Using Repeated Small-sized Clustering to improve the Classifying Ability of Typical Daily Load Profile (일일 대표 부하패턴의 분별력을 높이기 위한 반복적인 소규모 군집화를 이용한 고객 군집화 방법)

  • Kim, Young-Il;Song, Jae-Ju;Oh, Do-Eun;Jung, Nam-Joon;Yang, Il-Kwon
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.11 / pp.2269-2274 / 2009
  • A customer clustering method is used to build a TDLP (typical daily load profile) in order to estimate the quarter-hourly load profile of non-AMR (Automatic Meter Reading) customers. In this paper, a repeated small-sized clustering method is proposed to improve the classifying ability of the TDLP. The K-means algorithm is a well-known data mining clustering technique. To reduce the effect of the k-means algorithm's local optima, the proposed method first clusters the average load profiles into small-sized clusters, then selects the cluster with the highest error rate and repeatedly re-clusters it into small-sized clusters.
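
A minimal sketch of the repeated small-sized clustering loop, under assumptions the abstract leaves open: the "error" of a cluster is taken to be its within-cluster sum of squared errors, "small-sized" means a split into two, and the split uses 2-means seeded with the first profile and the profile farthest from it. The authors' actual error-rate measure and cluster sizes may differ. `repeated_split`, `two_means`, and `n_clusters` are hypothetical names.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sse(rows):
    c = mean(rows)
    return sum(sq_dist(r, c) for r in rows)


def two_means(rows, iters=50):
    """Split rows with 2-means, seeded by the first row and the row farthest from it."""
    a = rows[0]
    b = max(rows, key=lambda r: sq_dist(r, a))
    centers = [list(a), list(b)]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for r in rows:
            groups[0 if sq_dist(r, centers[0]) <= sq_dist(r, centers[1]) else 1].append(r)
        if not groups[0] or not groups[1]:
            break
        centers = [mean(groups[0]), mean(groups[1])]
    return [g for g in groups if g]


def repeated_split(profiles, n_clusters):
    """Repeatedly re-cluster the highest-error cluster until n_clusters exist."""
    clusters = [list(profiles)]
    while len(clusters) < n_clusters:
        errors = [sse(c) for c in clusters]
        worst = errors.index(max(errors))
        if errors[worst] == 0:  # every cluster is already pure
            break
        parts = two_means(clusters.pop(worst))
        clusters.extend(parts)
        if len(parts) < 2:  # degenerate split; stop instead of looping forever
            break
    return clusters
```

Each step only ever runs a tiny clustering on one cluster, so a poor local optimum in one split is confined to that cluster and can be corrected when the cluster is selected again.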