• Title/Summary/Keyword: K-means cluster

Fast K-Means Clustering Algorithm using Prediction Data (예측 데이터를 이용한 빠른 K-Means 알고리즘)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Lee, Yill-Byung
    • The Journal of the Korea Contents Association / v.9 no.1 / pp.106-114 / 2009
  • In this paper we propose a fast K-Means clustering algorithm. Its main characteristic is that, to speed up the algorithm, it predicts in advance the data points whose cluster assignment is likely to change. When distances to the cluster centres are calculated at each iteration to assign each point to its nearest prototype, overall computation time is reduced by processing only those points. The prediction reuses distance information already produced by the K-Means algorithm, which reduces calculation time and makes the algorithm less sensitive to the number of dimensions. The proposed method was compared with the original K-Means method (Lloyd's) and with the improved method KMHybrid. We show that the proposed method is significantly faster than both Lloyd's and KMHybrid on large data sets with many data points, many dimensions, and many clusters.
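
For orientation, the baseline named in the abstract above, Lloyd's iteration, can be sketched as follows. This is the standard algorithm only, not the authors' accelerated method; the function name `lloyd_kmeans` and the choice of caller-supplied initial centres are assumptions made for illustration. The full distance scan in the assignment step is the cost that prediction-based variants aim to reduce.

```python
import math


def lloyd_kmeans(points, centres, max_iter=100):
    """Plain Lloyd iteration: assign points to the nearest centre, then recompute centres."""
    centres = [list(c) for c in centres]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: every point is compared with every centre.
        new_labels = [
            min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the iteration has converged
        labels = new_labels
        # Update step: each centre becomes the mean of its assigned points.
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres
```

A call such as `lloyd_kmeans(points, initial_centres)` returns the final labels and centres; an accelerated variant would replace the list comprehension in the assignment step with a scan over a smaller candidate set.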

Analysis of Brokerage Commission Policy based on the Potential Customer Value (고객의 잠재가치에 기반한 증권사 수수료 정책 연구)

  • Shin, Hyung-Won;Sohn, So-Young
    • IE interfaces / v.16 no.spc / pp.123-126 / 2003
  • In this paper, we use three clustering algorithms (K-means, Self-Organizing Map, and Fuzzy K-means) to find properly graded stock market brokerage commission rates based on cumulative transactions on both the stock exchange market and the HTS (Home Trading System). Investors in both trading modes are classified by their total transactions and by the corresponding mode of investment. Empirical results indicate that fuzzy K-means cluster analysis is the best fit, in terms of robustness, for segmenting the customers of both transaction modes. We then propose decision-tree-based rules for dividing customers into three groups and apply brokerage commission rates of 0.4%, 0.45%, and 0.5% for the exchange market and 0.06%, 0.1%, and 0.18% for HTS.

An Improved Hybrid Canopy-Fuzzy C-Means Clustering Algorithm Based on MapReduce Model

  • Dai, Wei;Yu, Changjun;Jiang, Zilong
    • Journal of Computing Science and Engineering / v.10 no.1 / pp.1-8 / 2016
  • Fuzzy c-means (FCM) is a frequently used clustering algorithm. However, its clustering quality and convergence rate depend on the initial cluster centers, so an improved FCM algorithm based on the canopy clustering concept has been proposed to analyze datasets quickly. Taking advantage of the canopy algorithm's rapid acquisition of cluster centers, the algorithm uses the canopy clustering results as its input, which accelerates the convergence of FCM. In addition, a MapReduce scheme for the proposed FCM algorithm is designed for a cloud environment. Experimental results demonstrate that the hybrid canopy-FCM clustering algorithm running on MapReduce achieves better clustering quality and higher operation speed.
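
The idea in the abstract above, seeding FCM with centres found by a canopy pass, can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not the authors' MapReduce implementation: the names `canopy_centres` and `fcm`, the single tight threshold `t2` (the usual canopy method also uses a looser threshold), and the fuzzifier default `m=2.0` are all choices made here for the example.

```python
import math


def canopy_centres(points, t2):
    """Pick canopy centres: take a remaining point, then drop every point within t2 of it."""
    remaining = list(points)
    centres = []
    while remaining:
        centre = remaining[0]
        centres.append(centre)
        remaining = [p for p in remaining if math.dist(p, centre) > t2]
    return centres


def fcm(points, centres, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy c-means started from the given centres (assumed to come from a canopy pass)."""
    centres = [list(c) for c in centres]
    dims = len(points[0])
    u = []
    for _ in range(max_iter):
        # Membership update: u[i][j] is proportional to d_ij ** (-2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centres]  # guard against zero distance
            w = [dj ** (-2.0 / (m - 1.0)) for dj in d]
            total = sum(w)
            u.append([wj / total for wj in w])
        # Centre update: mean of the points weighted by u ** m.
        new_centres = []
        for j in range(len(centres)):
            weights = [row[j] ** m for row in u]
            wsum = sum(weights)
            new_centres.append(
                [sum(wt * p[k] for wt, p in zip(weights, points)) / wsum for k in range(dims)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    # u holds the memberships computed in the last iteration, before the final centre update.
    return u, centres
```

Because the canopy pass fixes both the number of clusters and their starting positions, FCM begins near a sensible solution, which is the mechanism the abstract credits for faster convergence.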

Double K-Means Clustering (이중 K-평균 군집화)

  • 허명회
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.343-352 / 2000
  • In this study, the author proposes a nonhierarchical clustering method, called "Double K-Means Clustering", which clusters multivariate observations with the following algorithm. Step I: Carry out ordinary K-means clustering and obtain k temporary clusters with sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$ and pooled covariance matrix $S$. Step II-1: Allocate the observation $x_i$ to the cluster F if it satisfies ....., for $i = 1, \ldots, N$, where $N$ is the total number of observations. Step II-2: Update the cluster sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$ and pooled covariance matrix $S$. Step II-3: Repeat Steps II-1 and II-2 until the change becomes negligible. The double K-means clustering is nearly "optimal" under a mixture of k multivariate normal distributions with a common covariance matrix. It is also nearly affine invariant, with the data-analytic implication that variable standardization is largely unnecessary. The method is demonstrated numerically on Fisher's iris data.

Assessment of Premature Ventricular Contraction Arrhythmia by K-means Clustering Algorithm

  • Kim, Kyeong-Seop
    • Journal of the Korea Society of Computer and Information / v.22 no.5 / pp.65-72 / 2017
  • Premature ventricular contraction (PVC) arrhythmia is the most common abnormal heart rhythm and may increase the mortality risk of a cardiac patient. It is therefore important to identify the characteristic portrait of the PVC pattern in a patient. In this paper, we propose a new method for extracting the characteristics of the PVC pattern by applying the K-means machine learning algorithm to heart rate variability depicted in a Poincaré plot. To distinguish quantitatively between the cluster patterns of normal sinus rhythm and PVC beats, the Euclidean distance between the clusters was measured. Experimental simulations on the MIT-BIH arrhythmia database show that this inter-cluster distance is valid for differentiating the pattern traits of PVC beats. The proposed method thus offers a simple means of identifying the attributes of PVC beats in terms of K-means clusters, especially in long-duration electrocardiograms (ECG).
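
The two quantities mentioned in the abstract above, Poincaré-plot points and the Euclidean distance between clusters, can be sketched as follows. This is an illustration under assumptions rather than the author's procedure: the helper names are hypothetical, the Poincaré plot is taken in its usual form of consecutive RR-interval pairs, the inter-cluster distance is taken between centroids, and the cluster labels are assumed to come from any K-means implementation.

```python
import math


def poincare_points(rr):
    """Pair each RR interval with the next one: (RR_n, RR_{n+1})."""
    return list(zip(rr[:-1], rr[1:]))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def centroid_distance(points, labels, a=0, b=1):
    """Euclidean distance between the centroids of clusters a and b."""
    ca = centroid([p for p, lab in zip(points, labels) if lab == a])
    cb = centroid([p for p, lab in zip(points, labels) if lab == b])
    return math.dist(ca, cb)
```

A larger centroid distance indicates that the two clusters occupy more clearly separated regions of the plot.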

A Type 2 Fuzzy C-means (제2종 퍼지 집합을 이용한 퍼지 C-means)

  • Hwang, Cheul;Rhee, Frank Chung-Hoon
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.05a / pp.16-19 / 2001
  • This paper presents a type-2 fuzzy C-means (FCM) algorithm that is an extension of the conventional fuzzy C-means algorithm. In our proposed method, the membership values for each pattern are extended as type-2 fuzzy memberships by assigning membership grades to the type-1 memberships. In doing so, cluster centers that are estimated by type-2 memberships may converge to a more desirable location than cluster centers obtained by a type-1 FCM method in the presence of noise.

Environmental Survey Data Modeling Using K-means Clustering Techniques

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society / v.16 no.3 / pp.557-566 / 2005
  • Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. Among several clustering techniques, we use k-means clustering in this paper. K-means clustering is classified as a partitional clustering method. We analyze the 2002 Gyeongnam social indicator survey data using k-means clustering to obtain environmental information. The resulting clusters can be used for environmental preservation and environmental improvement.

Environmental Survey Data Modeling using K-means Clustering Techniques

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집) / 2004.10a / pp.77-86 / 2004
  • Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. Among several clustering techniques, we use k-means clustering in this paper. K-means clustering is classified as a partitional clustering method. We analyze the 2002 Gyeongnam social indicator survey data using k-means clustering to obtain environmental information. The resulting clusters can be used for environmental preservation and environmental improvement.

Pattern Analysis of Volume of Basal Ganglia Structures in Patients with First-Episode Psychosis (초발 정신병 환자에서 기저핵 구조물 부피의 패턴분석)

  • Min, Sally;Lee, Tae Young;Kwak, Yoobin;Kwon, Jun Soo
    • Korean Journal of Biological Psychiatry / v.25 no.2 / pp.38-43 / 2018
  • Objectives: Dopamine dysregulation has been regarded as one of the core pathologies in patients with schizophrenia. Since dopamine synthesis capacity has been found to be inconsistent in patients with schizophrenia, the current classification of patients based on clinical symptoms cannot reflect the neurochemical heterogeneity of the disease. Here we performed a new subtyping of patients with first-episode psychosis (FEP) through biotype-based cluster analysis. Specifically, we proposed structural changes in the basal ganglia, which are deeply involved in the dopaminergic circuit, as a biotype. Methods: Forty FEP patients and 40 demographically matched healthy participants underwent 3T T1 MRI. Whole-brain parcellation was conducted, and the volumes of a total of 6 basal ganglia regions were extracted as features for cluster analysis. We used K-means clustering, and external validation was conducted with the Positive and Negative Syndrome Scale (PANSS). Results: K-means clustering divided the 40 FEP subjects into 2 clusters. Cluster 1 (n = 25) showed a substantial volume decrease in 4 regions of the basal ganglia compared with Cluster 2 (n = 15). Cluster 1 showed higher scores on the positive scale of the PANSS than Cluster 2 (F = 2.333, p = 0.025). Compared with healthy controls, Cluster 1 showed smaller volumes in 4 regions, whereas Cluster 2 showed larger volumes in 3 regions. Conclusions: Cluster analysis found two subgroups that differed distinctly in the volume patterns of basal ganglia structures and in positive symptom severity. The result possibly reflects the neurobiological heterogeneity of schizophrenia. The current study thus supports the importance of a paradigm shift toward biotype-based, rather than phenotype-based, diagnosis for future precision psychiatry.

A Study on Effective Selection of University Lecture Evaluation (대학 강의평가에서 문항 추출에 관한 연구)

  • Hwang, Se-Myung;Kim, In-Taek
    • Journal of Engineering Education Research / v.8 no.1 / pp.31-45 / 2005
  • In this paper, survey items are selected using three clustering methods: factor analysis, the fuzzy c-means algorithm, and cluster analysis. The methods are used to extract key items from various questionnaires. Each key item represents several similar questionnaire items that form a cluster. The test survey consisted of 120 items obtained from several surveys and was answered by 646 students from 4 universities. Each item offers 6 choices. Applying the clustering methods reduced the original 120 items to 25. The results yielded by the three methods are very similar.