• Title/Summary/Keyword: K-means clustering technique

Search Result 149, Processing Time 0.026 seconds

Two Phase Hierarchical Clustering Algorithm for Group Formation in Data Mining (데이터 마이닝에서 그룹 세분화를 위한 2단계 계층적 글러스터링 알고리듬)

  • 황인수
    • Korean Management Science Review
    • /
    • v.19 no.1
    • /
    • pp.189-196
    • /
    • 2002
  • Data clustering is often one of the first steps in data mining analysis. It Identifies groups of related objects that can be used as a starling point for exploring further relationships. This technique supports the development of population segmentation models, such as demographic-based customer segmentation. This paper Purpose to present the development of two phase hierarchical clustering algorithm for group formation. Applications of the algorithm for product-customer group formation in customer relationahip management are also discussed. As a result of computer simulations, suggested algorithm outperforms single link method and k-means clustering.

Kernel Pattern Recognition using K-means Clustering Method (K-평균 군집방법을 이요한 가중커널분류기)

  • 백장선;심정욱
    • The Korean Journal of Applied Statistics
    • /
    • v.13 no.2
    • /
    • pp.447-455
    • /
    • 2000
  • We propose a weighted kernel pattern recognition method using the K -means clustering algorithm to reduce computation and storage required for the full kernel classifier. This technique finds a set of reference vectors and weights which are used to approximate the kernel classifier. Since the hierarchical clustering method implemented in the 'Weighted Parzen Window (WP\V) classifier is not able to rearrange the proper clusters, we adopt the K -means algorithm to find reference vectors and weights from the more properly rearranged clusters \Ve find that the proposed method outperforms the \VP\V method for the repre~entativeness of the reference vectors and the data reduction.

  • PDF

Application of k-means Clustering for Association Rule Using Measure of Association

  • Lee, Keun-Woo;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society
    • /
    • v.19 no.3
    • /
    • pp.925-936
    • /
    • 2008
  • An association rule mining finds the relation among each items in massive volume database. In generating association rules, the researcher specifies the measurements randomly such as support, confidence and lift, and produces the rules. The rule is not produced if it is not suitable to the one any condition which is given value. For example, in case of a little small one than the value which a confidence value is specified but a support and lift's value is very high, this rule is meaningful rule. But association rule mining can not produce the meaningful rules in this case because it is not suitable to a given condition. Consequently, we creat insignificant error which is not selected to the meaningful rules. In this paper, we suggest clustering technique to association rule measures for finding effective association rules using measure of association.

  • PDF

Latent Semantic Indexing Analysis of K-Means Document Clustering for Changing Index Terms Weighting (색인어 가중치 부여 방법에 따른 K-Means 문서 클러스터링의 LSI 분석)

  • Oh, Hyung-Jin;Go, Ji-Hyun;An, Dong-Un;Park, Soon-Chul
    • The KIPS Transactions:PartB
    • /
    • v.10B no.7
    • /
    • pp.735-742
    • /
    • 2003
  • In the information retrieval system, document clustering technique is to provide user convenience and visual effects by rearranging documents according to the specific topics from the retrieved ones. In this paper, we clustered documents using K-Means algorithm and present the effect of index terms weighting scheme on the document clustering. To verify the experiment, we applied Latent Semantic Indexing approach to illustrate the clustering results and analyzed the clustering results in 2-dimensional space. Experimental results showed that in case of applying local weighting, global weighting and normalization factor, the density of clustering is higher than those of similar or same weighting schemes in 2-dimensional space. Especially, the logarithm of local and global weighting is noticeable.

Combining Distributed Word Representation and Document Distance for Short Text Document Clustering

  • Kongwudhikunakorn, Supavit;Waiyamai, Kitsana
    • Journal of Information Processing Systems
    • /
    • v.16 no.2
    • /
    • pp.277-300
    • /
    • 2020
  • This paper presents a method for clustering short text documents, such as news headlines, social media statuses, or instant messages. Due to the characteristics of these documents, which are usually short and sparse, an appropriate technique is required to discover hidden knowledge. The objective of this paper is to identify the combination of document representation, document distance, and document clustering that yields the best clustering quality. Document representations are expanded by external knowledge sources represented by a Distributed Representation. To cluster documents, a K-means partitioning-based clustering technique is applied, where the similarities of documents are measured by word mover's distance. To validate the effectiveness of the proposed method, experiments were conducted to compare the clustering quality against several leading methods. The proposed method produced clusters of documents that resulted in higher precision, recall, F1-score, and adjusted Rand index for both real-world and standard data sets. Furthermore, manual inspection of the clustering results was conducted to observe the efficacy of the proposed method. The topics of each document cluster are undoubtedly reflected by members in the cluster.

A Study on the Gustafson-Kessel Clustering Algorithm in Power System Fault Identification

  • Abdullah, Amalina;Banmongkol, Channarong;Hoonchareon, Naebboon;Hidaka, Kunihiko
    • Journal of Electrical Engineering and Technology
    • /
    • v.12 no.5
    • /
    • pp.1798-1804
    • /
    • 2017
  • This paper presents an approach of the Gustafson-Kessel (GK) clustering algorithm's performance in fault identification on power transmission lines. The clustering algorithm is incorporated in a scheme that uses hybrid intelligent technique to combine artificial neural network and a fuzzy inference system, known as adaptive neuro-fuzzy inference system (ANFIS). The scheme is used to identify the type of fault that occurs on a power transmission line, either single line to ground, double line, double line to ground or three phase. The scheme is also capable an analyzing the fault location without information on line parameters. The range of error estimation is within 0.10 to 0.85 relative to five values of fault resistances. This paper also presents the performance of the GK clustering algorithm compared to fuzzy clustering means (FCM), which is particularly implemented in structuring a data. Results show that the GK algorithm may be implemented in fault identification on power system transmission and performs better than FCM.

Differentially Private k-Means Clustering based on Dynamic Space Partitioning using a Quad-Tree (쿼드 트리를 이용한 동적 공간 분할 기반 차분 프라이버시 k-평균 클러스터링 알고리즘)

  • Goo, Hanjun;Jung, Woohwan;Oh, Seongwoong;Kwon, Suyong;Shim, Kyuseok
    • Journal of KIISE
    • /
    • v.45 no.3
    • /
    • pp.288-293
    • /
    • 2018
  • There have recently been several studies investigating how to apply a privacy preserving technique to publish data. Differential privacy can protect personal information regardless of an attacker's background knowledge by adding probabilistic noise to the original data. To perform differentially private k-means clustering, the existing algorithm builds a differentially private histogram and performs the k-means clustering. Since it constructs an equi-width histogram without considering the distribution of data, there are many buckets to which noise should be added. We propose a k-means clustering algorithm using a quad-tree that captures the distribution of data by using a small number of buckets. Our experiments show that the proposed algorithm shows better performance than the existing algorithm.

Privacy-Preserving K-means Clustering using Homomorphic Encryption in a Multiple Clients Environment (다중 클라이언트 환경에서 동형 암호를 이용한 프라이버시 보장형 K-평균 클러스터링)

  • Kwon, Hee-Yong;Im, Jong-Hyuk;Lee, Mun-Kyu
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.15 no.4
    • /
    • pp.7-17
    • /
    • 2019
  • Machine learning is one of the most accurate techniques to predict and analyze various phenomena. K-means clustering is a kind of machine learning technique that classifies given data into clusters of similar data. Because it is desirable to perform an analysis based on a lot of data for better performance, K-means clustering can be performed in a model with a server that calculates the centroids of the clusters, and a number of clients that provide data to server. However, this model has the problem that if the clients' data are associated with private information, the server can infringe clients' privacy. In this paper, to solve this problem in a model with a number of clients, we propose a privacy-preserving K-means clustering method that can perform machine learning, concealing private information using homomorphic encryption.

K-Means Clustering with Deep Learning for Fingerprint Class Type Prediction

  • Mukoya, Esther;Rimiru, Richard;Kimwele, Michael;Mashava, Destine
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.3
    • /
    • pp.29-36
    • /
    • 2022
  • In deep learning classification tasks, most models frequently assume that all labels are available for the training datasets. As such strategies to learn new concepts from unlabeled datasets are scarce. In fingerprint classification tasks, most of the fingerprint datasets are labelled using the subject/individual and fingerprint datasets labelled with finger type classes are scarce. In this paper, authors have developed approaches of classifying fingerprint images using the majorly known fingerprint classes. Our study provides a flexible method to learn new classes of fingerprints. Our classifier model combines both the clustering technique and use of deep learning to cluster and hence label the fingerprint images into appropriate classes. The K means clustering strategy explores the label uncertainty and high-density regions from unlabeled data to be clustered. Using similarity index, five clusters are created. Deep learning is then used to train a model using a publicly known fingerprint dataset with known finger class types. A prediction technique is then employed to predict the classes of the clusters from the trained model. Our proposed model is better and has less computational costs in learning new classes and hence significantly saving on labelling costs of fingerprint images.

Adjustment of the Mean Field Rainfall Bias by Clustering Technique (레이더 자료의 군집화를 통한 Mean Field Rainfall Bias의 보정)

  • Kim, Young-Il;Kim, Tae-Soon;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association
    • /
    • v.42 no.8
    • /
    • pp.659-671
    • /
    • 2009
  • Fuzzy c-means clustering technique is applied to improve the accuracy of G/R ratio used for rainfall estimation by radar reflectivity. G/R ratio is computed by the ground rainfall records at AWS(Automatic Weather System) sites to the radar estimated rainfall from the reflectivity of Kwangduck Mt. radar station with 100km effective range. G/R ratio is calculated by two methods: the first one uses a single G/R ratio for the entire effective range and the other two different G/R ratio for two regions that is formed by clustering analysis, and absolute relative error and root mean squared error are employed for evaluating the accuracy of radar rainfall estimation from two G/R ratios. As a result, the radar rainfall estimated by two different G/R ratio from clustering analysis is more accurate than that by a single G/R ratio for the entire range.