• Title/Summary/Keyword: K-means clustering technique

Spectral clustering: summary and recent research issues (스펙트럴 클러스터링 - 요약 및 최근 연구동향)

  • Jeong, Sanghun;Bae, Suhyeon;Kim, Choongrak
    • The Korean Journal of Applied Statistics / v.33 no.2 / pp.115-122 / 2020
  • K-means clustering uses a spherical or elliptical metric to group data points; however, it does not work well for non-convex data such as concentric circles. Spectral clustering, based on graph theory, is a generalized and robust technique for dealing with non-standard types of data such as non-convex data. Results obtained by spectral clustering often outperform those of traditional clustering methods such as K-means. In this paper, we review spectral clustering and discuss important issues such as determining the number of clusters K, estimating the scale parameter in the adjacency of two points, and dimension reduction when clustering high-dimensional data.
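The concentric-circles case mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's code: the toy data, the function names `two_rings` and `spectral_bipartition`, the Gaussian affinity, and the scale value `sigma=0.5` are all assumptions chosen for illustration. The sketch builds the symmetric normalized graph Laplacian and splits the points by the sign of its second eigenvector.

```python
import numpy as np

def two_rings(n_per_ring=20, radii=(1.0, 3.0)):
    """Return points on two concentric circles and the ring index of each point."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_per_ring, endpoint=False)
    points, ring = [], []
    for idx, r in enumerate(radii):
        points.append(np.column_stack((r * np.cos(angles), r * np.sin(angles))))
        ring.extend([idx] * n_per_ring)
    return np.vstack(points), np.array(ring)

def spectral_bipartition(points, sigma=0.5):
    """Split points into two groups by the sign of the second Laplacian eigenvector."""
    # Gaussian affinity; sigma plays the role of the scale parameter (assumed value).
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(points)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the second eigenvector.
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] > 0).astype(int)

points, ring = two_rings()
labels = spectral_bipartition(points)
```

On this toy data each ring should receive a single label. The result depends on `sigma`: a much larger value connects the two rings strongly and the split can fail, which is one reason the scale parameter is listed as an open issue.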

K-Means Clustering in the PCA Subspace using an Unified Measure (통합 측도를 사용한 주성분해석 부공간에서의 k-평균 군집화 방법)

  • Yoo, Jae-Hung
    • The Journal of the Korea institute of electronic communication sciences / v.17 no.4 / pp.703-708 / 2022
  • K-means clustering is a representative clustering technique. However, it has a limitation: the performance evaluation measure and the method of determining the minimum number of clusters are not integrated. In this paper, a method for numerically determining the minimum number of clusters is introduced. The explained variance is presented as an integrated measure. We propose that k-means clustering be performed in the PCA subspace in order to simultaneously satisfy the minimum number of clusters and the threshold of the explained variance. The paper aims to explain, in principle, why principal component analysis and k-means clustering are performed sequentially in pattern recognition and machine learning.
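The idea of choosing the smallest number of clusters that meets an explained-variance threshold can be sketched as follows. This is an illustrative reading of the abstract, not the paper's procedure. It assumes the data have already been projected onto a single principal component (so each observation is one score), defines explained variance as one minus the within-cluster sum of squares over the total sum of squares, and uses a threshold of 0.95. The function names and the toy scores are hypothetical.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars with a deterministic quantile initialisation."""
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        updated = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers, groups

def explained_variance(values, groups):
    """1 - (within-cluster sum of squares) / (total sum of squares)."""
    mean = sum(values) / len(values)
    total = sum((v - mean) ** 2 for v in values)
    within = 0.0
    for g in groups:
        if g:
            c = sum(g) / len(g)
            within += sum((v - c) ** 2 for v in g)
    return 1.0 - within / total

def minimum_k(values, threshold=0.95, k_max=10):
    """Smallest k whose clustering reaches the explained-variance threshold."""
    for k in range(1, k_max + 1):
        _, groups = kmeans_1d(values, k)
        if explained_variance(values, groups) >= threshold:
            return k
    return None

# Hypothetical principal-component scores forming three tight groups.
scores = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.8, 9.0, 9.2]
min_k = minimum_k(scores)
```

For these scores, two clusters explain roughly 75% of the variance and three explain more than 99%, so the search stops at three.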

A Study on a Statistical Matching Method Using Clustering for Data Enrichment

  • Kim Soon Y.;Lee Ki H.;Chung Sung S.
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.509-520 / 2005
  • Data fusion is defined as the process of combining data and information from different sources to make more effective use of the information they contain. In this paper, we propose a data fusion algorithm that uses the k-means clustering method for data enrichment, to improve data quality in the knowledge discovery in databases (KDD) process. An empirical study comparing the proposed data fusion technique with existing techniques shows that the proposed clustering-based technique yields a low MSE for continuous fusion variables.

A Simple Tandem Method for Clustering of Multimodal Dataset

  • Cho C.;Lee J.W.;Lee J.W.
    • Proceedings of the Korean Operations and Management Science Society Conference / 2003.05a / pp.729-733 / 2003
  • The presence of local features within clusters, caused by the multi-modal nature of the data, prevents many conventional clustering techniques from working properly. In particular, clustering datasets with non-Gaussian distributions within a cluster can be problematic when the technique implicitly assumes a Gaussian distribution. The current study proposes a simple tandem clustering method, composed of a k-means type algorithm and a hierarchical method, to solve such problems. The multi-modal dataset is first divided into many small pre-clusters by the k-means or fuzzy k-means algorithm. The pre-clusters found in the first step are then clustered again using an agglomerative hierarchical clustering method with Kullback-Leibler divergence as the measure of dissimilarity. This method is not only effective at extracting multi-modal clusters but also fast and simple in terms of computational complexity, and relatively robust in the presence of outliers. The performance of the proposed method was evaluated on three generated datasets and six publicly known real-world datasets.
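The second stage of the tandem method, merging pre-clusters agglomeratively with a Kullback-Leibler dissimilarity, might look like the sketch below. The abstract does not say how the divergence is computed, so several choices here are assumptions: each pre-cluster is summarised by a 1-D Gaussian, the divergence is symmetrised by summing both directions, and a merged cluster is refitted from its pooled points. The pre-clusters are given directly as hypothetical lists, standing in for the output of the k-means step.

```python
import math

def gaussian_fit(values):
    """Mean and population variance of a cluster, with a small floor on the variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)

def symmetric_kl(a, b):
    """KL(a||b) + KL(b||a) for 1-D Gaussians fitted to the two clusters."""
    (m1, v1), (m2, v2) = gaussian_fit(a), gaussian_fit(b)
    kl_ab = 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl_ba = 0.5 * (math.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return kl_ab + kl_ba

def merge_preclusters(preclusters, n_clusters):
    """Repeatedly merge the pair of clusters with the smallest symmetric KL divergence."""
    clusters = [list(c) for c in preclusters]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: symmetric_kl(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical pre-clusters: two bimodal clusters, each split into two pieces.
preclusters = [[-0.2, 0.0, 0.2], [1.8, 2.0, 2.2], [9.8, 10.0, 10.2], [11.8, 12.0, 12.2]]
merged = merge_preclusters(preclusters, n_clusters=2)
```

Here the two modes near 0 and 2 are rejoined into one cluster and the modes near 10 and 12 into the other.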

COUNTING OF FLOWERS BASED ON K-MEANS CLUSTERING AND WATERSHED SEGMENTATION

  • PAN ZHAO;BYEONG-CHUN SHIN
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.27 no.2 / pp.146-159 / 2023
  • This paper proposes a hybrid algorithm combining K-means clustering and watershed algorithms for flower segmentation and counting. We use the K-means clustering algorithm to obtain the main colors in a complex background according to the cluster centers, and then apply a color space transformation to extract pixel values for the hue, saturation, and value of the flower color. Next, we apply a threshold segmentation technique to segment the flowers precisely and obtain a binary image of the flowers. Based on this, we apply the Euclidean distance transform to obtain a distance map and use it to find the local maxima of the connected components. Afterward, the proposed algorithm adaptively determines a minimum distance between peaks and applies it to label connected components using watershed segmentation with eight-connectivity. On a dataset of 30 images, the test results show that the proposed method is more efficient and precise for counting overlapped flowers, regardless of the degree of overlap, the number of overlapping flowers, and relatively irregular shapes.

An Edge Extraction Method Using K-means Clustering In Image (영상에서 K-means 군집화를 이용한 윤곽선 검출 기법)

  • Kim, Ga-On;Lee, Gang-Seong;Lee, Sang-Hun
    • Journal of Digital Convergence / v.12 no.11 / pp.281-288 / 2014
  • A method for edge detection using K-means clustering is proposed in this paper. The method is performed in three steps. First, histogram equalization is applied to the image to obtain a uniform intensity distribution. Next, pixels are clustered by the K-means clustering technique. Finally, a Sobel mask is applied to detect edges. Experiments showed that this method detected edges better than the conventional method.
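The three steps in this abstract can be sketched in plain Python on a tiny grayscale image. This is an illustrative reading rather than the authors' implementation: the image is a list of rows of 8-bit integers, the clustering runs on intensities only with `k=2`, each pixel is replaced by its cluster center before the Sobel step, and the threshold of 100 is arbitrary. All function names and the test image are hypothetical.

```python
def equalize(img, levels=256):
    """Histogram equalisation of an 8-bit grayscale image given as a list of rows."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    span = max(len(flat) - cdf_min, 1)
    return [[round((cdf[p] - cdf_min) * (levels - 1) / span) for p in row] for row in img]

def quantize_kmeans(img, k=2, iters=20):
    """Replace each pixel by the center of its intensity cluster (1-D k-means)."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    centers = [lo + (hi - lo) * (2 * i + 1) / (2 * k) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in flat:
            groups[min(range(k), key=lambda j: abs(p - centers[j]))].append(p)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
    return [[min(centers, key=lambda c: abs(p - c)) for p in row] for row in img]

def sobel_edges(img, threshold):
    """Boolean edge map from the Sobel gradient magnitude; border pixels stay False."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 > threshold
    return edges

# Hypothetical 6x6 image: dark left half, bright right half, slight noise in each.
image = [[40, 42, 41, 200, 202, 201] for _ in range(6)]
edges = sobel_edges(quantize_kmeans(equalize(image)), threshold=100)
```

In this example equalization stretches the small noise within each half, and the clustering step flattens it again, so only the boundary between the two halves is marked as an edge.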

Analysis of Document Clustering Varing Cluster Centroid Decisions (클러스터 중심 결정 방법에 따른 문서 클러스터링 성능 분석)

  • 오형진;변동률;이신원;박순철;정성종;안동언
    • Proceedings of the IEEK Conference / 2002.06c / pp.99-102 / 2002
  • The K-means clustering algorithm is a very popular clustering technique used in the field of information retrieval. In this paper, we address the problem of how the K-means algorithm creates centroids, and suggest a method that reflects document features and considers the context of each document when determining new centroids. For the experiment, we used an automatic document summarizer to summarize the Reuters-21578 newswire test dataset and achieved a 20% improvement in recall.

A Development of Customer Segmentation by Using Data Mining Technique (데이터마이닝에 의한 고객세분화 개발)

  • Jin Seo-Hoon
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.555-565 / 2005
  • Knowing its customers is very important for a company to survive cut-throat competition among its competitors. Companies need to manage the relationship with each and every customer and make each customer as profitable as possible. CRM (customer relationship management) has emerged as a key solution for managing profitable relationships. Customer segmentation is an essential component of successful CRM. Clustering, as a data mining technique, is very useful for building data-driven segmentation. This paper is concerned with building a proper customer segmentation, using a credit card company as a case study. The customer segmentation was built solely on transaction data that came from customers' activities. A two-step clustering approach consisting of k-means clustering and agglomerative clustering was applied to build the customer segmentation.

Product Recommendation System on VLDB using k-means Clustering and Sequential Pattern Technique (k-means 클러스터링과 순차 패턴 기법을 이용한 VLDB 기반의 상품 추천시스템)

  • Shim, Jang-Sup;Woo, Seon-Mi;Lee, Dong-Ha;Kim, Yong-Sung;Chung, Soon-Key
    • The KIPS Transactions:PartD / v.13D no.7 s.110 / pp.1027-1038 / 2006
  • Recommendation systems based on a very large database (VLDB) face many technical problems. It is therefore necessary to study a recommendation system structure and data mining techniques suitable for large-scale Internet shopping malls. We design and implement a product recommendation system using the k-means clustering algorithm and a sequential pattern technique, which can be used in a large-scale Internet shopping mall. The system processes user information in batches, defines the various categories in a hierarchical structure, and uses a sequential pattern mining technique for the search engine. For predictive modeling and experiments, we use real data (users' interests and preferences for a given category) extracted from 30 days of log files of a major Internet shopping mall in Korea. We define PRP (Predictive Recommend Precision), PRR (Predictive Recommend Recall), and PF1 (Predictive Factor One-measure) for evaluation. In the experiments, the best recommendation time and the best learning time of our system are on the order of O(N), and the values of the evaluation measures are very good.

A Similar Price Zone Determination of Public Land Price Using a Hybrid Clustering Technique (평균연결법과 K-means 혼합클러스터링 기법을 이용한 공시지가 유사가격권역의 설정)

  • Yi Seong-Kyu;Park Soo-Hong;Hong Sung-Eon
    • Journal of the Korean Geographical Society / v.41 no.1 s.112 / pp.121-135 / 2006
  • Even though the similar land price zone is a very important element in the public land appraisal procedure, the concept is only implicitly described and applied in the actual land appraisal system. This makes matters worse when it is applied to the automatic selection of a comparative standard land parcel. In addition, the division of similar land price zones requires an objective and reasonable process in order to improve ALPAS (Automatic Land Price Appraisal System), which has become an issue today. To solve the similar land price zone determination problem caused by the lack of an objective numerical standard, this study proposes a similar land price zone determination method using a hybrid clustering technique. Results showed that the hybrid clustering method, applied to the test area, could easily detect similar land price zones with considerable accuracy, as verified with test statistics and with comparative standard land parcels selected manually.