• Title/Abstract/Keywords: k-medoids clustering

Search results: 20 items (processing time: 0.024 s)

A New K-medoids Clustering Method and Performance Comparison (Performance Comparison of Some K-medoids Clustering Algorithms)

  • 박해상;이상호;전치혁
    • 한국경영과학회:학술대회논문집 / 한국경영과학회 2006 Fall Conference / pp.421-426 / 2006
  • We propose a new algorithm for K-medoids clustering that runs like the K-means algorithm, and we test several methods for selecting initial medoids. The proposed algorithm calculates the similarity matrix once and reuses it to find new medoids at every iteration. To evaluate the proposed algorithm, we use real and artificial data and compare its clustering results with those of other algorithms in terms of three performance measures. Experimental results show that the proposed algorithm requires less computation time than Partitioning Around Medoids (PAM) while achieving comparable performance.
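
The K-means-like procedure described in this abstract (compute the pairwise matrix once, then alternate between assigning objects and updating medoids) can be sketched roughly as follows. This is only an illustration and not the authors' implementation: the function names, the Euclidean distance, the caller-supplied initial medoids, and the stopping rule are all assumptions, and the abstract does not say which initialization methods or performance measures were used.

```python
from math import dist


def pairwise_distances(points):
    """Compute the full Euclidean distance matrix once, up front."""
    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def k_medoids(points, initial_medoids, max_iter=100):
    """Alternate assignment and medoid update, K-means style.

    `initial_medoids` holds indices into `points` (an assumed interface).
    Returns the medoid indices and, for each point, the index of its medoid.
    """
    d = pairwise_distances(points)
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(len(points)):
            nearest = min(medoids, key=lambda m: d[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest total
        # distance to the other members; only matrix lookups are needed.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(m)
                continue
            new_medoids.append(
                min(members, key=lambda c: sum(d[c][j] for j in members))
            )
        if set(new_medoids) == set(medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    labels = [min(medoids, key=lambda m: d[i][m]) for i in range(len(points))]
    return medoids, labels
```

For example, calling `k_medoids` on two well-separated groups of three 2-D points each, with any two starting indices, should return one medoid per group. Because the matrix is built once, each iteration costs only table lookups, which is the source of the time saving the abstract reports.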


A K-means-like Algorithm for K-medoids Clustering

  • 이종석;박해상;전치혁
    • 한국경영과학회:학술대회논문집 / 한국경영과학회 2005 Fall Conference and General Meeting / pp.51-54 / 2005
  • Clustering analysis is a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes. In this paper, we propose a new algorithm for K-medoids clustering that runs like the K-means algorithm. The new algorithm calculates the distance matrix once and reuses it to find new medoids at every iteration. We evaluate the proposed method using real and synthetic data and compare its results with those of other algorithms. The proposed algorithm takes less computation time and performs better than the other algorithms.


Medoid Determination in Deterministic Annealing-based Pairwise Clustering

  • Lee, Kyung-Mi;Lee, Keon-Myung
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 11, No. 3 / pp.178-183 / 2011
  • The deterministic annealing-based clustering algorithm is an EM-based algorithm that behaves like simulated annealing yet is less sensitive to parameter initialization. Pairwise clustering is a technique that clusters objects using inter-entity distance information without requiring detailed attribute information. The pairwise deterministic annealing-based clustering algorithm repeatedly alternates between estimating mean fields and updating the membership degrees of data objects to clusters until a termination condition holds. Lacking attribute value information, pairwise clustering algorithms do not explicitly determine the centroids or medoids of clusters, either during the clustering process or at its end. This paper proposes a method to identify medoids as the centers of the formed clusters for the pairwise deterministic annealing-based clustering algorithm. Experimental results show that the proposed method locates meaningful medoids.
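
The abstract does not spell out how the medoids are identified. One plausible rule, given only pairwise distances and membership degrees, is to pick for each cluster the object whose membership-weighted total distance to all objects is smallest. The sketch below shows that rule only. It is an assumption made for illustration, not the paper's method, and the function name and argument layout are hypothetical.

```python
def weighted_medoids(distances, memberships):
    """Pick one medoid per cluster from pairwise data alone.

    distances[i][j]   : distance between objects i and j
    memberships[j][c] : membership degree of object j in cluster c
    Illustrative rule (an assumption): the medoid of cluster c is the object
    i that minimises sum_j memberships[j][c] * distances[i][j].
    """
    n = len(distances)
    k = len(memberships[0])
    medoids = []
    for c in range(k):
        cost = [
            sum(memberships[j][c] * distances[i][j] for j in range(n))
            for i in range(n)
        ]
        medoids.append(min(range(n), key=cost.__getitem__))
    return medoids
```

No attribute values are needed at any point, which matches the pairwise setting described above: the inputs are a distance matrix and the membership degrees produced by the clustering run.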

Determining the Number and the Locations of RBF Centers Using Enhanced K-Medoids Clustering and Bi-Section Search Method

  • 이대원;이재욱
    • 대한산업공학회지 / Vol. 29, No. 2 / pp.172-178 / 2003
  • Recent research has proposed a variety of ways to determine the locations of RBF centers, assuming that the number of RBF centers is known. However, these methods have many numerical drawbacks. We propose a new method to overcome such drawbacks. Its strength is that it determines the locations and the number of RBF centers at the same time, without any assumption about the number of RBF centers. The proposed method consists of two phases. The first phase determines the number and the locations of RBF centers using a bi-section search method and an enhanced k-medoids clustering that overcomes the drawbacks of the clustering algorithm. In the second phase, the network weights are computed and the design of the RBF network is completed. The new method is applied to several benchmark data sets. Benchmark results show that the proposed method is competitive with previously reported approaches to center selection.

Comparison between k-means and k-medoids Algorithms for a Group-Feature based Sliding Window Clustering

  • 양주연;심준호
    • 한국전자거래학회지 / Vol. 23, No. 3 / pp.225-237 / 2018
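  • As the generation and processing of large-scale data become commonplace, demand for large-scale data stream processing is increasing rapidly. In response to this demand, a variety of large-scale data processing technologies are being developed. One approach that has drawn attention is data stream clustering using a sliding window. Data stream clustering using a sliding window creates new clusters each time the window moves. Existing clustering techniques over sliding windows implement data stream clustering based on coresets. In this study, we changed the clustering algorithm used within an algorithm that employs coreset-based group features. We then conducted experiments comparing the performance of the proposed algorithm and the existing algorithm as parameter values vary. We discuss the improvements, compare the two algorithms, and suggest to practitioners how to use them depending on the parameters.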

A Computational Intelligence Based Online Data Imputation Method: An Application For Banking

  • Nishanth, Kancherla Jonah;Ravi, Vadlamani
    • Journal of Information Processing Systems / Vol. 9, No. 4 / pp.633-650 / 2013
  • All the data imputation techniques proposed so far in the literature are offline techniques: they require a number of iterations to learn the characteristics of the data during training, and they consume a lot of computational time. Hence, these techniques are not suitable for applications that require imputation to be performed on demand and in near real time. The paper proposes a computational intelligence based architecture for online data imputation, as well as extended versions of an existing offline data imputation method. The proposed online imputation technique has 2 stages. In Stage 1, the Evolving Clustering Method (ECM) is used to replace the missing values with cluster centers, as part of the local learning strategy. Stage 2 refines the resulting approximate values using a General Regression Neural Network (GRNN), as part of the global approximation strategy. The extended offline imputation techniques employ K-Means or K-Medoids in Stage 1 and a Multi Layer Perceptron (MLP) or GRNN in Stage 2. Several experiments were conducted on 8 benchmark datasets and 4 bank-related datasets to assess the effectiveness of the proposed online and offline imputation techniques. In terms of Mean Absolute Percentage Error (MAPE), the results indicate that the difference between the best proposed offline imputation method, K-Medoids+GRNN, and the proposed online imputation method, ECM+GRNN, is statistically insignificant at the 1% level of significance. Consequently, the proposed online technique, being less expensive and faster, can be employed for imputation instead of the existing and proposed offline imputation techniques. This is the significant outcome of the study. Furthermore, GRNN in Stage 2 uniformly reduced MAPE values in both offline and online imputation methods on all datasets.
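
The abstract compares methods by Mean Absolute Percentage Error but does not restate the formula. The sketch below uses the standard textbook definition, the mean of |actual - imputed| / |actual| expressed as a percentage. The function name and the error handling are assumptions, and the paper may compute the measure per attribute or per fold in ways not described in the abstract.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Standard definition: 100 / n * sum(|a - p| / |a|).
    Raises ValueError for empty or mismatched inputs, or when an actual
    value is zero (the ratio is undefined there).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

A lower value means the imputed values sit closer to the true ones. For instance, imputing 110 for a true 100 and 180 for a true 200 gives an error of 10% on each value.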

A Machine Learning based Methodology for Selecting Optimal Location of Hydrogen Refueling Stations

  • 김수환;류준형
    • Korean Chemical Engineering Research / Vol. 58, No. 4 / pp.573-580 / 2020
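  • Interest in hydrogen as a transportation energy source to replace petroleum has been growing recently. To maximize the advantages of hydrogen, hydrogen refueling stations must be widely deployed. This paper proposes a methodology for selecting optimal locations so that users can reach hydrogen refueling stations more easily. The locations of gas stations and natural gas refueling stations, the existing energy supply points, were used as the primary reference, and data such as population and the number of registered vehicles were additionally incorporated to calculate the expected refueling demand of hydrogen vehicles. Optimal hydrogen refueling station locations corresponding to the expected demand were computed using k-medoids clustering, a machine learning technique. The merit of the proposed method is illustrated numerically through a case study of Seoul. Data-driven methods such as this one may help build an environmentally friendly economic system by accelerating the adoption of hydrogen.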

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology / Vol. 40, No. 8 / pp.726-732 / 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction technique for clustering problems. However, previous studies analyzed the performance of PCA only for full data sets. In this paper, to evaluate the performance of PCA for clustering analysis specifically and robustly, we exploit an improved FCBF (fast correlation-based filter), a feature selection method, for supervised clustering data sets, and employ two well-known clustering algorithms: k-means and k-medoids. Computational results from supervised data sets show that the performance of PCA is very poor for large-scale features.

Personalized Recommendation System for IPTV using Ontology and K-medoids

  • 윤병대;김종우;조용석;강상길
    • 지능정보연구 / Vol. 16, No. 3 / pp.147-161 / 2010
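  • With the recent convergence of broadcasting and communications, communication technology has been integrated into TV, bringing many changes to the way TV is watched. These changes broaden the range of service choices, but viewers must spend considerable time selecting programs. To address this drawback, this paper builds a customer usage-information ontology from customers' viewing information in an IPTV broadcasting environment that offers users diverse content, and clusters customers accordingly using the k-medoids method. Based on this, we propose a method for recommending content that customers prefer. In the experiments, we show the superiority of the proposed method by comparing it with existing methods.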

A Study on the Integration Between Smart Mobility Technology and Information Communication Technology (ICT) Using Patent Analysis

  • Alkaabi, Khaled Sulaiman Khalfan Sulaiman;Yu, Jiwon
    • 한국컴퓨터정보학회논문지 / Vol. 24, No. 6 / pp.89-97 / 2019
  • This study proposes a method for investigating current patents related to information communication technology and smart mobility to provide insights into future technology trends. The method is based on text mining clustering analysis. The method consists of two stages, which are data preparation and clustering analysis, respectively. In the first stage, tokenizing, filtering, stemming, and feature selection are implemented to transform the data into a usable format (structured data) and to extract useful information for the next stage. In the second stage, the structured data is partitioned into groups. The K-medoids algorithm is selected over the K-means algorithm for this analysis owing to its advantages in dealing with noise and outliers. The results of the analysis indicate that most current patents focus mainly on smart connectivity and smart guide systems, which play a major role in the development of smart mobility.
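
The data preparation stage named in this abstract (tokenizing, filtering, stemming, feature selection) can be pictured with the toy pipeline below. Every detail here is an assumption made for illustration: the stop-word list, the crude suffix-stripping stand-in for a real stemmer, and frequency-based feature selection are hypothetical choices, and the abstract does not say which tools or criteria the authors used.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # toy list
SUFFIXES = ("ing", "ed", "es", "s")  # crude stand-in for a real stemmer


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words (filtering), then stem."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def term_frequency_matrix(documents, vocabulary_size):
    """Turn raw documents into structured rows of term counts."""
    token_lists = [preprocess(doc) for doc in documents]
    totals = Counter(t for tokens in token_lists for t in tokens)
    # Feature selection (assumed): keep the most frequent stems,
    # breaking ties alphabetically.
    vocabulary = sorted(sorted(totals), key=lambda t: -totals[t])[:vocabulary_size]
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows
```

The resulting rows are the kind of structured data that a clustering step such as K-medoids could then partition, given a suitable distance between rows.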