• Title/Summary/Keyword: clustering algorithms

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.40 no.8
    • /
    • pp.726-732
    • /
    • 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction method for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to specifically and robustly evaluate the performance of PCA for clustering analysis, we exploit an improved FCBF (fast correlation-based filter), a feature selection method, on supervised clustering data sets, and employ two well-known clustering algorithms: k-means and k-medoids. Computational results from supervised data sets show that the performance of PCA is very poor for large-scale features.
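
As a rough illustration of the dimensionality reduction step mentioned in this abstract, the sketch below projects data onto its first principal component using power iteration on the sample covariance matrix. It is a minimal sketch and not the authors' procedure: the function names, the all-ones starting vector, and the fixed iteration count are assumptions made for the example. The resulting one-dimensional scores could then be passed to a clustering routine such as k-means.

```python
import math

def first_principal_component(rows, iters=200):
    """Estimate the leading principal direction by power iteration.

    Assumption: the all-ones start vector is not orthogonal to the
    leading eigenvector of the covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data with no variance
            break
        v = [x / norm for x in w]
    return v, means

def project(rows, direction, means):
    """Return the 1-D score of each row along the given direction."""
    return [sum((r[j] - means[j]) * direction[j] for j in range(len(direction)))
            for r in rows]

# Hypothetical data lying exactly on the line y = 2x.
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, means = first_principal_component(data)
scores = project(data, direction, means)
```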

Automatic Switching of Clustering Methods based on Fuzzy Inference in Bibliographic Big Data Retrieval System

  • Zolkepli, Maslina;Dong, Fangyan;Hirota, Kaoru
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.14 no.4
    • /
    • pp.256-267
    • /
    • 2014
  • An automatic switch among ensembles of clustering algorithms is proposed as a part of the bibliographic big data retrieval system. It uses a fuzzy inference engine as a decision support tool to select the fastest performing clustering algorithm among fuzzy C-means (FCM) clustering, Newman-Girvan clustering, and the combination of both. It aims to realize the best clustering performance while reducing computational complexity from O($n^3$) to O(n). The automatic switch is developed using a fuzzy logic controller written in Java and accepts 3 inputs from each clustering result, i.e., number of clusters, number of vertices, and time taken to complete the clustering process. The experimental results on a PC (Intel Core i5-3210M at 2.50 GHz) demonstrate that the combination of both clustering algorithms is selected as the best performing algorithm in 20 out of 27 cases with the highest percentage of 83.99%, completed in 161 seconds. The self-adapted FCM is selected as the best performing algorithm in 4 cases and the Newman-Girvan is selected in 3 cases. The automatic switch is to be incorporated into the bibliographic big data retrieval system, which focuses on visualization of fuzzy relationships using a hybrid approach combining the FCM and Newman-Girvan algorithms, and is planned for public release over the Internet.

A Study on Performance Evaluation of Clustering Algorithms using Neural and Statistical Method (클러스터링 성능평가: 신경망 및 통계적 방법)

  • 윤석환;신용백
    • Journal of the Korean Professional Engineers Association
    • /
    • v.29 no.2
    • /
    • pp.71-79
    • /
    • 1996
  • This paper evaluates the clustering performance of a neural network and a statistical method. The algorithms used in this paper are GLVQ (Generalized Learning Vector Quantization) as the neural method and the k-means algorithm as the statistical clustering method. To compare the two methods, we calculate Rand's c statistic. As a result, the mean of the c value obtained with GLVQ is higher than that obtained with the k-means algorithm, while the standard deviation of the c value is lower. The experimental data sets were Fisher's IRIS data and patterns extracted from handwritten numerals.
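
The comparison statistic named in this abstract can be sketched as follows, assuming that "Rand's c statistic" refers to the standard Rand index: the fraction of point pairs on which two labelings agree (both place the pair in the same cluster, or both place it in different clusters). The function name `rand_index` is a choice made for this example.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs treated consistently by two labelings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)

# Identical partitions with swapped label names still agree on every pair.
same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A crossed partition agrees only on 2 of the 6 pairs.
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the clustering reproduces the reference partition exactly, so a higher mean over repeated runs indicates better agreement with the known classes.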

Design of improved Multi-FNN for Nonlinear Process modeling

  • Park, Hosung;Sungkwun Oh
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • 2002.10a
    • /
    • pp.102.2-102
    • /
    • 2002
  • In this paper, the improved Multi-FNN (Fuzzy-Neural Networks) model is identified and optimized using the HCM (Hard C-Means) clustering method and optimization algorithms. The proposed Multi-FNN is based on FNN and uses simplified and linear inference as the fuzzy inference method and the error back-propagation algorithm as the learning rule. We use HCM clustering and genetic algorithms (GAs) to identify both the structure and the parameters of a Multi-FNN model. Here, the HCM clustering method, which is carried out to preprocess the process data for system modeling, is utilized to determine the structure of the Multi-FNN according to the divisions of the input-output space using I/O process data. Also, the parame...
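
The hard c-means (HCM) step referred to in this abstract can be illustrated with a minimal sketch: each point is assigned to exactly one nearest center, and each center is then moved to the mean of its members. This is a generic version, not the paper's preprocessing pipeline; the function name and the choice of the first `c` points as initial centers are assumptions made for the example.

```python
def hard_c_means(points, c, iters=100):
    """Alternate crisp assignment and center update until labels stop changing."""
    # Assumption: the first c points serve as initial centers.
    centers = [list(p) for p in points[:c]]
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(c),
                key=lambda k: sum((x - m) ** 2 for x, m in zip(p, centers[k])))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable, so centers are too
            break
        labels = new_labels
        # Move each non-empty cluster's center to the mean of its members.
        for k in range(c):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical I/O samples forming two well-separated groups.
samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = hard_c_means(samples, c=2)
```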

Clustering Algorithms for Data Records (데이터 레코드의 Clustering Algorithms)

  • 문송천
    • Communications of the Korean Institute of Information Scientists and Engineers
    • /
    • v.5 no.2
    • /
    • pp.90-93
    • /
    • 1987
  • Relatively few papers are known to study the clustering of data records of the same kind in a cylinder. In this article, I review the clustering algorithms that have been studied, especially those for the cellular list file.

Design of Hierarchically Structured Clustering Algorithm and its Application (계층 구조 클러스터링 알고리즘 설계 및 그 응용)

  • Bang, Young-Keun;Park, Ha-Yong;Lee, Chul-Heui
    • Journal of Industrial Technology
    • /
    • v.29 no.B
    • /
    • pp.17-23
    • /
    • 2009
  • In many cases, clustering algorithms have been used for extracting and discovering useful information from non-linear data. They have had a great effect on the performance of systems dealing with non-linear data. Thus, this paper presents a new approach called the hierarchically structured clustering algorithm, and applies it to a prediction system for non-linear time series data. The proposed hierarchically structured clustering algorithm (called HCKA: Hierarchical Cross-correlation and K-means clustering Algorithms), in which the cross-correlation and k-means clustering algorithms are combined, can take into account the correlation of non-linear time series as well as their statistical characteristics. First, the optimal differences of the data are generated, which can suitably reveal the characteristics of non-linear time series. Second, the generated differences are classified into the upper clusters for their predictors by the cross-correlation clustering algorithm, and then each set of classified differences is classified again into the lower fuzzy sets by the k-means clustering algorithm. As a result, the proposed method can give an efficient classification and improve the performance. Finally, we demonstrate the effectiveness of the proposed HCKA via typical time series examples.
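
Two building blocks named in this abstract, differencing a time series and measuring cross-correlation between series, can be sketched as below. This is only an illustration under assumptions: the abstract does not say how the "optimal" difference order is chosen or which form of cross-correlation is used, so the sketch takes the order as a parameter and uses the zero-lag normalized (Pearson-style) correlation. Both function names are hypothetical.

```python
import math

def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def normalized_cross_correlation(x, y):
    """Zero-lag normalized cross-correlation of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        raise ValueError("a constant series has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom

first_diff = difference([1, 3, 6, 10])
second_diff = difference([1, 3, 6, 10], order=2)
aligned = normalized_cross_correlation([1, 2, 3], [2, 4, 6])
opposed = normalized_cross_correlation([1, 2, 3], [6, 4, 2])
```

Differenced series with high mutual correlation could be grouped into the same upper cluster before a finer k-means pass, which is the two-level idea the abstract describes.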

Evaluating the Performance of Four Selections in Genetic Algorithms-Based Multispectral Pixel Clustering

  • Kutubi, Abdullah Al Rahat;Hong, Min-Gee;Kim, Choen
    • Korean Journal of Remote Sensing
    • /
    • v.34 no.1
    • /
    • pp.151-166
    • /
    • 2018
  • This paper compares the performance of four selection methods used in the application of genetic algorithms (GAs) to automatically optimize multispectral pixel clusters for unsupervised classification of KOMPSAT-3 data, since selection, among the three main types of operators including crossover and mutation, is the driving force that determines the overall operation of clustering GAs. Experimental results demonstrate that tournament selection performs better than the other selections, especially in both the number of generations and the convergence rate. However, it is computationally more expensive than elitism selection, which has the slowest convergence rate in the comparison and a lower probability of reaching the optimum cluster centers than the other selections. Both rank-based selection and proportional roulette wheel selection show similar performance in the average Euclidean distance of the pixel clustering, even though rank-based selection is computationally much more expensive than proportional roulette. With respect to finding the global optimum, tournament selection has a higher potential to reach it sooner than rank-based selection, which spends a lot of computational time on fitness smoothing. The tournament selection-based clustering GA is used to successfully classify the KOMPSAT-3 multispectral data, achieving sufficient thematic accuracy (namely, a Kappa coefficient of 0.923).
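
Two of the selection operators compared in this abstract can be sketched generically as follows. This is not the paper's implementation: it assumes that a higher fitness value is better and that fitness values are non-negative, and the function names and tournament size are choices made for the example. In a clustering GA, fitness might be derived from the inverse of the average distance to cluster centers.

```python
import random

def tournament_select(fitness, k, rng):
    """Pick k distinct individuals at random and return the index of the fittest."""
    contenders = rng.sample(range(len(fitness)), k)
    return max(contenders, key=lambda i: fitness[i])

def roulette_select(fitness, rng):
    """Return an index with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("roulette selection needs positive total fitness")
    threshold = rng.random() * total
    running = 0.0
    for i, f in enumerate(fitness):
        running += f
        if threshold < running:
            return i
    return len(fitness) - 1  # guard against floating-point round-off

rng = random.Random(0)
population_fitness = [1.0, 5.0, 3.0]  # hypothetical fitness values
# A tournament over the whole population always returns the best individual.
winner = tournament_select(population_fitness, k=3, rng=rng)
# With all fitness mass on one individual, roulette always returns it.
picked = roulette_select([0.0, 0.0, 5.0], rng)
```

Smaller tournament sizes weaken the selection pressure, while roulette selection lets weak individuals survive in proportion to their share of the total fitness.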

Energy Efficient Clustering Algorithm for Surveillance and Reconnaissance Applications in Wireless Sensor Networks (무선 센서 네트워크에서 에너지 효율적인 감시·정찰 응용의 클러스터링 알고리즘 연구)

  • Kong, Joon-Ik;Lee, Jae-Ho;Kang, Jiheon;Eom, Doo-Seop
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37C no.11
    • /
    • pp.1170-1181
    • /
    • 2012
  • Wireless sensor networks (WSNs) are used in diverse applications. In general, sensor nodes, which are easily deployed in specific areas, have many resource constraints such as battery power, memory size, MCUs, and RF. Hence, efficient energy consumption is strongly required in WSNs. In terms of event states, the event-driven delivery model (i.e., surveillance and reconnaissance applications) has several characteristics. On the basis of such a model, clustering algorithms can be used to manage sensor nodes' energy efficiently owing to the advantages of data aggregation. Since a specific node collects packets from its child nodes in a network topology and aggregates them into one packet to relay them at once, the number of packets transmitted to a sink node can be reduced. However, most clustering algorithms have been designed without considering the characteristics of the event-driven delivery model, which results in some problems. In this paper, we propose enhanced clustering algorithms that consider both targets' movement and energy efficiency for surveillance and reconnaissance applications. These algorithms form clusters through local contention between nodes that have already detected certain targets, using a method called CHEW (Cluster Head Election Window). Therefore, our proposed algorithms reduce not only the cost of cluster maintenance but also energy consumption. In conclusion, we analyze traces of the clusters' movements according to targets' locations, evaluate the results, and compare our algorithms with others through simulations. Finally, we verify that our algorithms use energy efficiently.

Online Clustering Algorithms for Semantic-Rich Network Trajectories

  • Roh, Gook-Pil;Hwang, Seung-Won
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.4
    • /
    • pp.346-353
    • /
    • 2011
  • With the advent of ubiquitous computing, a massive amount of trajectory data has been published and shared on many websites. This type of computing also provides motivation for online mining of trajectory data, to fit user-specific preferences or context (e.g., time of day). While many trajectory clustering algorithms have been proposed, they have typically focused on offline mining and do not consider the restrictions of the underlying road network or the selection conditions representing user contexts. In clear contrast, we study an efficient clustering algorithm for Boolean + Clustering queries using a pre-materialized and summarized data structure. Our experimental results demonstrate the efficiency and effectiveness of our proposed method using real-life trajectory data.

Classification of Volatile Chemicals using Fuzzy Clustering Algorithm (퍼지 Clustering 알고리즘을 이용한 휘발성 화학물질의 분류)

  • Byun, Hyung-Gi;Kim, Kab-Il
    • Proceedings of the KIEE Conference
    • /
    • 1996.07b
    • /
    • pp.1042-1044
    • /
    • 1996
  • The use of fuzzy theory in pattern recognition tasks may be applicable to the classification and recognition of gases and odours. This paper reports results obtained by applying fuzzy c-means algorithms to patterns generated for volatile chemicals by an odour sensing system using an array of conducting polymer sensors. For the volatile chemicals clustering problem, three unsupervised fuzzy c-means algorithms were applied. Among the pattern clustering methods, the FCMAW algorithm, which updates the cluster centres more frequently, consistently outperformed the others. It has been confirmed as an outstanding clustering algorithm throughout the experimental trials.
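
The core of fuzzy c-means, the membership update, can be sketched as follows. This shows the standard update only and is not the FCMAW variant from the abstract, whose details are not given here. The fuzzifier default `m=2.0`, the function name, and the handling of a point that coincides with a centre are assumptions made for the example.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centres belongs fully to them.
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        return [1.0 / len(zero) if i in zero else 0.0
                for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]

# A sensor pattern equidistant from two centres is shared equally.
shared = fcm_memberships((0.0, 0.0), [(-1.0, 0.0), (1.0, 0.0)])
# A pattern three times closer to the first centre gets most of the membership.
skewed = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

Unlike hard clustering, each pattern keeps a graded degree of membership in every cluster, which suits sensor responses that overlap between chemicals.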