• Title/Summary/Keyword: Optimal Clustering

The Effectiveness of Hierarchic Clustering on Query Results in OPAC (OPAC에서 탐색결과의 클러스터링에 관한 연구)

  • Ro, Jung-Soon
    • Journal of the Korean Society for Library and Information Science / v.38 no.1 / pp.35-50 / 2004
  • This study evaluated the applicability of the static hierarchic clustering model to clustering query results in an OPAC. Two clustering methods (Between Average Linkage (BAL) and Complete Linkage (CL)) and two similarity coefficients (Dice and Jaccard) were tested on the query results retrieved from 16 title-based keyword searches. The precision of the optimal clusters improved by more than 100% compared with title-word searching. Optimal cluster effectiveness did not differ between the similarity coefficients, but it did differ between the clustering methods. CL was better in precision ratio, whereas BAL was better in recall ratio, at the optimal top-level and bottom-level clusters. However, the differences were not significant except for the higher recall ratio of BAL at the top-level cluster. The small number of clusters and the long hierarchy chains that BAL produced for the optimal cluster may be neither desirable nor efficient.
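
As a rough illustration of the two similarity coefficients named in the abstract above, the following minimal sketch computes Dice and Jaccard on sets of title words. The word sets `doc1` and `doc2` are hypothetical, and the code is not taken from the study itself.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A & B| / |A | B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: set, b: set) -> float:
    """Dice coefficient: 2|A & B| / (|A| + |B|) (0.0 for two empty sets)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical title-word sets for two retrieved records (assumed data).
doc1 = {"optimal", "clustering", "query", "results"}
doc2 = {"hierarchic", "clustering", "query", "opac"}

j = jaccard(doc1, doc2)  # 2 shared words out of 6 distinct words
d = dice(doc1, doc2)     # 2 * 2 shared words over 4 + 4 words
print(j, d)
```

The two coefficients are related by Dice = 2J / (1 + J), a monotone function of Jaccard, so they rank record pairs in the same order. That may be one reason the choice of coefficient made little difference in the study, although the abstract does not say so.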

A Hybrid Genetic Algorithm for K-Means Clustering

  • Jun, Sung-Hae;Han, Jin-Woo;Park, Minjae;Oh, Kyung-Whan
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.330-333 / 2003
  • In partitioning methods, the initial cluster size strongly affects the clustering result. In the K-means algorithm, the result of the cluster analysis changes with the cluster size K. Usually, the initial cluster size is determined from prior, subjective information, which may not be optimal, so a more objective method is needed. In our research, we propose a hybrid genetic algorithm, a tree-induction-based evolutionary algorithm, for determining the optimal cluster size. The initial population of this algorithm is determined by the number of terminal nodes of the tree induction. From this decision-tree-based initial population, the optimal cluster size is generated. The fitness function is defined as the inverse of a dissimilarity measure, and a bagging approach is used to save computational time.

A Design of Vector Quantization Optimal Fuzzy Systems for Vision-Based Robot Control Systems (영상 기반 로붓 제어 시스템을 위한 벡터 양자화 최적 퍼지 시스템 설계)

  • Kim, Young-Joong;Kim, Young-Rak;Kim, Beom-Soo;Lim, Myo-Taeg
    • Proceedings of the KIEE Conference / 2003.07d / pp.2447-2449 / 2003
  • In this paper, optimal fuzzy systems using vector quantization and fuzzy logic controllers are designed for vision-based robot control systems. Because there are so many input-output pairs, the optimal fuzzy system for vision-based control is so complex that it cannot be applied to, or is not useful in, real vision-based control systems. Therefore, the input-output pairs are generally clustered to reduce the complexity of the optimal fuzzy system. To increase the effectiveness of the clustering, a vector quantization clustering method is proposed. To verify the effectiveness of the proposed method experimentally, it is applied to a vision-based arm robot control system.

Security Clustering Algorithm Based on Integrated Trust Value for Unmanned Aerial Vehicles Network

  • Zhou, Jingxian;Wang, Zengqi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.4 / pp.1773-1795 / 2020
  • Unmanned aerial vehicle (UAV) networks are a very vibrant research area nowadays, with many military and civil applications. Limited bandwidth, high mobility, and secure communication are the three main problems of micro UAVs. In this paper, we try to address these problems by means of secure clustering, and we propose a security clustering algorithm based on an integrated trust value for UAV networks. First, an improved k-means++ algorithm is presented that determines the optimal number of clusters from the network bandwidth parameter, which ensures the optimal use of network bandwidth. Second, we consider variables representing the link expiration time to improve node clustering, and we use the integrated trust value to rapidly detect malicious nodes and establish a head list. Node clustering reduces the impact of high mobility, and the head list enhances the security of the clustering algorithm. Finally, the remaining energy ratio, relative mobility, and relative degrees of the nodes are combined to select the best cluster head. Simulation results showed that the proposed clustering algorithm incurred a smaller computational load and achieved higher network security.

Double K-Means Clustering (이중 K-평균 군집화)

  • 허명회
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.343-352 / 2000
  • In this study, the author proposes a nonhierarchical clustering method, called "Double K-Means Clustering", which clusters multivariate observations with the following algorithm. Step I: Carry out ordinary K-means clustering and obtain k temporary clusters with sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$, and pooled covariance matrix S. Step II-1: Allocate the observation $x_i$ to the cluster F if it satisfies ....., where N is the total number of observations, for $i = 1, \ldots, N$. Step II-2: Update the cluster sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$, and pooled covariance matrix S. Step II-3: Repeat Steps II-1 and II-2 until the change becomes negligible. Double K-means clustering is nearly "optimal" under a mixture of k multivariate normal distributions with a common covariance matrix. It is also nearly affine invariant, with the data-analytic implication that variable standardization is largely unnecessary. The method is demonstrated numerically on Fisher's iris data.

A Two-Stage Method for Near-Optimal Clustering (최적에 가까운 군집화를 위한 이단계 방법)

  • 윤복식
    • Journal of the Korean Operations Research and Management Science Society / v.29 no.1 / pp.43-56 / 2004
  • The purpose of clustering is to partition a set of objects into several clusters based on some appropriate similarity measure. In most cases, clustering is performed without any prior information on the number of clusters or the structure of the given data, which makes it a very complicated combinatorial optimization problem. In this paper, we propose a general-purpose clustering method that can determine the proper number of clusters and efficiently carry out clustering analysis for various types of data. The method is composed of two stages. In the first stage, two different hierarchical clustering methods are used to obtain a reasonably good clustering result, which is improved in the second stage by an ASA (accelerated simulated annealing) algorithm equipped with specially designed perturbation schemes. Extensive experimental results are given to demonstrate the usefulness of our ASA clustering method.

Combined Artificial Bee Colony for Data Clustering (융합 인공벌군집 데이터 클러스터링 방법)

  • Kang, Bum-Su;Kim, Sung-Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.4 / pp.203-210 / 2017
  • Data clustering is one of the most difficult and challenging problems and can be formally regarded as a particular kind of NP-hard grouping problem. The K-means algorithm is one of the most popular and widely used clustering methods because it is easy to implement and very efficient. However, it is prone to becoming trapped in local optima, and on large data sets its solutions vary widely with different initial values. Therefore, an efficient computational intelligence method is needed to find the global optimal solution of the data clustering problem within limited computational time. The objective of this paper is to propose a combined artificial bee colony (CABC) method that uses K-means for initialization and finalization to find an optimal solution to the data clustering optimization problem. The artificial bee colony (ABC) is an algorithm motivated by the intelligent behavior exhibited by honeybees when searching for food. The performance of ABC is better than or similar to that of other population-based algorithms, with the added advantage of employing fewer control parameters. The proposed CABC method provides a near-optimal solution within reasonable time by balancing converged and diversified searches. Experiments and analysis on clustering problems demonstrate that CABC is competitive with previous partitioning approaches with respect to solution quality. We validate the performance of CABC on the Iris, Wine, Glass, Vowel, and Cloud data sets from the UCI Machine Learning Repository, comparing against previous studies. In our simulations, the proposed KABCK (K-means+ABC+K-means) performs better than ABCK (ABC+K-means), KABC (K-means+ABC), ABC, and K-means.
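
The sensitivity of K-means to its initial centroids, which motivates the abstract above, can be sketched with a plain Lloyd's-algorithm implementation. This is only an illustration of the local-optimum problem under assumed toy data (`points`, `init_good`, `init_bad` are hypothetical); it is not the CABC method and includes no bee colony search.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: sqdist(p, centroids[k]))
            for p in points]


def kmeans(points, centroids, max_iters=100):
    """Plain Lloyd's K-means; returns (centroids, labels, sse)."""
    centroids = list(centroids)
    for _ in range(max_iters):
        labels = assign(points, centroids)
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                # Keep the old centroid if a cluster lost all its points.
                updated.append(centroids[k])
        if updated == centroids:
            break  # converged: the centroids no longer move
        centroids = updated
    labels = assign(points, centroids)
    sse = sum(sqdist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Assumed toy data: corners of a 4-by-1 rectangle, clustered with K = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
init_good = [(0, 0.5), (4, 0.5)]  # splits the data into left and right
init_bad = [(2, 0), (2, 1)]       # splits the data into bottom and top

_, _, sse_good = kmeans(points, init_good)
_, _, sse_bad = kmeans(points, init_bad)
print(sse_good, sse_bad)
```

Both runs converge, but the second start settles on a much larger sum of squared errors. This is the kind of initialization-dependent local optimum that a population-based search is meant to escape.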

An efficient heuristics for determining the optimal number of cluster using clustering balance (클러스터링 균형을 사용하여 최적의 클러스터 개수를 결정하기 위한 효율적인 휴리스틱)

  • Lee, Sangwook
    • Proceedings of the Korea Contents Association Conference / 2009.05a / pp.792-796 / 2009
  • Determining the optimal number of clusters is an important issue in data clustering research. It involves choosing a cluster validity method and finding the number of clusters that optimizes the cluster validity. In this paper, an efficient heuristic for determining the optimal number of clusters using clustering balance is proposed. Experimental results using k-means on artificial and real-life data sets show that the proposed algorithm is highly time-efficient.

Optimal Identification of IG-based Fuzzy Model by Means of Genetic Algorithms (유전자 알고리즘에 의한 IG기반 퍼지 모델의 최적 동정)

  • Park, Keon-Jun;Lee, Dong-Yoon;Oh, Sung-Kwun
    • Proceedings of the KIEE Conference / 2005.05a / pp.9-11 / 2005
  • We propose an optimal identification of an information granulation (IG)-based fuzzy model to carry out the model identification of complex and nonlinear systems. For optimal identification, we use genetic algorithms (GAs) and Hard C-Means (HCM) clustering. An initial structure of the fuzzy model is identified by determining the number of inputs, the selected input variables, the number of membership functions, and the conclusion inference type by means of GAs. Granulation of the information data with the aid of the HCM clustering algorithm helps determine the initial parameters of the fuzzy model, such as the initial apexes of the membership functions and the initial values of the polynomial functions used in the premise and consequence parts of the fuzzy rules. The initial parameters are then tuned effectively with the aid of the GAs and the least squares method. A numerical example is included to evaluate the performance of the proposed model.

Group Search Optimization Data Clustering Using Silhouette (실루엣을 적용한 그룹탐색 최적화 데이터클러스터링)

  • Kim, Sung-Soo;Baek, Jun-Young;Kang, Bum-Soo
    • Journal of the Korean Operations Research and Management Science Society / v.42 no.3 / pp.25-34 / 2017
  • K-means is a popular and efficient data clustering method that uses only intra-cluster distance to establish a validity index with a previously fixed number of clusters. K-means is useless without a suitable number of clusters for unsupervised data. This paper aims to propose Group Search Optimization (GSO) using Silhouette to find the optimal data clustering solution, together with the number of clusters, for unsupervised data. Silhouette can be used as a validity index to decide the number of clusters and the optimal solution by simultaneously considering intra- and inter-cluster distances. The performance of GSO using Silhouette is validated through experiments and analysis on several data sets.
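
To make the Silhouette index mentioned above concrete, the following minimal sketch computes the mean silhouette width of a labelling, using the standard definition s = (b - a) / max(a, b). Here a is the mean distance from a point to the other points in its own cluster, and b is the smallest mean distance from that point to any other cluster. The data and labellings are hypothetical, Euclidean distance is an assumption, and the GSO search itself is not shown.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width of a labelling; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Assumed toy data: corners of a 4-by-1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
left_right = [0, 0, 1, 1]  # compact, well-separated clusters
bottom_top = [0, 1, 0, 1]  # stretched, interleaved clusters

good = silhouette(points, left_right)
bad = silhouette(points, bottom_top)
print(good, bad)
```

Because the index combines intra- and inter-cluster distances, the compact labelling scores higher than the stretched one. A search over candidate solutions and cluster counts could use such a score as its objective.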