• Title/Summary/Keyword: Clustering Evaluation


The Evaluation Measure of Text Clustering for the Variable Number of Clusters (가변적 클러스터 개수에 대한 문서군집화 평가방법)

  • Jo, Tae-Ho
    • Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.233-237 / 2006
  • This study proposes a new measure for evaluating text clustering performance. With the K-means algorithm and Kohonen networks, the number of clusters is fixed in advance as a parameter, whereas with the single-pass algorithm the number of clusters cannot be predicted. Given labeled documents, a K-means or Kohonen network result can be evaluated by setting the number of clusters to the number of target categories, mapping each cluster to a target category, and applying the evaluation measures used for text categorization. With the single-pass algorithm, however, these measures cannot be applied when the number of clusters differs from the number of target categories. This study therefore proposes an evaluation measure based on intra-cluster similarity and inter-cluster similarity, called the Clustering Index (CI).

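
The abstract above does not give the CI formula, so the following is only a minimal sketch of an index built from intra-cluster and inter-cluster similarity. The function names, the use of cosine similarity, and the harmonic-mean-style combination in `clustering_index` are all assumptions made for illustration, not the author's definition. The point it shows is that such an index needs no mapping to target categories, so it works for any number of clusters.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def intra_cluster_similarity(clusters):
    """Mean pairwise similarity inside each cluster, averaged over clusters.

    A singleton cluster is treated as perfectly cohesive (1.0).
    """
    scores = []
    for cluster in clusters:
        pairs = list(combinations(cluster, 2))
        if pairs:
            scores.append(sum(cosine(a, b) for a, b in pairs) / len(pairs))
        else:
            scores.append(1.0)
    return sum(scores) / len(scores)


def inter_cluster_similarity(clusters):
    """Mean similarity over all pairs of vectors drawn from different clusters."""
    sims = [cosine(a, b)
            for c1, c2 in combinations(clusters, 2)
            for a in c1 for b in c2]
    return sum(sims) / len(sims) if sims else 0.0


def clustering_index(clusters):
    """ASSUMED combination: harmonic mean of cohesion and (1 - inter similarity)."""
    intra = intra_cluster_similarity(clusters)
    separation = 1.0 - inter_cluster_similarity(clusters)
    total = intra + separation
    return 2 * intra * separation / total if total else 0.0


# Two toy "document vectors" per cluster; the clusters are orthogonal.
well_separated = [[(1, 0), (1, 0)], [(0, 1), (0, 1)]]
print(clustering_index(well_separated))  # 1.0
```

Under these assumptions the index rises when clusters are cohesive and mutually dissimilar, and it can compare a single-pass result with three clusters against a K-means result with five on the same footing.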

An Improved Automated Spectral Clustering Algorithm

  • Xiaodan Lv
    • Journal of Information Processing Systems / v.20 no.2 / pp.185-199 / 2024
  • This paper proposes an improved automated spectral clustering (IASC) algorithm to address the limitations of the traditional spectral clustering (TSC) algorithm, particularly its inability to determine the number of clusters automatically. First, a cluster-number evaluation factor based on the optimal clustering principle is proposed: the algorithm iterates over candidate values of k and selects the value with the largest evaluation factor as the first-rank number of clusters. Second, the IASC algorithm uses a density-sensitive distance to measure the similarity between sample points, which assigns high similarity to data lying in the same high-density area. Third, to improve clustering accuracy, the IASC algorithm classifies the eigenvectors with the cosine angle classification method instead of K-means. The proposed algorithm was compared with six algorithms (K-means, fuzzy C-means, TSC, EIGENGAP, DBSCAN, and density peak) on six datasets. The results show that the IASC algorithm not only determines the number of clusters automatically but also achieves better clustering accuracy on both synthetic and UCI datasets.
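
The first step described above (iterate over k, keep the k with the largest evaluation factor) can be sketched as follows. The paper's evaluation factor is not given in the abstract, so this sketch substitutes the Dunn index (minimum separation divided by maximum diameter) and a toy 1-D clustering that cuts sorted data at its largest gaps. Both stand-ins, and all function names, are assumptions for illustration only.

```python
def split_by_largest_gaps(points, k):
    """Toy 1-D clustering (stand-in): sort, then cut at the k-1 largest gaps."""
    xs = sorted(points)
    # Indices i such that the gap xs[i+1] - xs[i] is among the k-1 largest.
    by_gap = sorted(range(len(xs) - 1),
                    key=lambda i: xs[i + 1] - xs[i],
                    reverse=True)
    cuts = sorted(by_gap[:k - 1])
    clusters, start = [], 0
    for i in cuts:
        clusters.append(xs[start:i + 1])
        start = i + 1
    clusters.append(xs[start:])
    return clusters


def dunn_index(clusters):
    """Stand-in evaluation factor for sorted, contiguous 1-D clusters.

    Requires at least two clusters, so candidate k values should start at 2.
    """
    max_diameter = max(c[-1] - c[0] for c in clusters)
    min_separation = min(b[0] - a[-1] for a, b in zip(clusters, clusters[1:]))
    return min_separation / max_diameter if max_diameter > 0 else float("inf")


def select_k(points, k_values, cluster_fn, score_fn):
    """Score every candidate k and return the best k with all scores."""
    scores = {k: score_fn(cluster_fn(points, k)) for k in k_values}
    best_k = max(scores, key=scores.get)
    return best_k, scores


points = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
best_k, scores = select_k(points, range(2, 5), split_by_largest_gaps, dunn_index)
print(best_k)  # 3
```

Passing the clustering routine and the score as arguments keeps the selection loop independent of both, so a different factor can be swapped in without touching `select_k`.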

The Effect of Bias in Data Set for Conceptual Clustering Algorithms

  • Lee, Gye Sung
    • International journal of advanced smart convergence / v.8 no.3 / pp.46-53 / 2019
  • When a clustering algorithm derives a partitioned structure from a data set, it is not unusual for the outcome to change when the data is presented in a different order. This is known as the order bias problem. Many machine learning algorithms try to achieve an optimized result from the available training and test data. Optimization is determined by an evaluation function, which itself has a tendency toward a certain goal. Such a tendency is inevitable, both for efficiency and for consistency of the result, but the function's preference for a specific goal may sometimes lead to unfavorable consequences in the final clustering. To overcome these bias problems, a first clustering pass constructs an initial partition, which is expected to indicate the possible range of the number of final clusters. Data-centric sorting is then applied to the data objects in the clusters of the partition to rearrange them in a new order, and the same clustering procedure is reapplied to the rearranged data set to build a new partition. We have developed an algorithm that reduces the bias resulting from the order in which data is fed into the algorithm. Experimental results show that the algorithm helps minimize order bias effects. We also show that the evaluation measure currently used for the clustering algorithm is biased toward fewer, larger clusters.
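
The order bias problem itself is easy to reproduce. The sketch below uses a simple single-pass (leader) clustering on 1-D integers, chosen here only as an illustrative order-sensitive algorithm. It is not the conceptual clustering algorithm or the data-centric sorting procedure from the paper, and the function name and threshold are assumptions.

```python
def single_pass(points, threshold):
    """Leader clustering: each point joins the first cluster whose leader
    (its first member) lies within `threshold`; otherwise it starts a new one."""
    leaders, clusters = [], []
    for p in points:
        for i, leader in enumerate(leaders):
            if abs(p - leader) <= threshold:
                clusters[i].append(p)
                break
        else:
            # No existing leader is close enough: p becomes a new leader.
            leaders.append(p)
            clusters.append([p])
    return clusters


# Same three points, two presentation orders, threshold of 1.
print(single_pass([1, 2, 3], 1))  # [[1, 2], [3]]
print(single_pass([2, 1, 3], 1))  # [[2, 1, 3]]
```

The same data yields two clusters in one order and a single cluster in the other, which is the kind of instability a re-sorting and re-clustering pass is meant to reduce.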

Fundamental Considerations: Impact of Sensor Characteristics, Application Environments in Wireless Sensor Networks

  • Choi, Dongmin;Chung, Ilyong
    • Journal of Korea Multimedia Society / v.17 no.4 / pp.441-457 / 2014
  • Examining recent performance evaluations of clustering schemes in wireless sensor networks, we found that most of them did not consider the various sensor characteristics and the application environment. Because these networks are application-specific, evaluation results that ignore these factors are difficult to trust. In this paper, for a fair evaluation, we measured the performance variation of several clustering schemes with respect to sensor data pattern, number of sensors per node, density of points of interest (POI; data density), and sensor coverage. The experimental results show that clustering methods are easily influenced by POI variation. Network lifetime and data accuracy are also slightly influenced by sensor coverage and the number of sensors. Therefore, a fair evaluation cannot be expected for a clustering scheme that does not consider these conditions.

Path based K-means Clustering for RFID Data Sets

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering / v.6 no.4 / pp.434-438 / 2008
  • Massive data sets are produced continuously, at rates of more than several terabytes per day. Applications that handle such data need effective clustering algorithms to achieve high overall computational performance. In this paper, we propose an approach to clustering that uses an ancestor as the cluster center: a K-means algorithm modified to use ancestors. We present a clustering architecture and a clustering algorithm that minimize I/O and deliver excellent performance. Our experimental performance evaluation shows that the algorithm improves I/O speed and query processing time.

A Method of Clustering for SCOs in the SCORM (SCORM에서 SCO의 클러스터링 기법)

  • Yun, Hong-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.12 / pp.2230-2234 / 2006
  • A SCO is a learning resource that a learner retrieves in SCORM. A storage policy is required so that a learner can search for SCOs rapidly in an e-learning environment. In this paper, we define a mathematical formulation of a clustering method for SCOs. We also present criteria for cluster evaluation and describe a procedure for evaluating each SCO. Through performance evaluation, we show that search based on the proposed clustering method performs better than the existing search.

Security Clustering Algorithm Based on Integrated Trust Value for Unmanned Aerial Vehicles Network

  • Zhou, Jingxian;Wang, Zengqi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.4 / pp.1773-1795 / 2020
  • Unmanned aerial vehicle (UAV) networks are a very active research area with many military and civil applications. Limited bandwidth, high mobility, and secure communication are the three main problems of micro UAVs. In this paper, we address these problems through secure clustering and propose a security clustering algorithm for UAV networks based on an integrated trust value. First, an improved k-means++ algorithm determines the optimal number of clusters from the network bandwidth parameter, which ensures optimal use of network bandwidth. Second, we use variables representing the link expiration time to improve node clustering, and use the integrated trust value to rapidly detect malicious nodes and establish a head list. Node clustering reduces the impact of high mobility, and the head list enhances the security of the clustering algorithm. Finally, the remaining energy ratio, relative mobility, and relative degree of the nodes are combined to select the best cluster head. Simulation results show that the proposed clustering algorithm incurs a smaller computational load and provides higher network security.
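
The final step above combines three node attributes to pick a cluster head. The abstract does not say how they are combined, so the following is a hedged sketch that assumes a weighted sum over values normalized to [0, 1], with lower mobility treated as better. The `Node` fields, the weights, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    energy_ratio: float       # remaining energy / initial energy, in [0, 1]
    relative_mobility: float  # normalized to [0, 1]; lower means more stable
    relative_degree: float    # normalized neighbor count, in [0, 1]


def head_score(node, w_energy=0.4, w_mobility=0.3, w_degree=0.3):
    """ASSUMED scoring rule: weighted sum, rewarding energy, stability, degree."""
    return (w_energy * node.energy_ratio
            + w_mobility * (1.0 - node.relative_mobility)
            + w_degree * node.relative_degree)


def select_cluster_head(nodes):
    """Return the candidate with the highest combined score."""
    return max(nodes, key=head_score)


candidates = [
    Node("a", energy_ratio=0.9, relative_mobility=0.2, relative_degree=0.5),
    Node("b", energy_ratio=0.5, relative_mobility=0.5, relative_degree=0.9),
    Node("c", energy_ratio=0.3, relative_mobility=0.8, relative_degree=0.4),
]
print(select_cluster_head(candidates).name)  # a
```

In a scheme like the one described, the candidates would presumably be restricted to nodes on the trusted head list before scoring; that filtering is omitted here.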

A study on the measurement for multidimensional entity clustering (다차원 clustering문제를 위한 척도에 관한 연구)

  • Lee, Cheol
    • Proceedings of the Korean Operations and Management Science Society Conference / 1989.10a / pp.30-39 / 1989
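  • In general, a clustering problem in which the number of clusters is undetermined is known as a semistructured problem. Evaluating solution quality is essential to structuring a clustering problem, but a measure that can be applied broadly across application areas has not yet been developed. The main reason is that, although evaluation criteria for cluster solutions have been proposed at the conceptual level, these concepts have not been made concrete enough to be applied unambiguously when a measure is implemented. The purpose of this study is to develop a measure for clustering problems in which the entity dimension is extended to multiple dimensions.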


Pairwise fusion approach to cluster analysis with applications to movie data (영화 데이터를 위한 쌍별 규합 접근방식의 군집화 기법)

  • Kim, Hui Jin;Park, Seyoung
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.265-283 / 2022
  • MovieLens data consists of recorded movie ratings and is often used to measure evaluation scores in recommender system research. In this paper, we provide additional information obtained by clustering user-specific genre preferences from movie rating data and movie genre data. Because the number of ratings per user is very small compared to the total number of movies, the missing rate in this data is very high, which limits the applicability of existing clustering methods. Motivated by the analysis of MovieLens data, we propose a convex clustering-based method that uses a pairwise fused penalty. In particular, the proposed method performs missing-value imputation and at the same time uses the movie ratings and genre weights of each movie to cluster each individual's genre preferences. We solve the proposed optimization problem with the alternating direction method of multipliers (ADMM) algorithm. Simulations and an application to MovieLens data show that the proposed clustering method is less sensitive to noise and outliers than the existing method.
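
For orientation, the generic convex clustering objective with a pairwise fused penalty is usually written as below. This is the standard textbook form, not the paper's objective, which according to the abstract also handles missing-value imputation and genre weights. The symbols are assumptions for illustration: x_i is the observation for user i, u_i its centroid, λ ≥ 0 a tuning parameter, and w_ij nonnegative pairwise weights.

```latex
\min_{u_1,\dots,u_n} \;
  \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
  \;+\; \lambda \sum_{i<j} w_{ij} \, \lVert u_i - u_j \rVert_2
```

As λ grows, the penalty fuses centroids together, and users whose centroids coincide are placed in the same cluster. The sum of a separable loss and a penalty on pairwise differences is the kind of structure ADMM is commonly applied to.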

The Study on Improvement of Cohesion of Clustering in Incremental Concept Learning (점진적 개념학습의 클러스터 응집도 개선)

  • Baek, Hey-Jung;Park, Young-Tack
    • The KIPS Transactions:PartB / v.10B no.3 / pp.297-304 / 2003
  • Nowadays, with the explosive growth of web information, web users increasingly request systems that collect and analyze relevant web pages. The systems developed to meet this demand use clustering methods to improve the quality of information. Clustering defines the interrelationships of unordered data and groups the data systematically. Systems that use clustering present the grouped information to users so that they can understand it efficiently. We previously proposed a hybrid clustering method for clustering large quantities of data efficiently: initial clusters are generated with the COBWEB algorithm and refined with the Ezioni algorithm. This paper adds two ideas to the prior hybrid clustering method to increase the accuracy and efficiency of the clusters. First, we propose a clustering method that considers the weights of data attributes. Second, we redefine the evaluation functions that generate the initial clusters to increase clustering efficiency. The clustering method proposed in this paper processes large quantities of data and reduces the dependency on the input order of the data, so the resulting clusters are useful for building high-quality user profiles. Finally, we show that the proposed clustering method outperforms the previous clustering method in precision and execution speed.