• Title/Summary/Keyword: Cluster Number


The Evaluation Measure of Text Clustering for the Variable Number of Clusters (가변적 클러스터 개수에 대한 문서군집화 평가방법)

  • Jo, Tae-Ho
    • Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.233-237 / 2006
  • This study proposes a new measure for evaluating the performance of text clustering. When the K-means algorithm or Kohonen networks are used for text clustering, the number of clusters is fixed in advance as a parameter, whereas with the single-pass algorithm the number of clusters is not predictable. Given labeled documents, the result of text clustering with the K-means algorithm or a Kohonen network can be evaluated by setting the number of clusters to the number of given target categories, mapping each cluster to a target category, and using the evaluation measures of text. With the single-pass algorithm, however, if the number of clusters differs from the number of target categories, such measures cannot be used to evaluate the result of text clustering. This study proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, which is called CI (Clustering Index) in this article.

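As a rough illustration of the two quantities named in the abstract above, the sketch below computes an average intra-cluster similarity and an average inter-cluster similarity for documents represented as term-weight vectors. The cosine measure, the averaging, and the function names are assumptions made for this example; the abstract does not say how the paper defines the similarities or combines them into CI, so no CI formula is attempted here.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def intra_cluster_similarity(clusters):
    """Mean cosine similarity over all document pairs that share a cluster.

    Returns 0.0 when no cluster has two documents (a convention chosen here).
    """
    sims = [cosine(u, v) for c in clusters for u, v in combinations(c, 2)]
    return sum(sims) / len(sims) if sims else 0.0


def inter_cluster_similarity(clusters):
    """Mean cosine similarity over all document pairs in different clusters."""
    sims = [
        cosine(u, v)
        for c1, c2 in combinations(clusters, 2)
        for u in c1
        for v in c2
    ]
    return sum(sims) / len(sims) if sims else 0.0


# Two hypothetical clusters of 2-dimensional document vectors.
example = [[(1.0, 0.0), (2.0, 0.0)], [(0.0, 1.0), (0.0, 3.0)]]
intra = intra_cluster_similarity(example)
inter = inter_cluster_similarity(example)
```

Neither quantity requires mapping clusters to target categories, so both can be computed whatever number of clusters the algorithm produces.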

An Energy Saving Method Using Cluster Group Model in Wireless Sensor Networks (무선 센서 네트워크에서 클러스터 그룹 모델을 이용한 에너지 절약 방안)

  • Kim, Jin-Su
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.12 / pp.4991-4996 / 2010
  • Clustering in wireless sensor networks is a technique that forms clusters to aggregate data and transmit them together so that energy is used efficiently. Although the cluster group model is based on clustering, it differs from previous methods in that it reduces the total energy consumption by dividing the energy load between the cluster group head and the cluster head. In this thesis, I calculate the optimal number of cluster groups and the optimal number of clusters for this kind of cluster group model according to the threshold of the energy consumption model. Using these values, I can minimize the total energy consumption of the sensor network and maximize the network lifetime. I also show that the proposed cluster group model is better than the previous clustering method in terms of network energy efficiency.

Evaluation of Combustion Mechanism of Droplet Cluster in Premixed Spray Flame by Simultaneous Time-Series Measurement (동시 시계열 계측에 의한 예혼합 분무화염 내 유적군 연소기구의 평가)

  • Hwang, Seung-Min
    • Journal of Korean Society of Environmental Engineers / v.31 no.6 / pp.442-448 / 2009
  • To evaluate the combustion mechanism of each droplet cluster downstream of the premixed spray flame, simultaneous time-series measurements were conducted using an optical measurement system consisting of laser tomography, multi-color integrated Cassegrain receiving optics (MICRO), and a phase Doppler anemometer (PDA). Furthermore, the group combustion number of each droplet cluster was estimated experimentally, and the combustion mechanism of the droplet cluster was examined by applying theoretical analysis. The group combustion number, $G_c$, was estimated experimentally for every droplet cluster verified by planar images, and each cluster was classified into the internal group combustion mode or the external group combustion mode according to the theoretical analysis. It is found that the experimentally estimated group combustion number of a droplet cluster agrees with the classification by theoretical analysis in some cases and disagrees in others. The disagreement is attributed to the fact that the group combustion number was estimated only from the geometrical arrangement of droplets in the cluster, and that the actual phenomenon is three-dimensional whereas the measurement system is two-dimensional.

Performance Factor of Distributed Processing of Machine Learning using Spark (스파크를 이용한 머신러닝의 분산 처리 성능 요인)

  • Ryu, Woo-Seok
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.1 / pp.19-24 / 2021
  • In this paper, we study the performance factors of machine learning in a distributed environment using Apache Spark and present an efficient distributed processing method through experiments. This work first presents the performance factors of machine learning in a distributed cluster, classifying them into cluster performance, data size, and configuration of the Spark engine. In addition, a performance study of regression analysis using Spark MLlib running on a Hadoop cluster is performed while changing the configuration of the nodes and the Spark executors. The experiments confirmed that the effective number of executors was affected by the number of data blocks, but that, depending on the cluster size, its maximum and minimum values were limited by the number of cores and the number of worker nodes, respectively.
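
The relationship stated in the last sentence of the abstract above can be read as a simple clamp: the executor count follows the number of data blocks but is bounded above by the number of cores and below by the number of worker nodes. The sketch below encodes that reading only. The function name `effective_executors` and its parameters are hypothetical, and the paper's actual experimental rule may be more detailed than this.

```python
def effective_executors(data_blocks: int, total_cores: int, worker_nodes: int) -> int:
    """Clamp the executor count suggested by the data blocks.

    Assumed reading of the abstract (not the paper's exact rule):
      - start from one executor per data block,
      - never exceed the total number of cores in the cluster,
      - never drop below the number of worker nodes.
    """
    if min(data_blocks, total_cores, worker_nodes) < 1:
        raise ValueError("all arguments must be positive integers")
    upper = total_cores
    lower = min(worker_nodes, upper)  # keep the bounds consistent
    return max(lower, min(data_blocks, upper))


# Hypothetical cluster: 4 worker nodes with 16 cores in total.
many_blocks = effective_executors(data_blocks=40, total_cores=16, worker_nodes=4)
few_blocks = effective_executors(data_blocks=2, total_cores=16, worker_nodes=4)
```

With many data blocks the result is capped by the core count, and with very few blocks it is held at the number of worker nodes.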

Systematic Determination of Number of Clusters Based on Input Representation Coverage (클러스터 분석을 위한 IRC기반 클러스터 개수 자동 결정 방법)

  • 신미영
    • Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.6 / pp.39-46 / 2004
  • One of the significant issues in cluster analysis is to identify the proper number of clusters hidden in the given data. In this paper, we propose a novel approach that systematically determines the number of clusters based on Input Representation Coverage (IRC), which is newly defined as a quantified value of how well the original input data in Gaussian feature space can be captured with a certain number of clusters. Furthermore, its usability and applicability are investigated via experiments with synthetic data. Our experimental results show that the proposed approach is quite useful for approximately finding the real number of clusters implicitly contained in the data.

Selection of Cluster Topic Words in Hierarchical Clustering using K-Means Algorithm

  • Lee Shin Won;Yi Sang Seon;An Dong Un;Chung Sung Jong
    • Proceedings of the IEEK Conference / 2004.08c / pp.885-889 / 2004
  • Fast and high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Hierarchical clustering improves retrieval performance and makes the results easier for users to understand. To improve clustering, we implemented a hierarchical structure with variety and readability by carefully selecting cluster topic words and deciding the number of clusters dynamically. Selecting topic words is important because the hierarchical clustering structure summarizes the search result. We chose nouns as cluster topic words. The quality of topic words increased by $33\%$ with the following approach: only nouns are extracted as topic words for the top-level cluster, and topic words that have already been used are not reused for the child clusters.


A Performance Comparison of Cluster Validity Indices based on K-means Algorithm (K-means 알고리즘 기반 클러스터링 인덱스 비교 연구)

  • Shim, Yo-Sung;Chung, Ji-Won;Choi, In-Chan
    • Asia pacific journal of information systems / v.16 no.1 / pp.127-144 / 2006
  • The K-means algorithm is widely used at the initial stage of data analysis in the data mining process, partly because of its low time complexity and the simplicity of its practical implementation. Cluster validity indices are used along with the algorithm to determine the number of clusters and to assess the clustering results of datasets. In this paper, we present a performance comparison of sixteen indices, selected from forty indices in the literature by considering their applicability to nonhierarchical clustering algorithms. The data sets used in the experiment are generated from multivariate normal distributions. In particular, four error types are considered in the comparison: standardization, outlier generation, error perturbation, and noise dimension addition. Through the experiment, the effects of varying the number of points, attributes, and clusters on performance are analyzed. The result of the simulation experiment shows that the Calinski and Harabasz index performs best across all datasets and that the Davies and Bouldin index becomes a strong competitor as the number of points in the dataset increases.
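
For readers unfamiliar with the index the abstract above reports as the best performer, here is a minimal sketch of the Calinski and Harabasz index in its standard form, the ratio of between-cluster dispersion (divided by k - 1) to within-cluster dispersion (divided by n - k), where k is the number of clusters and n the number of points. The function names and the toy data are assumptions for illustration; the sketch does not reproduce the paper's experimental setup.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def calinski_harabasz(clusters):
    """Calinski and Harabasz index for a partition given as a list of clusters.

    Higher values indicate compact, well-separated clusters.
    """
    k = len(clusters)
    all_points = [p for c in clusters for p in c]
    n = len(all_points)
    if k < 2 or n <= k or any(len(c) == 0 for c in clusters):
        raise ValueError("need at least 2 non-empty clusters and more points than clusters")
    overall = centroid(all_points)
    between = 0.0  # dispersion of cluster centroids around the overall centroid
    within = 0.0   # dispersion of points around their own cluster centroid
    for c in clusters:
        c_mean = centroid(c)
        between += len(c) * sq_dist(c_mean, overall)
        within += sum(sq_dist(p, c_mean) for p in c)
    if within == 0.0:
        return float("inf")  # every point coincides with its centroid
    return (between / (k - 1)) / (within / (n - k))


# Toy data: the same four points partitioned well and badly.
good_split = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
bad_split = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
good_score = calinski_harabasz(good_split)
bad_score = calinski_harabasz(bad_split)
```

When such an index is used to choose the number of clusters, the clustering is typically run for several candidate values of k and the value with the highest score is kept.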

Testing Weak-Lensing Maps of Galaxy Clusters with Dense Redshift Surveys

  • Hwang, Ho Seong;Geller, Margaret J.;Diaferio, Antonaldo;Rines, Kenneth J.;Zahid, H. Jabran
    • The Bulletin of The Korean Astronomical Society / v.39 no.2 / pp.54-54 / 2014
  • We use dense redshift surveys of nine galaxy clusters at $z \sim 0.2$ to compare the galaxy distribution in each system with the projected matter distribution from weak lensing. By combining 2087 new MMT/Hectospec redshifts and the data in the literature, we construct spectroscopic samples within the region of weak-lensing maps of high (70-89%) and uniform completeness. With these dense redshift surveys, we construct galaxy number density maps using several galaxy subsamples. The shape of the main cluster concentration in the weak-lensing maps is similar to the global morphology of the number density maps based on cluster members alone, mainly dominated by red members. We cross correlate the galaxy number density maps with the weak-lensing maps. The cross correlation signal when we include foreground and background galaxies at $0.5 z_{cl} < z < 2 z_{cl}$ is 10-23% larger than for cluster members alone at the cluster virial radius. The excess can be as high as 30% depending on the cluster. Cross correlating the galaxy number density and weak-lensing maps suggests that superimposed structures close to the cluster in redshift space contribute more significantly to the excess cross correlation signal than unrelated large-scale structure along the line of sight. Interestingly, the weak-lensing mass profiles are not well constrained for the clusters with the largest cross correlation signal excesses (>20% for A383, A689 and A750). The fractional excess in the cross correlation signal including foreground and background structures could be a useful proxy for assessing the reliability of weak-lensing cluster mass estimates.


A Study on Optimizing the Number of Clusters using External Cluster Relationship Criterion (외부 군집 연관 기준 정보를 이용한 군집수 최적화)

  • Lee, Hyun-Jin;Jee, Tae-Chang
    • Journal of Digital Contents Society / v.12 no.3 / pp.339-345 / 2011
  • The k-means algorithm is one of the most popular, simple, and fast clustering algorithms, but the right value of k is unknown. The value of k (the number of clusters) is a very important parameter because the clustering result differs depending on it. In this paper, we present a novel algorithm that determines the number of clusters dynamically, based on an external cluster relationship criterion, which is an evaluation metric for clustering results. Experimental results show that our algorithm is superior to other methods in terms of the accuracy of the number of clusters.

A Token Based Protocol for Mutual Exclusion in Mobile Ad Hoc Networks

  • Sharma, Bharti;Bhatia, Ravinder Singh;Singh, Awadhesh Kumar
    • Journal of Information Processing Systems / v.10 no.1 / pp.36-54 / 2014
  • Resource sharing is a major advantage of distributed computing. However, a distributed computing system may have some physical or virtual resource that can be accessed by only a single process at a time. The mutual exclusion problem is to ensure that no more than one process at a time is allowed to access a shared resource. The article proposes a token-based mutual exclusion algorithm for clustered mobile ad hoc networks (MANETs). The mechanism used to handle token passing at the inter-cluster level is different from that at the intra-cluster level. This makes our algorithm message efficient and thus suitable for MANETs. In the interest of efficiency, we implemented a centralized token passing scheme at the intra-cluster level. Centralized schemes are inherently failure prone. Thus, we have presented an intra-cluster token passing scheme that is able to tolerate a failure. In order to enhance reliability, we applied a distributed token circulation scheme at the inter-cluster level. More importantly, the message complexity of the proposed algorithm is independent of N, the total number of nodes in the system. Also, under a heavy load, it turns out to be inversely proportional to n, the (average) number of nodes per cluster. We substantiated our claim with a correctness proof, complexity analysis, and simulation results. In the end, we present a simple approach to make our protocol fault tolerant.
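
To make the idea of centralized token passing inside a cluster concrete, the sketch below simulates a single coordinator that owns one token and grants it to requesting nodes in first-come, first-served order. The class name `ClusterHead`, its methods, and the FIFO policy are assumptions for this illustration; the sketch runs in one process and omits the paper's inter-cluster token circulation, message handling, and fault tolerance.

```python
from collections import deque


class ClusterHead:
    """Hypothetical coordinator that serializes access to one shared resource."""

    def __init__(self):
        self.holder = None      # node currently holding the token, if any
        self.waiting = deque()  # nodes queued for the token, oldest first

    def request(self, node):
        """Ask for the token; return True if it is granted immediately."""
        if self.holder is None:
            self.holder = node
            return True
        self.waiting.append(node)
        return False

    def release(self, node):
        """Return the token and pass it to the next waiting node.

        Returns the new holder, or None if nobody is waiting.
        """
        if node != self.holder:
            raise ValueError(f"{node!r} does not hold the token")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


# Three hypothetical nodes in one cluster compete for the resource.
head = ClusterHead()
granted_a = head.request("a")   # token is free, so "a" gets it at once
granted_b = head.request("b")   # "b" must wait
next_holder = head.release("a")  # token moves on to "b"
```

Because only the current holder may enter the critical section and the token is handed over one node at a time, at most one node accesses the resource at any moment, which is the mutual exclusion property the abstract describes.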