• Title/Summary/Keyword: cluster number determination

Systematic Determination of Number of Clusters Based on Input Representation Coverage (클러스터 분석을 위한 IRC기반 클러스터 개수 자동 결정 방법)

  • 신미영
    • Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.6 / pp.39-46 / 2004
  • One of the significant issues in cluster analysis is identifying the proper number of clusters hidden in the given data. In this paper we propose a novel approach to systematically determine the number of clusters based on Input Representation Coverage (IRC), which is newly defined as a quantified value of how well the original input data in Gaussian feature space can be captured with a certain number of clusters. Its usability and applicability are also investigated via experiments with synthetic data. Our experimental results show that the proposed approach is useful for approximately finding the real number of clusters implicitly contained in the data.

Automatic Categorization of Clusters in Unsupervised Classification

  • Jeon, Dong-Keun
    • The Journal of the Acoustical Society of Korea / v.15 no.1E / pp.29-33 / 1996
  • When unsupervised classification is used for remote sensing image classification, the resulting clusters must be assigned to categories. It is desirable to perform this assignment automatically, because manual categorization is highly time-consuming. In this paper, four automatic determination methods are proposed and evaluated: (a) the maximum number method, which assigns the target cluster to the category that occupies the largest area of that cluster; (b) the maximum percentage method, which assigns the target cluster to the category that shows the maximum percentage within the category in that cluster; (c) the minimum distance method, which assigns the target cluster to the category having the minimum distance to that cluster; and (d) the element ratio matching method, which assigns local regions to the category having the element ratio most similar to that of the region. The experiments confirmed that the result of the minimum distance method was almost the same as the result produced by a human operator.
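
As a rough illustration of the maximum number method described in this abstract, the sketch below assigns each cluster to the category that covers the most pixels of that cluster. The function name, the flat per-pixel input lists, and the example labels are all hypothetical; the abstract does not specify the paper's data layout or how ties are handled.

```python
from collections import Counter


def assign_by_maximum_number(cluster_ids, category_ids):
    """Map each cluster to the category covering most of its pixels.

    cluster_ids[i] is the cluster of pixel i (from unsupervised
    classification); category_ids[i] is the reference category of the
    same pixel. Both inputs are assumptions made for this sketch.
    """
    per_cluster = {}
    for cluster, category in zip(cluster_ids, category_ids):
        per_cluster.setdefault(cluster, Counter())[category] += 1
    # The category with the largest pixel count (area) wins each cluster.
    return {cluster: counts.most_common(1)[0][0]
            for cluster, counts in per_cluster.items()}


clusters = [0, 0, 0, 1, 1, 1, 1]
categories = ["water", "water", "forest", "urban", "urban", "urban", "forest"]
mapping = assign_by_maximum_number(clusters, categories)
```

Here cluster 0 is dominated by "water" pixels and cluster 1 by "urban" pixels, so those are the categories the sketch assigns.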

A Hybrid Genetic Algorithm for K-Means Clustering

  • Jun, Sung-Hae;Han, Jin-Woo;Park, Minjae;Oh, Kyung-Whan
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.330-333 / 2003
  • The initial cluster size in partitioning-based clustering methods strongly affects the clustering result. In the K-means algorithm, the result of the cluster analysis depends on the cluster size K. Usually, the initial cluster size is determined from prior, subjective information, which may not be optimal, so a more objective method is needed. In our research, we propose a hybrid genetic algorithm, a tree induction based evolutionary algorithm, for determining the optimal cluster size. The initial population of this algorithm is determined by the number of terminal nodes of the tree induction. From this decision-tree-based initial population, the optimal cluster size is generated. Our fitness function is defined as the inverse of a dissimilarity measure, and a bagging approach is used to save computational time.

A study on the determination of the number of mobility cluster (적정 이동군집수 결정에 관한 연구)

  • ;Ham, Sung Hun
    • Journal of the Korean Geographical Society / v.30 no.2 / pp.120-131 / 1995
  • To analyze mobility patterns, this study used the three constraint models (capability constraint, coupling constraint, and authority constraint) proposed in Dr. Hagerstrand's time-space theory. This paper shows that the three constraint models affect mobility by age. In this study, the capability constraint refers to what cannot be done while satisfying basic needs such as sleep and meals. The coupling constraint is a physical one: each person's range of action is limited by having to stay in a particular place at a particular time. For instance, students have to stay in school, so their mobility is constrained. The authority constraint is a social one: when we use urban facilities or transportation, our sphere of mobility may be controlled by social agreement or social position. Social agreements include the opening hours of stores and the timetables of mass transportation; controls by social position include personal income and level of education. In this study, the degree to which social constraints influence mobility was considered in the process of determining the number of clusters. Considering the mobility constraints that arise from spatial characteristics, which divide urban from rural areas, people in urban areas have a higher mobility rate than people in rural areas. The results of the cluster determination show similar mobility patterns. In urban areas, the determined number of clusters reflects a variety of mobility related to urban spatial structure. That is to say, mobility patterns can change with spatial characteristics. Constraints by sex and age are also social constraints, and they affect mobility patterns. For instance, women in their twenties have mobility patterns similar to those of men of the same age, but their patterns change suddenly after the age of thirty. Men show a similar pattern regardless of age. That is to say, control by sex as a social constraint affects mobility.
To establish more realistic traffic policy, the formation of mobility should be reflected in space from the viewpoint of social-behavioral science. To realize this, the following problems should be investigated. 1. As a matter of methodology, if sufficient samples are secured, clusters could be subdivided and a new method of analyzing mobility clusters using neural networks could be developed. 2. By extracting the activities connected with mobility and finding the life cycle classified by daily cluster characteristics, suitable alternatives could be proposed for traffic policy.

Automatic Determination of Usenet News Groups from User Profile (사용자 프로파일에 기초한 유즈넷 뉴스그룹 자동 결정 방법)

  • Kim, Jong-Wan;Cho, Kyu-Cheol;Kim, Hee-Jae;Kim, Byeong-Man
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.2 / pp.142-149 / 2004
  • It is important to retrieve information that exactly matches the user's needs from large volumes of Usenet news and to filter the desired information quickly. Unlike with email systems, users must register the news groups they are interested in beforehand in order to receive news. However, it is not easy for a novice to decide which news groups are relevant to his or her interests. In this work, we present a service that classifies user-preferred news groups among various news groups using a Kohonen network. We first extract candidate terms from example documents and then, through fuzzy inference, choose from them a number of representative keywords to be used in the Kohonen network. From observation of the training patterns, we found a sparsity problem: many keywords in the training patterns are empty. Thus, this paper proposes a new method that trains the neural network after reducing unnecessary dimensions by means of the statistical coefficient of determination. Experimental results show that the proposed method is superior to the method using every dimension in terms of cluster overlap, defined using within-cluster distance and between-cluster distance.

Deduplication and Exploitability Determination of UAF Vulnerability Samples by Fast Clustering

  • Peng, Jianshan;Zhang, Mi;Wang, Qingxian
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.10 / pp.4933-4956 / 2016
  • Use-After-Free (UAF) is a common and lethal form of software vulnerability. Tools such as web browser fuzzers can generate a large number of samples containing UAF vulnerabilities. To evaluate the threat level of the vulnerabilities or to patch them, automatic deduplication and exploitability determination should be carried out for these samples. Current methods have several problems, including inadequate pertinence, lack of depth and precision of analysis, high time cost, and low accuracy. In this paper, in terms of the key dangling pointer and the crash context, we analyze four properties of similar samples of UAF vulnerabilities, explore a method of extracting and calculating clustering eigenvalues from these samples, and perform clustering by fast search and find of density peaks on a large number of vulnerability samples. Samples were divided into different UAF vulnerability categories according to the clustering results, and the exploitability of these UAF vulnerabilities was determined by observing the shape of each class cluster. Experimental results showed that the approach was applicable to the deduplication and exploitability determination of a large number of UAF vulnerability samples, with high accuracy and low performance cost.
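
Clustering by fast search and find of density peaks is usually described in terms of two quantities per sample: a local density rho (how many other samples lie within a cutoff distance) and a separation delta (the distance to the nearest sample of higher density). The sketch below computes both for generic points. The feature vectors, the cutoff `d_c`, and the function name are assumptions for illustration only; the abstract does not state how the paper's clustering eigenvalues are built or which cutoff it uses.

```python
import math


def density_peak_scores(points, d_c):
    """Return (rho, delta) lists for density-peaks clustering.

    rho[i]   : number of other points closer than the cutoff d_c.
    delta[i] : distance to the nearest point with strictly higher rho;
               a point with no denser neighbour gets its largest
               distance to any other point instead.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        to_denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(to_denser) if to_denser else max(dist[i]))
    return rho, delta


# Hypothetical 2-D feature vectors: three nearby samples and one outlier.
samples = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (10.0, 10.0)]
rho, delta = density_peak_scores(samples, d_c=1.5)
```

Samples with both a high rho and a high delta are candidate cluster centres, while a sample with low rho and high delta (the last one here) looks like an outlier. Each remaining sample is then typically assigned to the cluster of its nearest denser neighbour.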

A Study on the Scalability of Multi-core-PC Cluster for Seismic Design of Reinforced-Concrete Structures based on Genetic Algorithm (유전알고리즘 기반 콘크리트 구조물의 최적화 설계를 위한 멀티코어 퍼스널 컴퓨터 클러스터의 확장 가능성 연구)

  • Park, Keunhyoung;Choi, Se Woon;Kim, Yousok;Park, Hyo Seon
    • Journal of the Computational Structural Engineering Institute of Korea / v.26 no.4 / pp.275-281 / 2013
  • In this paper, the scalability of a cluster composed of common personal computers was evaluated for the optimization of reinforced-concrete structures using a genetic algorithm. The goal of this research is to examine the potential of a multi-core-PC cluster for optimizing the seismic design of reinforced-concrete structures. As the number of processor cores in the cluster increased, the computation time per generation of the genetic algorithm decreased. After classifying the components of a single personal computer, the expected bottleneck was estimated, and the wall-clock time was compared with the Amdahl's law equation. The scalability of the cluster was observed to follow a complex trend. To separate physical bottlenecks from algorithmic ones, different population sizes were selected for the genetic algorithm cases. With 64 processor cores, the efficiency of the cluster was as low as 31.2% compared with the efficiency given by Amdahl's law.
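
For reference, Amdahl's law gives the ideal speedup on n cores as S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel, and the parallel efficiency as S(n) / n. The sketch below evaluates both and also computes efficiency from wall-clock times. The function names and the example values of p and the timings are hypothetical and are not taken from the paper.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Ideal speedup from Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def amdahl_efficiency(parallel_fraction, n_cores):
    """Ideal efficiency: speedup divided by the number of cores."""
    return amdahl_speedup(parallel_fraction, n_cores) / n_cores


def measured_efficiency(t_one_core, t_n_cores, n_cores):
    """Efficiency from wall-clock times measured on 1 and n cores."""
    return (t_one_core / t_n_cores) / n_cores


# Assumed parallel fraction of 0.99 on 64 cores (illustrative only).
ideal = amdahl_efficiency(0.99, 64)
# Assumed timings: 100 s on one core, 25 s on eight cores.
observed = measured_efficiency(100.0, 25.0, 8)
```

Comparing a measured efficiency with the ideal value for the same core count indicates how much of the shortfall comes from bottlenecks beyond the serial fraction that Amdahl's law already accounts for.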

AUTOMATED ELECTROFACIES DETERMINATION USING MULTIVARIATE STATISTICAL ANALYSIS

  • Kim Jungwhan;Lim Jong-Se
    • 한국석유지질학회:학술대회논문집 / spring / pp.10-14 / 1998
  • A systematic methodology is developed for electrofacies determination from wireline log data using multivariate statistical analysis. To account for the contribution of each log and to reduce the computational dimension, multivariate logs are transformed into a single variable through principal components analysis. The resulting principal component logs are segmented using the statistical zonation method to enhance the efficiency and quality of the interpreted results. Hierarchical cluster analysis is then used to group the segments into electrofacies. The optimal number of groups is determined on the basis of the ratio of within-group variance to total variance and core data. This technique is applied to wells in the Korea Continental Shelf. The results of the field application demonstrate that the prediction of lithology based on the electrofacies classification matches the core and cutting data well, with high reliability. This methodology for electrofacies classification can be used to define reservoir characteristics that are helpful for reservoir management.
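
The ratio of within-group variance to total variance mentioned in this abstract can be sketched for a single variable as below. A ratio near 0 means the groups explain most of the variation, so the ratio can be compared across candidate numbers of groups. The function name and the sample values are hypothetical, and the abstract does not say how the paper combines this ratio with core data.

```python
def within_to_total_ratio(values, labels):
    """Within-group sum of squares divided by total sum of squares."""
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)

    # Collect the values belonging to each group label.
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)

    within_ss = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)
    return within_ss / total_ss


# Assumed principal-component values for four log segments.
pc_values = [0.0, 2.0, 10.0, 12.0]
ratio_two_groups = within_to_total_ratio(pc_values, [0, 0, 1, 1])
ratio_one_group = within_to_total_ratio(pc_values, [0, 0, 0, 0])
```

With a single group the ratio is 1 by construction. Splitting these values into two well-separated groups drops it to about 0.04.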

SEJONG OPEN CLUSTER SURVEY (SOS). 0. TARGET SELECTION AND DATA ANALYSIS

  • Sung, Hwankyung;Lim, Beomdu;Bessell, Michael S.;Kim, Jinyoung S.;Hur, Hyeonoh;Chun, Moo-Young;Park, Byeong-Gon
    • Journal of The Korean Astronomical Society / v.46 no.3 / pp.103-123 / 2013
  • Star clusters are superb astrophysical laboratories containing cospatial and coeval samples of stars with similar chemical composition. We initiate the Sejong Open cluster Survey (SOS) - a project dedicated to providing homogeneous photometry of a large number of open clusters in the SAAO Johnson-Cousins' UBVI system. To achieve our main goal, we pay much attention to the observation of standard stars in order to reproduce the SAAO standard system. Many of our targets are relatively small sparse clusters that escaped previous observations. As clusters are considered building blocks of the Galactic disk, their physical properties such as the initial mass function, the pattern of mass segregation, etc. give valuable information on the formation and evolution of the Galactic disk. The spatial distribution of young open clusters will be used to revise the local spiral arm structure of the Galaxy. In addition, the homogeneous data can also be used to test stellar evolutionary theory, especially concerning rare massive stars. In this paper we present the target selection criteria, the observational strategy for accurate photometry, and the adopted calibrations for data analysis such as color-color relations, zero-age main sequence relations, Sp - $M_V$ relations, Sp - $T_{eff}$ relations, Sp - color relations, and $T_{eff}$ - BC relations. Finally we provide some data analysis such as the determination of the reddening law, the membership selection criteria, and distance determination.

Reproducibility Assessment of K-Means Clustering and Applications (K-평균 군집화의 재현성 평가 및 응용)

  • 허명회;이용구
    • The Korean Journal of Applied Statistics / v.17 no.1 / pp.135-144 / 2004
  • We propose a reproducibility (validity) assessment procedure for K-means cluster analysis that randomly partitions the data set into three parts: two subsets are used for developing clustering rules and one subset for testing the consistency of the clustering rules. Also, as an alternative to the Rand index and the corrected Rand index, we propose an entropy-based consistency measure between two clustering rules, and apply it to the determination of the number of clusters in K-means clustering.
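
For context, the Rand index that this abstract names as the baseline measures agreement between two clusterings as the fraction of sample pairs that both clusterings treat the same way (together in both, or apart in both). The sketch below implements that standard definition only. It is not the entropy-based measure proposed in the paper, whose formula the abstract does not give, and the label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which two clusterings agree (needs >= 2 items)."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agree += 1
        total += 1
    return agree / total


# Labels that two clustering rules might assign to the same test subset.
rule_1 = [0, 0, 1, 1]
rule_2 = [1, 1, 0, 0]   # same partition, different label names
rule_3 = [0, 1, 0, 1]   # a different partition
consistent = rand_index(rule_1, rule_2)
inconsistent = rand_index(rule_1, rule_3)
```

Because the index depends only on which samples are grouped together, relabelled but identical partitions score 1.0, while the mismatched partition here agrees on only 2 of the 6 pairs.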