• Title/Summary/Keyword: silhouette statistics


Empirical Comparisons of Clustering Algorithms using Silhouette Information

  • Jun, Sung-Hae;Lee, Seung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.10 no.1 / pp.31-36 / 2010
  • Many clustering algorithms have been used in diverse fields. When a given data set must be grouped into clusters, many clustering algorithms based on similarity or distance measures can be considered. Most clustering work has relied on hierarchical and non-hierarchical algorithms. In practice, researchers have chosen among these algorithms case by case, and they have had to determine a suitable clustering method subjectively from their prior knowledge. In this paper, to address this subjectivity, we make empirical comparisons of popular hierarchical and non-hierarchical clustering algorithms using the silhouette measure. We use silhouette information to evaluate clustering results such as the number of clusters and cluster variance. We verify our comparison study with experimental results on data sets from the UCI Machine Learning Repository. Therefore, we are able to use efficient and objective clustering algorithms.
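
As a rough illustration of the silhouette measure this abstract relies on, the sketch below computes the standard silhouette value s(i) = (b - a) / max(a, b) for each point, where a is the mean distance to the other points in the same cluster and b is the smallest mean distance to any other cluster. It is a minimal sketch using Euclidean distance and toy data, not the authors' implementation; the function names and the sample points are assumptions made for illustration.

```python
import math


def silhouette_values(points, labels):
    """Return the silhouette value s(i) of every point (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette values need at least two clusters")

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            values.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(points, labels):
    """Average silhouette value, usable for comparing candidate clusterings."""
    vals = silhouette_values(points, labels)
    return sum(vals) / len(vals)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible groups
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the groups

good_score = mean_silhouette(points, good_labels)
bad_score = mean_silhouette(points, bad_labels)
print(good_score > bad_score)
```

A higher mean silhouette indicates a clustering whose points sit closer to their own cluster than to the nearest other cluster, which is why the score can be compared across algorithms or across candidate numbers of clusters.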

Comparison of clustering with yeast microarray gene expression data (효모 마이크로어레이 유전자발현 데이터에 대한 군집화 비교)

  • Lee, Kyung-A;Kim, Jae-Hee
    • Journal of the Korean Data and Information Science Society / v.22 no.4 / pp.741-753 / 2011
  • We perform clustering analyses of yeast cell cycle microarray expression data. We compare model-based clustering, K-means, PAM, SOM and the hierarchical Ward method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index and silhouette values are computed and compared.
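
The Dunn index named above can be sketched as follows, assuming its classic form: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The function name and toy points are hypothetical, and Euclidean distance is an assumption; the abstract does not state which variant or distance was used.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Classic Dunn index: min between-cluster distance / max cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two points that share a cluster.
    max_diameter = max(
        (math.dist(p, q)
         for members in clusters.values()
         for p, q in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points in different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(list(clusters.values()), 2)
        for p in a
        for q in b
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point (or duplicates)
    return min_separation / max_diameter


# Hypothetical toy data: two tight pairs, five units apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
compact = dunn_index(points, [0, 0, 1, 1])   # separation 5, diameter 1
mixed = dunn_index(points, [0, 1, 0, 1])     # separation 1, diameter 5
print(compact, mixed)
```

Larger values indicate compact, well-separated clusters, so the index is maximized when comparing clusterings.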

A Study on Classification and Localization of Structural Damage through Wavelet Analysis

  • Koh, Bong-Hwan;Jung, Uk
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2007.11a / pp.754-759 / 2007
  • This study exploits the data-discriminating capability of silhouette statistics, combined with a wavelet-based vertical energy threshold technique, to extract damage-sensitive features and cluster signals of the same class. The threshold technique first yields a suitable subset of the extracted or modified features of the data: a good predictor set should contain features that are strongly correlated with the characteristics of the data, regardless of the classification method used, while the features themselves should be as uncorrelated with each other as possible. Silhouette statistics have been used to assess the quality of clustering by measuring how well an object is assigned to its corresponding cluster. We use this concept for the discriminant power function in this paper. Simulation results for damage detection in a truss structure show that the proposed approach can be successfully applied to locating both open- and breathing-type damage, even in the presence of a considerable amount of process and measurement noise.


Gene Screening and Clustering of Yeast Microarray Gene Expression Data (효모 마이크로어레이 유전자 발현 데이터에 대한 유전자 선별 및 군집분석)

  • Lee, Kyung-A;Kim, Tae-Houn;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1077-1094 / 2011
  • We perform clustering analyses of yeast cell cycle microarray expression data. To reflect the characteristics of time-course data, we screen the genes using test statistics based on Fourier coefficients, applying an FDR procedure. We compare the results obtained by model-based clustering, K-means, PAM, SOM, the hierarchical Ward method and the fuzzy method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index and silhouette values are computed and compared. A biological interpretation with GO analysis is also included.
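
The abstract does not say which FDR procedure was applied. As one common possibility, the sketch below shows the Benjamini-Hochberg step-up procedure, which rejects the hypotheses with the k smallest p-values, where k is the largest rank satisfying p_(k) <= (k / m) * q. The p-values are made up for illustration and stand in for per-gene test results; the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values (not taken from the paper).
gene_p_values = [0.02, 0.9, 0.035, 0.03]
selected = benjamini_hochberg(gene_p_values, q=0.05)
print(selected)
```

In this toy input the two smallest p-values miss their own thresholds, yet all three small p-values are rejected because the third one passes; that behaviour is what distinguishes the step-up rule from testing each rank in isolation.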

Comparison of the Cluster Validation Methods for High-dimensional (Gene Expression) Data (고차원 (유전자 발현) 자료에 대한 군집 타당성분석 기법의 성능 비교)

  • Jeong, Yun-Kyoung;Baek, Jang-Sun
    • The Korean Journal of Applied Statistics / v.20 no.1 / pp.167-181 / 2007
  • Many clustering algorithms and cluster validation techniques for high-dimensional gene expression data have been suggested. These cluster validation techniques have, however, seldom been evaluated. In this paper we compare various cluster validity indices on low-dimensional simulation data and real gene expression data. Among the internal measures, Dunn's index is the most effective and robust, the silhouette index comes next, and the Davies-Bouldin index ranks lowest. Among the external measures, the Jaccard index is much more effective than the Goodman-Kruskal index and the adjusted Rand index.
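
External measures such as the Jaccard index compare a clustering against known reference labels. A minimal sketch of the pair-counting form follows: over all pairs of objects, count pairs placed together in both partitions (a), together only in the reference (b), and together only in the clustering (c), then take a / (a + b + c). The function names and labels are hypothetical, and returning 1.0 when no pair is ever grouped together is a convention assumed here.

```python
from itertools import combinations


def pair_counts(true_labels, pred_labels):
    """Count object pairs by whether each partition puts them in the same group."""
    both = only_true = only_pred = neither = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            both += 1
        elif same_true:
            only_true += 1
        elif same_pred:
            only_pred += 1
        else:
            neither += 1
    return both, only_true, only_pred, neither


def jaccard_index(true_labels, pred_labels):
    """Pair-counting Jaccard index between a reference and a clustering."""
    both, only_true, only_pred, _ = pair_counts(true_labels, pred_labels)
    denom = both + only_true + only_pred
    # Assumed convention: two all-singleton partitions agree perfectly.
    return both / denom if denom else 1.0


reference = [0, 0, 1, 1]
relabelled = jaccard_index(reference, [1, 1, 0, 0])   # same partition, other names
partial = jaccard_index(reference, [0, 0, 0, 1])      # one object misplaced
print(relabelled, partial)
```

Because only co-membership of pairs is counted, the index does not depend on how the cluster labels are named.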

The Design Characteristics of Form of Jean Fashion in Fashion Collections (패션 컬렉션에 나타난 진패션의 형태적 디자인 특성)

  • Pu, Chen;Kim, Ae-Kyung;Lee, Kyoung-Hee
    • The Journal of the Korea Contents Association / v.12 no.12 / pp.577-586 / 2012
  • This research focused on the design characteristics of jean jackets and jean pants in fashion collections. To offer a basic proposal for the development of jean jackets and pants, pictures from fashion web pages from 2007 to 2011 were used, and the data were analysed in terms of frequencies and percentages using PASW Statistics 18. The results of the research were as follows. Men's jackets were mainly medium in length, with a tetragonal silhouette and simple detail. In contrast, women's jackets were mainly of an X silhouette, short in length, and with varied details. Men's jean pants were mainly represented by a straight, comfortable silhouette, while women's jean pants were characterized by a variety of silhouettes, fits, and lengths.

Comparison of time series clustering methods and application to power consumption pattern clustering

  • Kim, Jaehwi;Kim, Jaehee
    • Communications for Statistical Applications and Methods / v.27 no.6 / pp.589-602 / 2020
  • The development of smart grids has made it easy to collect large amounts of power data. Power consumption exhibits common patterns, which makes clustering useful when analyzing big power data. In this paper, clustering analysis based on distance functions for time series and on clustering algorithms is used to discover patterns in power consumption data. We use 10 distance measures to find clusters that reflect the characteristics of time series data. A simulation study is done to compare the distance measures for clustering. Cluster validity measures such as error rate, similarity index, Dunn index and silhouette values are also calculated and compared. Real power consumption data are then clustered using the five distance measures that performed better than the others in the simulation.
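
The abstract does not list its 10 distance measures. To illustrate what a distance function designed for time series looks like, the sketch below implements dynamic time warping (DTW), a widely used example; whether DTW is among the measures in the paper is an assumption. The function name and the sample series are hypothetical.

```python
import math


def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences.

    Uses the absolute difference as the local cost and allows the usual
    three moves (match, stretch x, stretch y), with no window constraint.
    """
    n, m = len(x), len(y)
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x[i-1] aligned with an already-used y[j-1]
                cost[i][j - 1],      # y[j-1] aligned with an already-used x[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Hypothetical load curves: the second is the first delayed by one step.
peak = [0, 1, 2, 1, 0]
delayed_peak = [0, 0, 1, 2, 1, 0]
flat = [1, 1]

same_shape = dtw_distance(peak, delayed_peak)
offset = dtw_distance([0, 0], flat)
print(same_shape, offset)
```

Unlike a pointwise Euclidean distance, DTW accepts series of different lengths and treats a time-shifted copy of a pattern as close to the original, which is one reason such measures are considered for consumption curves.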

A Study on Moving Fitness and Slit Length in Relation to Length & Silhouette of Tight Skirt (타이트 스커트 실루엣 및 길이에 따른 동작적합성과 트임길이에 관한 연구)

  • Kim, Hee Young;Choi, Hae Sun
    • Journal of the Korean Society of Clothing and Textiles / v.17 no.4 / pp.539-549 / 1993
  • The purpose of this study was to determine the moving fitness and slit length of tight skirts in relation to their length and silhouette. Five lengths (micro mini, mini, natural line, midi and maxi) and two silhouettes (slim and straight), for a total of ten tight skirts, were investigated. Ten college students were chosen for this experiment. Moving fitness was tested by measuring the step length, step width and step angle when walking on flat ground and when going up a stairway and bus stairs. Slit length was tested by measuring the back slit length needed when going up a stairway and bus stairs. Data were analyzed with the SAS package. The statistics were based on the average, standard deviation, two-way ANOVA, Pearson's correlation and multiple regression analysis. The main results were as follows. 1. There was a significant difference in moving fitness according to the length and silhouette of the tight skirt. The moving fitness of the slim type was lower than that of the straight type, and the longer the skirt, the lower the moving fitness. The significance appeared particularly when going up the bus stairs. 2. There was a significant difference in the skirt length above the slit according to the length and silhouette of the tight skirt. The skirt length above the slit of the slim type was shorter than that of the straight type. From micro mini to natural line, the length above the slit increased with skirt length; for the midi skirt it was shorter or only a little longer than for the natural-line skirt, and there was little change from the midi skirt to the maxi skirt.


Body Features and Body Satisfaction of Middle-aged Women for Clothing Design (의복설계를 위한 중년여성의 체형별 특징 및 신체만족도)

  • Kim, Kyung-Hee
    • Journal of the Korea Fashion and Costume Design Association / v.10 no.2 / pp.57-68 / 2008
  • In this study, we prepared reference data needed for clothing design for middle-aged women by collecting their body features, classifying their body shapes, and analyzing their satisfaction with their own body shape. As for the study method, we analyzed the body features of middle-aged women by measuring their body shape with a body meter and auxiliary tools, and we rated body satisfaction on a five-point scale from 'never satisfied' to 'very much satisfied.' We used the SPSS 12.0 statistics program, and the results are as follows. The body shapes of middle-aged women can be classified into four types. A middle-aged woman with an 'A silhouette' has a normal height but heavy lower limbs. A 'Y silhouette' is short with a heavy upper body. The 'O silhouette' is short with heavy lower limbs and upper body, and the 'H silhouette' is tall and thin. Body shape I displayed satisfaction with her own body shape, and body shape II showed the most dissatisfaction compared to the other body shapes. Body shape III showed satisfaction on all items except face size and breast size, whereas body shape IV was dissatisfied with her face size, neck length, and the shape of her breasts, waist, and buttocks. The results of this study are expected to contribute to clothing production that satisfies consumers in the clothing business, and to serve as basic data for clothing design that fits the body shape of middle-aged women by capturing how their body shape changes.


A Comparison of Cluster Analyses and Clustering of Sensory Data on Hanwoo Bulls (군집분석 비교 및 한우 관능평가데이터 군집화)

  • Kim, Jae-Hee;Ko, Yoon-Sil
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.745-758 / 2009
  • Cluster analysis is the automated search for groups of related observations in a data set. Many techniques have been proposed for grouping observations into clusters, and a variety of measures aimed at validating the results of a cluster analysis have been suggested. In this paper, we compare complete linkage, Ward's method, K-means and model-based clustering, and compute validity measures such as connectivity, the Dunn index and silhouette values with simulated data from multivariate distributions. We also select a clustering algorithm and determine the number of clusters of Korean consumers based on their palatability scores for Hanwoo bull beef prepared by the BBQ cooking method.
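
The connectivity measure mentioned here can be sketched as follows, assuming its usual nearest-neighbour definition: for each observation, look at its L nearest neighbours in order, and add a penalty of 1/j whenever the j-th nearest neighbour lies in a different cluster. The function name, the neighbourhood size and the toy data are assumptions for illustration, and ties in distance are broken by index order.

```python
import math


def connectivity(points, labels, n_neighbors=2):
    """Connectivity score of a clustering; 0 is best, larger is worse."""
    total = 0.0
    for i, p in enumerate(points):
        # Other observations, ordered from nearest to farthest (Euclidean).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for rank, j in enumerate(others[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank  # nearer neighbours are penalised more
    return total


# Hypothetical one-dimensional data: two tight pairs far apart.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
respects_neighbours = connectivity(points, [0, 0, 1, 1], n_neighbors=1)
splits_neighbours = connectivity(points, [0, 1, 0, 1], n_neighbors=1)
print(respects_neighbours, splits_neighbours)
```

In contrast to the Dunn index and silhouette values, which are maximized, connectivity is minimized: a score of zero means that no observation was separated from its nearest neighbours.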