• Title/Summary/Keyword: K-means clustering (K-평균군집법)

Gene Screening and Clustering of Yeast Microarray Gene Expression Data (효모 마이크로어레이 유전자 발현 데이터에 대한 유전자 선별 및 군집분석)

  • Lee, Kyung-A;Kim, Tae-Houn;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1077-1094 / 2011
  • We perform clustering analyses of yeast cell cycle microarray expression data. To reflect the characteristics of time-course data, we screen the genes using test statistics based on Fourier coefficients and apply an FDR procedure. We compare the results of model-based clustering, K-means, PAM, SOM, the hierarchical Ward method, and the fuzzy method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared. A biological interpretation with GO analysis is also included.
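
The abstract above names "a FDR procedure" without saying which one. As a rough illustration only, the sketch below assumes the Benjamini-Hochberg step-up procedure, applied to per-gene p-values that are taken as given (the Fourier-coefficient test statistic itself is not reproduced here). The function name `benjamini_hochberg` and the `alpha` parameter are hypothetical choices for this example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the gene passes screening at FDR level alpha.

    Assumption: Benjamini-Hochberg is used; the abstract only says "a FDR procedure".
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a gene whose p-value misses its own threshold can still be selected when a larger p-value further down the ordering meets its threshold.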

K-means clustering using a center of gravity for grid-based sample (그리드 기반 표본의 무게중심을 이용한 케이-평균군집화)

  • Lee, Sun-Myung;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.21 no.1 / pp.121-128 / 2010
  • K-means clustering is an iterative algorithm in which items are moved among sets of clusters until the desired set is reached. It has been widely used in many applications, such as market research, pattern analysis or recognition, and image processing. It can identify dense and sparse regions among data attributes or object attributes. However, the k-means algorithm can take many hours to produce the desired k clusters, because it is a primitive, exploratory procedure. In this paper we propose a new method of k-means clustering that uses the center of gravity of a grid-based sample. It is faster than any traditional clustering method while maintaining accuracy.
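
The abstract above does not spell out the grid-based procedure, so the following is only one plausible reading, not the authors' algorithm: bin the points into square grid cells, replace each non-empty cell by its center of gravity, and run a Lloyd-style k-means on those centers, weighted by cell counts. The names `grid_centroids`, `weighted_kmeans`, `cell_size`, and `init_centers` are hypothetical, and the sketch is limited to 2-D points.

```python
import math
from collections import defaultdict

def grid_centroids(points, cell_size):
    """Bin 2-D points into square cells; return each non-empty cell's
    center of gravity and its point count."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append((x, y))
    centroids, weights = [], []
    for members in cells.values():
        n = len(members)
        centroids.append((sum(p[0] for p in members) / n,
                          sum(p[1] for p in members) / n))
        weights.append(n)
    return centroids, weights

def weighted_kmeans(samples, weights, init_centers, max_iter=100):
    """Lloyd-style k-means on weighted 2-D samples, starting from init_centers."""
    centers = list(init_centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest center.
        new_labels = [
            min(range(len(centers)),
                key=lambda j: (s[0] - centers[j][0]) ** 2 + (s[1] - centers[j][1]) ** 2)
            for s in samples
        ]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
        # Update step: weighted mean per cluster (an empty cluster keeps its old center).
        for j in range(len(centers)):
            total = sum(w for w, l in zip(weights, labels) if l == j)
            if total > 0:
                centers[j] = (
                    sum(w * s[0] for w, s, l in zip(weights, samples, labels) if l == j) / total,
                    sum(w * s[1] for w, s, l in zip(weights, samples, labels) if l == j) / total,
                )
    return centers, labels
```

Weighting by cell count means each cluster center is the mean of the raw points whose cells were assigned to it, while the assignment step only loops over cells rather than over every point. That is where any speed-up in this sketch would come from.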

A Comparison of Cluster Analyses and Clustering of Sensory Data on Hanwoo Bulls (군집분석 비교 및 한우 관능평가데이터 군집화)

  • Kim, Jae-Hee;Ko, Yoon-Sil
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.745-758 / 2009
  • Cluster analysis is the automated search for groups of related observations in a data set. Many techniques have been proposed for grouping observations into clusters, and a variety of measures for validating the results of a cluster analysis have been suggested. In this paper, we compare complete linkage, Ward's method, K-means, and model-based clustering, and compute validity measures such as connectivity, the Dunn index, and silhouette on simulated data from multivariate distributions. We also select a clustering algorithm and determine the number of clusters of Korean consumers based on their palatability scores for Hanwoo bull prepared by the BBQ cooking method.
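
Two of the validity measures named above can be written down compactly. The sketch below uses the common textbook definitions with Euclidean distance, which is an assumption since the abstract does not state the distance used; connectivity is left out. The function names are hypothetical. Both functions expect at least two clusters, and `dunn_index` also expects at least one cluster with two or more points.

```python
import math

def silhouette_values(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b); 0.0 for singleton clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    values = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            values.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum, hence len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for k, other in clusters.items() if k != l
        )
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values

def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    n = len(points)
    min_between = min(
        math.dist(points[i], points[j])
        for i in range(n) for j in range(i + 1, n) if labels[i] != labels[j]
    )
    max_within = max(
        math.dist(points[i], points[j])
        for i in range(n) for j in range(i + 1, n) if labels[i] == labels[j]
    )
    return min_between / max_within
```

Larger values of both measures indicate more compact, better separated clusters, so candidate clusterings can be ranked by the mean silhouette or by the Dunn index.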

A Reinterpretation of Multiple Correspondence Analysis Using K-Means Clustering (K-평균 군집분석을 활용한 다중대응분석의 재해석)

  • 김경희;최용석
    • Proceedings of the Korean Statistical Society Conference / 2001.11a / pp.175-178 / 2001
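  • Multiple correspondence analysis, which graphically displays the correspondence among categories in a multi-way contingency table, generally yields a two-dimensional correspondence plot with low goodness-of-fit, because the principal inertias account for a small proportion of the total inertia. Benzecri's formula can be used to raise the low principal inertias and obtain a new two-dimensional correspondence plot. However, this new plot still does not show the correspondence among categories clearly (Greenacre and Blasius, 1994, chapter 10). Multiple correspondence analysis can be reinterpreted by clustering the categories with an Andrews plot, but interpretation becomes difficult when the number of categories is large. In this short note, we use K-means clustering in such cases to make multiple correspondence analysis easier to interpret.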

Comparison of clustering with yeast microarray gene expression data (효모 마이크로어레이 유전자발현 데이터에 대한 군집화 비교)

  • Lee, Kyung-A;Kim, Jae-Hee
    • Journal of the Korean Data and Information Science Society / v.22 no.4 / pp.741-753 / 2011
  • We perform clustering analyses of yeast cell cycle microarray expression data. We compare model-based clustering, K-means, PAM, SOM, and the hierarchical Ward method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared.

Selecting Technique of Accident Sections using K-mean Method (K-평균법을 이용한 고속도로 사고분석구간 분할기법 개발)

  • Lee, Ki-Young;Chang, Myung-Soon
    • International Journal of Highway Engineering / v.7 no.4 s.26 / pp.211-219 / 2005
  • Selecting analysis sections for traffic accidents serves two purposes: identifying the causes of accidents clearly by grouping similar accidents, and increasing the effect of improvement projects by setting priorities among accidents. The existing method divides the road uniformly by mileage, which takes no account of similarities among accidents. Consequently, a slider-length method that considers accident types rather than road mileage has recently come into wide use. In this study, we apply the K-mean method, a non-hierarchical grouping technique from cluster analysis, as a way of implementing the slider-length method, and propose a method that classifies accidents occurring at the nearest mileages into one group. To verify the proposed method, we compared the K-mean method with division at regular intervals on data from a total length of 25.6 km along the Kyung-bu freeway in the Pusan direction. The K-mean method proved to be an effective method that considers the similarities and adjacencies of accidents.

Multivariate Stratification under Consideration of Outliers (이상점을 고려한 다변량 층화)

  • Park, Jin-Woo;Yun, Seok-Hoon
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.377-385 / 2008
  • Most sample surveys conducted by statistics-producing agencies are multipurpose surveys that inquire into several distinct items through a single sample. In a multipurpose sample design, stratification tends to be very complex because stratification variables that are both multivariate and heterogeneous must be considered together. In this paper we point out an outlier effect in multivariate stratification based on the K-means clustering method and propose handling outliers before the stratification step. We also show the empirical stratification effect of accounting for outliers through a case study of the sample design for The Rural Living Indicators.

A Development of Customer Segmentation by Using Data Mining Technique (데이터마이닝에 의한 고객세분화 개발)

  • Jin Seo-Hoon
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.555-565 / 2005
  • Knowing its customers is very important for a company to survive cut-throat competition. Companies need to manage the relationship with each and every customer and make each customer as profitable as possible. CRM (customer relationship management) has emerged as a key solution for managing profitable relationships. Customer segmentation is an essential component of successful CRM. Clustering, as a data mining technique, is very useful for building data-driven segmentation. This paper is concerned with building a proper customer segmentation, illustrated with a credit card company case. The segmentation was built only on transaction data that came from customers' activities. A two-step clustering approach consisting of k-means clustering and agglomerative clustering was applied to build the customer segmentation.

Hierarchical Clustering Analysis of Water Main Leak Location Data (상수관로 누수위치 자료를 이용한 계층적 군집분석)

  • Park, Su-Wan;Im, Gwang-Chae;Choi, Chang-Lok;Kim, Kyu-Lee
    • Journal of Korea Water Resources Association / v.42 no.3 / pp.177-190 / 2009
  • Rehabilitation projects for old water mains typically require considerable capital investment. One economical way of pursuing such projects is to focus on a specific area within the entire region under management. In this paper, hierarchical clustering methods that analyze the spatial inter-relationship of location data are applied to about 8,000 water leak locations recorded in a case study area from 1992 to 1997. Among the hierarchical clustering methods, the Single, Complete, and Average Linkage Methods are used to identify clusters of water leak locations and to divide the area according to the defined clusters. By comparing the clusters identified by these methods, the best clustering method for the case study area is suggested. Using that method, the clustered areas are prioritized for maintenance based on their water leak incident intensity.
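
The three linkage rules named above differ only in how the distance between two clusters is defined. The following naive sketch, assuming Euclidean distance between leak coordinates, makes that difference explicit. The function name `agglomerative` and its parameters are hypothetical, and the repeated pairwise search is far too slow for 8,000 points without an optimized library routine.

```python
import math

def agglomerative(points, n_clusters, linkage="single"):
    """Naive agglomerative clustering; returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        d = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(d)           # distance between the nearest pair
        if linkage == "complete":
            return max(d)           # distance between the farthest pair
        if linkage == "average":
            return sum(d) / len(d)  # mean distance over all pairs
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters under the chosen linkage.
        _, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

Single linkage tends to chain points along elongated shapes, while complete linkage favors compact groups, which is one reason the methods can partition the same area differently.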

A Major DNA Marker Mining of microsatellite loci in Hanwoo Chromosome 17

  • Lee, Yong-Won;Lee, Je-Yeong
    • Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집) / 2005.04a / pp.54-58 / 2005
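  • We performed QTL (quantitative trait loci) analysis on the genetic map of Hanwoo chromosome 17 and tested the significance of the selected loci using a permutation test. We also identified DNA markers for superior economic traits by applying K-means clustering. In addition, we obtained confidence intervals for the DNA markers of the selected locus using the bootstrap method. Using QTL analysis, K-means, and the bootstrap, we selected DNA markers 85 and 105 of BMS941 on Hanwoo chromosome 17 as superior markers.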
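
The abstract above does not give the QTL test statistic or the marker data. As a generic illustration of the two resampling ideas it mentions, the sketch below uses a difference in group means as a stand-in statistic for the permutation test and a percentile interval for the bootstrap. Both choices, along with the function names and default parameters, are assumptions made for this example.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        a, b = pooled[:n_a], pooled[n_a:]
        # Small tolerance guards against floating-point ties with the observed value.
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed - 1e-12:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

def bootstrap_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((1 - level) / 2 * n_boot)]
    upper = means[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper
```

A small p-value from `permutation_p_value` suggests the observed group difference is unlikely under random relabeling, and `bootstrap_ci` gives a rough range for the mean trait value associated with a marker.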
