• Title/Abstract/Keywords: k means cluster analysis

370 search results (processing time: 0.031 s)

A Study on K-Means Clustering

  • Bae, Wha-Soo;Roh, Se-Won
    • Communications for Statistical Applications and Methods / Vol. 12, No. 2 / pp. 497-508 / 2005
  • This paper studies K-means clustering, focusing on initialization, which affects the clustering results in K-means cluster analysis. Four initialization methods (the MA method, the KA method, the Max-Min method, and the Space Partition method) were compared. The clustering results show some differences among these methods: in particular, the MA method sometimes leads to incorrect clustering because of inappropriate initialization, depending on the type of data, whereas the Max-Min method is shown to be more effective than the other methods, especially when the data size is large.
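
The abstract does not spell out how the Max-Min method works. As a rough illustration only, the sketch below uses a common farthest-first reading of max-min initialization: each new center is the point whose distance to its nearest already-chosen center is largest. The function name `max_min_init`, the choice of the first center, and the sample data are assumptions, not taken from the paper.

```python
import math

def max_min_init(points, k):
    """Pick k initial centers by a farthest-first (max-min) rule.

    Assumption: the first center is simply points[0]; the paper may
    choose its first center differently.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [points[0]]
    # min_dist[i] = distance from points[i] to its nearest chosen center
    min_dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all chosen centers.
        idx = max(range(len(points)), key=min_dist.__getitem__)
        centers.append(points[idx])
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[idx]))
    return centers

# Hypothetical 2-D data with three visually separate groups.
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (0.0, 9.0)]
centers = max_min_init(data, 3)
print(centers)
```

Because every new center is pushed away from the existing ones, the seeds tend to land in different groups, which is one plausible reason such a rule could behave well on large data sets.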

Classification of Daily Precipitation Patterns in South Korea using Multivariate Statistical Methods

  • Mika, Janos;Kim, Baek-Jo;Park, Jong-Kil
    • 한국환경과학회지 / Vol. 15, No. 12 / pp. 1125-1139 / 2006
  • Cluster analysis of diurnal precipitation patterns is performed using daily precipitation at 59 stations in South Korea from 1973 to 1996, for the four seasons of each year. The seasons are shifted forward by 15 days relative to the conventional ones. The number of clusters is 15 in winter, 16 in spring and autumn, and 26 in summer. In each season, one class is the totally dry day, on which no precipitation is observed at any station; this class is treated separately in this study. The distribution of days among the clusters is rather uneven, with rather low area-mean precipitation occurring most frequently. These 4 (seasons) × 2 (wet and dry days) classes represent more than half (59%) of all days of the year. On the other hand, even the smallest seasonal clusters have at least 5 to 9 members in the 24-year (1973-1996) classification period. To account for spatial correlation, the cluster analysis is performed directly on the 5 to 8 major uncorrelated coefficients of the diurnal precipitation patterns obtained by factor analysis. More specifically, hierarchical clustering based on Euclidean distance and Ward's method of agglomeration is applied. The relative variance explained by the clustering averages 63%, with better capability in spring (66%) and winter (69%) but lower-than-average values in autumn (60%) and summer (59%). Applying weighted relative variances, i.e., dividing the squared deviations by the cluster averages, yields even better values, i.e., 78% on average, compared to the same index without clustering. This means that the highest variance remains in the clusters with more precipitation. Besides all statistics necessary for validating the final classification, 4 cluster centers are mapped for each season to illustrate the range of typical extremes, paired according to their area-mean precipitation or negative pattern correlation. Possible alternatives to the performed classification and the reasons for their rejection are also discussed, along with a wide spectrum of recommended applications.
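
The abstract does not define "relative variance explained by the clustering". One common reading, assumed here, is one minus the ratio of the within-cluster sum of squares to the total sum of squares. The sketch below computes that ratio for labelled points; the function name and the toy data are hypothetical, and the paper's weighted variant is not reproduced.

```python
def explained_variance(points, labels):
    """Return 1 - SSW/SST for labelled points (each a tuple of floats)."""
    dim = len(points[0])

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    def sum_sq(rows, center):
        return sum((r[j] - center[j]) ** 2 for r in rows for j in range(dim))

    total = sum_sq(points, mean(points))
    if total == 0:
        raise ValueError("all points are identical; the ratio is undefined")
    within = 0.0
    for label in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == label]
        within += sum_sq(members, mean(members))
    return 1.0 - within / total

# Hypothetical one-dimensional data split into two tight clusters.
pts = [(0.0,), (2.0,), (10.0,), (12.0,)]
ratio = explained_variance(pts, [0, 0, 1, 1])
print(round(ratio, 4))
```

A value near 1 means the cluster centers capture most of the spread in the data; a value near 0 means the clustering explains little.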

An Analysis of Replication Enhancement for a High Availability Cluster

  • Park, Sehoon;Jung, Im Y.;Eom, Heonsang;Yeom, Heon Y.
    • Journal of Information Processing Systems / Vol. 9, No. 2 / pp. 205-216 / 2013
  • In this paper, we analyze a technique for building a high-availability (HA) cluster system. We propose what we term the 'Selective Replication Manager (SRM),' which improves throughput and reduces the latency of disk devices by means of a Distributed Replicated Block Device (DRBD), which is integrated in recent Linux kernels (version 2.6.33 or higher), while still providing HA and failover capabilities. The proposed technique can be applied to any disk replication and database system with little customization and a reasonably low performance overhead. We demonstrate that the SRM approach increases the disk replication speed by 17% and reduces latency by 7% compared to the existing DRBD solution. This approach represents a good effort to increase HA with a minimum amount of risk and cost in terms of commodity hardware.

간호역량 군집 유형에 따른 성찰 수준, 팀학습 분위기 및 학습조직 구축정도 비교 (Comparison of Reflection Hierarchy, Team Learning Climate, and Learning Organization Building on Nursing Competency in Clinical Nurses)

  • 김희영;장금성
    • 간호행정학회지 / Vol. 19, No. 2 / pp. 282-291 / 2013
  • Purpose: The purpose of this study was to identify clusters of nursing competency and to investigate the influence of reflective thinking, team learning climate, and learning organization building according to nursing competency clusters. Methods: Participants were 244 clinical nurses who worked in 4 general hospitals in Gwangju Metropolitan City. Data were collected by self-report questionnaires during June and July 2011. Nursing competency, levels of reflection hierarchy, team learning climate, and learning organization building were measured. Data were analyzed using frequencies, means, t-tests, one-way ANOVA, Pearson correlation coefficients, and K-means cluster analysis with SPSS/WIN version 20.0. Results: Nursing competency correlated positively with intensive reflection, reflection, team learning climate, and learning organization building (p<.001). Cluster analysis yielded three clusters of nursing competency in a clinical ladder, grouped as high, middle, and low competency. Intensive reflection, reflection, team learning climate, and learning organization building differed significantly across the nursing competency groups. Conclusion: The results indicate that developing intensive reflection, reflection, team learning climate, and learning organization building would be useful strategies for enhancing nursing competency.
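
The study ran its K-means analysis in SPSS. Purely as an illustration of how scores can fall into low, middle, and high groups, the sketch below runs the standard Lloyd iteration on scalar values. The scores, the starting centers, and the function name `kmeans_1d` are hypothetical and do not come from the study.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Lloyd's algorithm on scalar values; returns (centers, groups)."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its previous center).
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers, groups

# Hypothetical competency scores on a 1-5 scale.
scores = [2.1, 2.3, 2.2, 3.0, 3.1, 3.2, 4.0, 4.2, 4.1]
# Assumed starting centers: the minimum, median, and maximum score.
centers, groups = kmeans_1d(scores, [2.1, 3.1, 4.2])
print(groups)
```

With sorted starting centers the three groups can be read directly as low, middle, and high; a real analysis would use multivariate scores and check sensitivity to the starting centers.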

날씨 마케팅 적용을 위한 기후 데이터의 군집 분석 (Cluster Analysis of Climate Data for Applying Weather Marketing)

  • 이양구;김원태;정영진;김광득;류근호
    • 한국공간정보시스템학회 논문지 / Vol. 7, No. 3 / pp. 33-44 / 2005
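  • Recently, weather and energy problems, such as weather changes caused by environmental pollution and rising international oil prices driven by resource depletion, have had a major impact on companies, nations, and even the daily lives and economic activities of individuals. For this reason, many studies have sought to characterize regional properties based on the management of solar radiation data needed for developing solar energy, one of the alternative energy sources, and on the changing characteristics of climate data. However, clustering by regional characteristics using data mining and a systematic search service for analysis data have not yet been provided effectively. This paper therefore models the data needed to store and manage climate data measured in Korea and uses the k-means method to cluster Korean climate data by regional characteristics, thereby providing systematic data information. It also presents a case showing how this information is applied to weather marketing. The proposed system will be useful for building a basic database that supports corporate weather-marketing research and provides the factors that affect it along with analysis information.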

COUNTING OF FLOWERS BASED ON K-MEANS CLUSTERING AND WATERSHED SEGMENTATION

  • PAN ZHAO;BYEONG-CHUN SHIN
    • Journal of the Korean Society for Industrial and Applied Mathematics / Vol. 27, No. 2 / pp. 146-159 / 2023
  • This paper proposes a hybrid algorithm combining K-means clustering and watershed algorithms for flower segmentation and counting. We use the K-means clustering algorithm to obtain the main colors in a complex background from the cluster centers and then apply a color space transformation to extract pixel values for the hue, saturation, and value of the flower color. Next, we apply a threshold segmentation technique to segment the flowers precisely and obtain a binary image of the flowers. Based on this, we apply the Euclidean distance transform to obtain a distance map and use it to find the local maxima of the connected components. Afterward, the proposed algorithm adaptively determines a minimum distance between peaks and applies it to label connected components using watershed segmentation with eight-connectivity. On a dataset of 30 images, the test results show that the proposed method is more efficient and precise for counting overlapped flowers, regardless of the degree of overlap, the number of overlaps, and relatively irregular shapes.
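
The distance-map step can be illustrated on a tiny binary image. The sketch below is a brute-force stand-in, not the authors' implementation: it computes the Euclidean distance from each foreground cell to the nearest background cell and then lists cells that are not exceeded by any of their eight neighbours. The watershed labelling and the adaptive minimum peak distance are omitted, and the mask and function names are hypothetical.

```python
import math

def distance_transform(mask):
    """Distance from each foreground cell (1) to the nearest background cell (0)."""
    background = [(r, c) for r, row in enumerate(mask)
                  for c, v in enumerate(row) if v == 0]
    if not background:
        raise ValueError("mask needs at least one background cell")
    return [[min(math.dist((r, c), b) for b in background) if v else 0.0
             for c, v in enumerate(row)]
            for r, row in enumerate(mask)]

def local_maxima(dist):
    """Foreground cells whose distance is >= that of all 8-connected neighbours."""
    rows, cols = len(dist), len(dist[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            if dist[r][c] == 0.0:
                continue  # background cells are never peaks
            neighbours = [dist[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr or dc)
                          and 0 <= r + dr < rows and 0 <= c + dc < cols]
            if all(dist[r][c] >= n for n in neighbours):
                peaks.append((r, c))
    return peaks

# Two 3x3 blobs joined by a one-cell bridge: a single connected component.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
dist = distance_transform(mask)
peaks = local_maxima(dist)
print(peaks)
```

Although the two blobs touch, the distance map has one peak per blob, which is the property that lets peak-seeded watershed separate overlapping objects. Flat plateaus would yield several peaks per object, which is presumably why a minimum distance between peaks is needed in practice.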

군집분석 비교 및 한우 관능평가데이터 군집화 (A Comparison of Cluster Analyses and Clustering of Sensory Data on Hanwoo Bulls)

  • 김재희;고윤실
    • 응용통계연구 / Vol. 22, No. 4 / pp. 745-758 / 2009
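  • Cluster analysis, widely used as a multivariate statistical technique that induces natural groupings, serves as a data-driven exploratory method, and various methods have been proposed according to different clustering principles. A variety of measures have also been developed to assess the validity of clustering results. In this study, we carry out cluster analysis in simulation experiments using complete linkage and Ward's method as hierarchical clustering methods, K-means as a non-hierarchical clustering method, and model-based clustering, which uses probability distribution information. We then compare the validity of each clustering method using connectivity, the Dunn index, and the silhouette as cluster validity measures. We also apply cluster analysis to Hanwoo sensory evaluation data to find the optimal clustering.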
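
Of the validity measures named above, the Dunn index is the simplest to write out. The sketch below uses one common definition, assumed here since the abstract gives none: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The function name and the sample clusters are hypothetical.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Separation: closest pair of points drawn from two different clusters.
    min_separation = min(math.dist(p, q)
                         for a, b in combinations(clusters, 2)
                         for p in a for q in b)
    # Diameter: farthest pair of points inside any single cluster.
    max_diameter = max((math.dist(p, q)
                        for c in clusters for p, q in combinations(c, 2)),
                       default=0.0)
    if max_diameter == 0.0:
        raise ValueError("every cluster is a single point; index is undefined")
    return min_separation / max_diameter

# Hypothetical clustering: two compact, well-separated groups.
clusters = [[(0.0, 0.0), (0.0, 1.0)], [(4.0, 0.0), (4.0, 1.0)]]
print(dunn_index(clusters))
```

Larger values indicate compact, well-separated clusters, so competing clusterings of the same data can be ranked by this ratio.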

Detection of onset of failure in prestressed strands by cluster analysis of acoustic emissions

  • Ercolino, Marianna;Farhidzadeh, Alireza;Salamone, Salvatore;Magliulo, Gennaro
    • Structural Monitoring and Maintenance / Vol. 2, No. 4 / pp. 339-355 / 2015
  • Corrosion of prestressed concrete structures is one of the main challenges that engineers face today. In response to this national need, this paper presents the results of a long-term project that aims at developing a structural health monitoring (SHM) technology for the nondestructive evaluation of prestressed structures. In this paper, the use of permanently installed low profile piezoelectric transducers (PZT) is proposed in order to record the acoustic emissions (AE) along the length of the strand. The results of an accelerated corrosion test are presented and k-means clustering is applied via principal component analysis (PCA) of AE features to provide an accurate diagnosis of the strand health. The proposed approach shows good correlation between acoustic emissions features and strand failure. Moreover, a clustering technique for the identification of false alarms is proposed.

데이터 마이닝을 이용한 한의비만변증 설문지 재평가: 실제 임상에서 수집한 설문응답 기반으로 (Re-evaluation of Obesity Syndrome Differentiation Questionnaire Based on Real-world Survey Data Using Data Mining)

  • 오지홍;왕징화;최선미;김호준
    • 한방비만학회지 / Vol. 21, No. 2 / pp. 80-94 / 2021
  • Objectives: The purpose of this study is to re-evaluate the importance of the questions in the obesity syndrome differentiation (OSD) questionnaire based on a real-world survey and to explore the possibility of simplifying OSD types. Methods: The OSD frequency was identified, and variance threshold feature selection was performed to filter the questions. Filtered questions were clustered by K-means clustering and hierarchical clustering. After principal component analysis (PCA), the distribution patterns of the subjects were identified and the differences in the syndrome distribution were compared. Results: The frequency of OSD in spleen deficiency, phlegm (PH), and blood stasis (BS) was lower than in food retention (FR), liver qi stagnation (LS), and yang deficiency. We excluded 13 questions with low variance, 7 of which were related to BS. Filtered questions were clustered into 3 groups by K-means clustering: Cluster 1 (17 questions) mainly related to the PH and BS syndromes; Cluster 2 (11 questions) related to swelling and indigestion; Cluster 3 (11 questions) related to overeating or emotional symptoms. After PCA, significantly different patterns of subjects were observed in the FR, LS, and other obesity syndromes. The questions that mainly affected the FR distribution concerned digestive symptoms, whereas emotional symptoms mainly affected the distribution of LS subjects. The other obesity syndromes were partially affected by both digestive and emotional symptoms, as well as by symptoms related to poor circulation. Conclusions: In-depth data mining analysis identified questions of relatively low importance and the potential to simplify OSD types.
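
Variance threshold feature selection drops questions whose answers barely vary across subjects. The sketch below shows the idea with the standard library only. The response matrix, the threshold of 0.2, and the function name are hypothetical; the abstract does not report the threshold actually used.

```python
from statistics import pvariance

def low_variance_questions(responses, threshold):
    """Return indices of questions whose population variance is <= threshold.

    responses: one row per subject, one column per question.
    """
    columns = list(zip(*responses))  # transpose to one tuple per question
    return [j for j, col in enumerate(columns) if pvariance(col) <= threshold]

# Hypothetical answers from four subjects to three questions (1-5 scale).
responses = [
    [1, 3, 5],
    [1, 4, 1],
    [1, 3, 5],
    [2, 4, 1],
]
dropped = low_variance_questions(responses, 0.2)
print(dropped)
```

Here the first question is answered almost identically by everyone, so it carries little information for separating subjects and is the one flagged for removal.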

평균회귀 심박변이도의 K-평균 군집화 학습을 통한 심실조기수축 부정맥 신호의 특성분석 (Characterization of Premature Ventricular Contraction by K-Means Clustering Learning Algorithm with Mean-Reverting Heart Rate Variability Analysis)

  • 김정환;김동준;이정환;김경섭
    • 전기학회논문지 / Vol. 66, No. 7 / pp. 1072-1077 / 2017
  • Mean-reverting analysis refers to a way of estimating the underlying tendency after new data have perturbed the equilibrium state. In this paper, we propose a new method for interpreting the specular portraits of Premature Ventricular Contraction (PVC) arrhythmia by applying the K-means unsupervised learning algorithm to electrocardiogram (ECG) data. To this end, we applied a mean-reverting model to analyze Heart Rate Variability (HRV) in terms of a modified Poincaré plot, treating the PVC rhythm as the component that disrupts the homeostatic state. Based on our experimental tests on the MIT-BIH ECG database, we find that the specular patterns portrayed by K-means clustering on mean-reverting HRV data are more clearly visible, and that the Euclidean metric can be used to identify the discrepancy between normal sinus rhythm and PVC beats through the relative distance among cluster centroids.