• Title/Summary/Keyword: K-means cluster analysis (K-평균 군집분석)

A Study on the Implementation of Walking Environment Projects by Analyzing Characteristics of Pedestrian Accidents by Local Government Types (지방자치단체의 유형별 보행자사고 특성분석 및 보행환경조성사업 개선방안 연구)

  • Park, Jinkyung;Han, Myungjoo
    • Journal of Korean Society of Transportation / v.32 no.6 / pp.615-627 / 2014
  • This study uses nonhierarchical K-means cluster analysis to classify 230 local governments into types, and the Mann-Whitney U test and the Kruskal-Wallis test to analyze the characteristics of pedestrian accidents by region type. Based on an empirical analysis of pedestrian accidents, it suggests improvements to walking environments that reflect local characteristics. The cluster analysis yields three types: Type 1-A (urban areas where commercial districts are relatively dominant), Type 1-B (predominantly urban residential areas), and Type 2 (rural areas). According to the results, the pedestrian accident rate on community roads exceeded 60% for all types, and the incidence rate was higher in rural areas than in urban areas. In addition, pedestrian accidents at intersections and crossings occurred more frequently in Type 1-B than in Type 2, while the number of roadside casualties was highest for Type 2.
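
As a rough illustration of the nonhierarchical K-means step named in this abstract, the following is a minimal sketch of Lloyd's algorithm in plain Python. The function name `kmeans`, the random initialization, and the two example features are assumptions made for illustration; the abstract does not state which variables or software the authors used.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k randomly chosen data points
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to the nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:        # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                 # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Made-up (population density, commercial land share) pairs, not data from the study.
regions = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.3), (0.9, 0.2), (0.1, 0.1), (0.2, 0.05)]
labels, centroids = kmeans(regions, k=3)
print(labels)
```

Because the result depends on the initial centroids, a real analysis would typically repeat the run with several seeds and standardize the features first.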

Clustering analysis of Korea's meteorological data (우리나라 기상자료에 대한 군집분석)

  • Yeo, In-Kwon
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.941-949 / 2011
  • In this paper, 72 weather stations in Korea are clustered by a hierarchical agglomerative procedure based on the average linkage method. We compare the resulting clusters with the grouping of stations by mountain chains, which has been applied in studies analyzing the impact of climate change on foodborne disease outbreaks.

Regionalization of Extreme Rainfall with Spatio-Temporal Pattern (극치강수량의 시공간적 특성을 이용한 지역빈도분석)

  • Lee, Jeong-Ju;Kwon, Hyun-Han;Kim, Byung-Sik;Yoon, Seok-Yeong
    • Proceedings of the Korea Water Resources Association Conference / 2010.05a / pp.1429-1433 / 2010
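  • When designing hydraulic structures, establishing water resources management plans, or reviewing disaster impacts, probabilistic rainfall, flood discharge, and low-flow quantities are estimated for given return periods. Typically, the probability distribution of a hydrologic event is fitted to long-term hydrologic observations from the target area, and the return period is then extended to estimate the quantity corresponding to the desired design frequency. For ungauged areas or areas with short observation records, the results of regional frequency analysis are used instead. In regional frequency analysis, identifying the homogeneity of the rainfall data is the most basic step, and it must be preceded by statistical categorization. To assess the hydrologic homogeneity of sites, the L-moment method and cluster analysis by the K-means method are mainly used, and the results are visualized by spatial interpolation based on station coordinates. Precipitation is a hydrologic variable that varies in space and time, and its temporal characteristics are also a very important element in defining its behavior. Existing regional frequency analysis focuses its categorization on the spatial coordinates of rainfall stations and the magnitude of precipitation; this study adds factors that account for temporal effects and presents a categorization procedure that uses them. Specifically, the purpose of the study is to compute temporal statistics for each station using circular statistics, which allow quantitative analysis of the timing of extreme rainfall, to combine them with the extreme rainfall amounts into spatio-temporal attribute data, and to develop a clustering model based on these data. In the analysis, circular statistics were used to quantify and generalize the temporal attributes, and the mean and standard deviation of both the extreme rainfall amount and its occurrence time were used as attribute data. The combined data were clustered using the K-means algorithm, and the regionalization results were verified by the L-moment method. The clustering effect of the combined attribute data was confirmed through experiments with simulated data, and the analysis was carried out using data from 58 weather stations in Korea. In the preliminary analysis stage, average centroids were computed from 100 cluster analyses and assigned as the initial centroids of the main analysis, which stabilized the variable clustering tendency and prevented the clustering results from changing as the analysis was repeated. In addition, cluster numbers were assigned according to the magnitude of the sum of spatial distances computed for each cluster by the K-means method, so that clusters with adjacent numbers are also physically related, which prevented errors that can occur when extracting the boundaries between clusters. The regional frequency analysis results were plotted using a three-dimensional spline technique.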
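
The circular-statistics step described in this abstract can be sketched as follows. This is only an illustration under assumptions: the abstract does not give the formulas the authors used, so the sketch applies the standard circular mean and circular standard deviation to day-of-year values, and the function name `circular_stats` and the 365.25-day period are hypothetical choices.

```python
import math

def circular_stats(days, period=365.25):
    """Circular mean day, circular standard deviation (in days), and
    mean resultant length of a list of occurrence days within the year."""
    angles = [2 * math.pi * d / period for d in days]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    r = min(1.0, math.hypot(c, s))          # mean resultant length, clipped to [0, 1]
    mean_angle = math.atan2(s, c) % (2 * math.pi)
    mean_day = mean_angle * period / (2 * math.pi)
    if r > 0:
        # Circular standard deviation sqrt(-2 ln r), converted from radians to days.
        std_days = math.sqrt(max(0.0, -2 * math.log(r))) * period / (2 * math.pi)
    else:
        std_days = float("inf")             # directions cancel out: no preferred timing
    return mean_day, std_days, r

# Made-up occurrence days of annual maximum rainfall at one station (not study data).
mean_day, std_days, r = circular_stats([190, 205, 212, 198, 230])
print(round(mean_day, 1), round(std_days, 1), round(r, 3))
```

Treating dates as angles avoids the wrap-around problem of an arithmetic mean: events on day 360 and day 5 average to roughly the turn of the year rather than to midsummer. The resulting per-station statistics could then be combined with rainfall amounts as clustering features, as the abstract describes.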

An Optimal Clustering Using Statistical Learning Theory (통계적 학습이론을 이용한 최적 군집화)

  • 최준혁;전성해;오경환
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2005.11a / pp.229-233 / 2005
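  • Optimal clustering, which automatically determines the optimal number of clusters in a population while minimizing within-cluster variance and maximizing between-cluster variance, is a modeling strategy required by most intelligent systems. In most clustering procedures, however, the number of clusters is still determined by the analyst's subjective experience. For example, the K-means clustering algorithm also requires the value of K to be specified in advance, and clustering with a value of K that does not properly represent the population leads to serious errors. This paper attempts to solve this problem using statistical learning theory. An optimal clustering technique is proposed that uses support vectors based on the VC dimension. To evaluate the performance of the proposed method, objective experiments were carried out using UCI machine learning data.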

Hierarchical Clustering Analysis of Water Main Leak Location Data (상수관로 누수위치 자료를 이용한 계층적 군집분석)

  • Park, Su-Wan;Im, Gwang-Chae;Choi, Chang-Lok;Kim, Kyu-Lee
    • Journal of Korea Water Resources Association / v.42 no.3 / pp.177-190 / 2009
  • Rehabilitation projects for old water mains typically require considerable capital investment. One economical way to pursue such projects is to focus on a specific area within the entire region under management. In this paper, hierarchical clustering methods that analyze the spatial interrelationship of location data are applied to about 8,000 water leak locations recorded in a case study area from 1992 to 1997. Among the hierarchical clustering methods, the single, complete, and average linkage methods are used to identify clusters of water leak locations and to divide the area according to the identified clusters. By comparing the clusters identified by each method, the best clustering method for the case study area is suggested. Maintenance priorities for the area are then derived from the water leak incident intensity of each clustered area under the suggested method.
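
To make the difference between the three linkage methods in this abstract concrete, here is a minimal sketch of naive agglomerative clustering in plain Python. The function name `agglomerate`, the stopping rule (a fixed number of clusters), and the example coordinates are assumptions for illustration, not the authors' procedure or data.

```python
import math

def agglomerate(points, n_clusters, linkage="average"):
    """Naive agglomerative clustering; returns clusters as lists of point indices."""
    combine = {
        "single": min,                              # closest pair between clusters
        "complete": max,                            # farthest pair between clusters
        "average": lambda ds: sum(ds) / len(ds),    # mean of all cross-cluster pairs
    }[linkage]
    clusters = [[i] for i in range(len(points))]    # every point starts alone

    def cluster_distance(a, b):
        return combine([math.dist(points[i], points[j]) for i in a for j in b])

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters under the chosen linkage.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up leak coordinates in arbitrary map units (not data from the study).
leaks = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
for method in ("single", "complete", "average"):
    print(method, agglomerate(leaks, 3, linkage=method))
```

This brute-force version recomputes every pairwise distance at each merge, so it is only suitable for small examples; a data set of about 8,000 locations would call for an optimized library implementation.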

Exploratory Analysis of Gene Expression Data Using Biplot (행렬도를 이용한 유전자발현자료의 탐색적 분석)

  • Park, Mi-Ra
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.355-369 / 2005
  • Genome sequencing and microarray technology produce ever-increasing amounts of complex data that need statistical analysis. Visualization is an effective analytic technique that exploits the ability of the human brain to process large amounts of data. In this study, a biplot approach is applied to microarray data to examine the relationship between genes and samples. A supplementary data method for classifying a new sample into a known category is suggested. The methods are validated by applying them to well-known microarray data sets such as Golub et al. (1999), Alizadeh et al. (2000), and Ross et al. (2000). The results are compared with those of several clustering methods. A modified graph that combines a partitioning method and the biplot is also suggested.

Analysis of Apartment Power Consumption and Forecast of Power Consumption Based on Deep Learning (공동주택 전력 소비 데이터 분석 및 딥러닝을 사용한 전력 소비 예측)

  • Yoo, Namjo;Lee, Eunae;Chung, Beom Jin;Kim, Dong Sik
    • Journal of IKEEE / v.23 no.4 / pp.1373-1380 / 2019
  • To increase energy efficiency, advanced metering infrastructure (AMI) for the smart grid has recently been under active development. An essential part of AMI is analyzing power consumption and forecasting consumption patterns. In this paper, we analyze power consumption data and summarize the data errors. Monthly power consumption patterns are also analyzed using the k-means clustering algorithm. Forecasting the consumption pattern of each household is difficult. Therefore, we first classify the data into 100 clusters and then use a deep neural network to predict each cluster's next-day average from the clusters' daily averages. Using AMI data collected in practice, we analyzed the data errors and successfully conducted power forecasting based on a clustering technique.

Comparison of clustering methods of microarray gene expression data (마이크로어레이 유전자 발현 자료에 대한 군집 방법 비교)

  • Lim, Jin-Soo;Lim, Dong-Hoon
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.39-51 / 2012
  • Cluster analysis has proven to be a useful tool for investigating the association structure among genes and samples in a microarray data set. We applied several cluster validation measures to evaluate the performance of clustering algorithms for analyzing microarray gene expression data, including hierarchical clustering, K-means, PAM, SOM, and model-based clustering. The available validation measures fall into three general categories: internal, stability, and biological. The performance of the clustering algorithms is evaluated using simulated and SRBCT microarray data. Our results on the simulated data show that nearly every method performs well, recovering the same number of clusters as the number of classes in the original data. For the SRBCT data, the best choice for the number of clusters is less clear than for the simulated data. PAM, SOM, and the model-based method showed results similar to those on the simulated data under the Silhouette width internal measure, as did PAM and the model-based method under the biological measures, while model-based clustering had the best value of the stability measures.
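
The Silhouette width mentioned in this abstract as an internal validation measure can be sketched in plain Python as below. This follows the standard definition (for each point, `a` is the mean distance to its own cluster, `b` is the smallest mean distance to another cluster, and the score is `(b - a) / max(a, b)`); the function name `silhouette_width` and the toy data are assumptions, and the paper's actual tooling is not stated in the abstract.

```python
import math

def silhouette_width(points, labels):
    """Average silhouette width of a labeling; needs at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:                 # singleton cluster: score is defined as 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data: two tight, well-separated groups (not microarray data).
pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(silhouette_width(pts, [0, 0, 1, 1]))   # labeling that matches the groups
print(silhouette_width(pts, [0, 1, 0, 1]))   # labeling that mixes the groups
```

Values near 1 indicate compact, well-separated clusters, and negative values suggest that points sit closer to another cluster than to their own, which is why the measure can be used to compare candidate numbers of clusters.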

Identifying Hotspots on Freeways Using the Continuous Risk Profile With Hierarchical Clustering Analysis (계층적 군집분석 기반의 Continuous Risk Profile을 이용한 고속도로 사고취약구간 선정)

  • Lee, Seoyoung;Kim, Cheolsun;Kim, Dong-Kyu;Lee, Chungwon
    • Journal of Korean Society of Transportation / v.31 no.4 / pp.85-94 / 2013
  • The Continuous Risk Profile (CRP) is well known as the most accurate and efficient of the existing network screening methods. However, the classical CRP uses safety performance functions (SPFs), which require a huge investment to construct a database system. This study aims to suggest a new CRP method that uses the average crash frequencies of homogeneous groups, instead of SPFs, as rescaling factors. Hierarchical clustering analysis is performed to classify freeway segments into homogeneous groups based on AADT and number of lanes. Using data from I-880 in California, the proposed method is compared with several other network screening methods. The results show that the proposed method decreases false positive rates while producing no false negatives. The method developed in this study can be easily applied to screen freeway networks without any additional complex database systems, and can contribute to the improvement of freeway safety management systems.

Convergence differences of academic burnout, career preparation behavior etc. by resilience clusters of students majoring in Medical records (의무기록 전공 대학생의 회복탄력성 군집에 따른 학업소진, 진로준비행동 등의 융합적 차이)

  • Lee, Hyun-Ju
    • Journal of the Korea Convergence Society / v.8 no.4 / pp.67-77 / 2017
  • The purpose of this study is to identify convergence differences in academic burnout, career preparation behavior, and general characteristics among students majoring in medical records according to resilience cluster, and to draw up an appropriate improvement plan. A self-administered questionnaire survey was conducted, and a total of 168 responses were analyzed. Cluster analysis was conducted on three detailed categories of resilience, and the students were classified into two clusters. Cluster 1 scored higher than the Korean average in each of the three categories, and Cluster 2 scored lower than that average in all of them. Cluster 1 scored higher than Cluster 2 in career preparation behavior, hobbies, subjectively good health, extroverted personality, good academic records, satisfaction with school life, and study satisfaction ratio, but lower in academic burnout. Therefore, positivity enhancement education focused on Cluster 2 will improve the resilience of the whole group.