• Title/Summary/Keyword: k-means clustering method (k-평균군집방법)

Regionalization of Extreme Rainfall with Spatio-Temporal Pattern (극치강수량의 시공간적 특성을 이용한 지역빈도분석)

  • Lee, Jeong-Ju;Kwon, Hyun-Han;Kim, Byung-Sik;Yoon, Seok-Yeong
    • Proceedings of the Korea Water Resources Association Conference / 2010.05a / pp.1429-1433 / 2010
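  • When designing hydraulic structures, establishing water resources management plans, or reviewing disaster impacts, probabilistic rainfall, flood discharge, and low-flow quantities are estimated for given return periods. Usually, the probability distribution of hydrologic events is estimated from long-term hydrologic observations in the target area, and the return period is then extended to estimate the quantity corresponding to the desired design frequency. For ungauged areas or areas with short observation records, the results of regional frequency analysis are used instead. In regional frequency analysis, identifying the homogeneity of the rainfall data is the most basic step, so a statistical categorization analysis must come first. To assess hydrologic homogeneity among stations, the L-moment method and cluster analysis by the K-means method are mainly used, and the results are visualized by applying spatial interpolation to the station coordinates. Precipitation is a hydrologic variable that varies in space and time, and its temporal characteristics are also a very important element in defining its properties. Existing regional frequency analysis focuses its categorization on the spatial coordinates of rainfall stations and on rainfall amounts. This study adds to that process by determining factors that account for temporal effects and proposing a categorization process that uses them. Specifically, the aim is to compute temporal statistics for each station using circular statistics, which allow quantitative analysis of the timing of extreme rainfall, to combine them with extreme rainfall amounts to generate spatio-temporal characteristic data, and to develop a clustering model based on those data. Circular statistics were used to quantify and generalize the temporal attributes, and the mean and standard deviation of the extreme rainfall amount and of its occurrence time were used as attribute data. The combined data were clustered with the K-means algorithm, and the regionalization results were verified with the L-moment method. The clustering effect of the combined attribute data was confirmed through experiments with simulated data, and the analysis was carried out using data from 58 weather stations in Korea. In a preliminary stage, average centroids were computed from 100 cluster analyses and specified as the initial centroids of the main analysis, which stabilized the variable clustering tendency and prevented the clustering result from changing between repeated runs. In addition, cluster numbers were assigned according to the magnitude of the per-cluster sum of spatial distances computed by the K-means method, so that physically related clusters are adjacent in cluster-number order, which prevented errors that can occur when extracting boundaries between clusters. The regional frequency analysis results were plotted using a three-dimensional spline technique.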
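
The circular-statistics step described in this abstract can be illustrated with a short sketch. It is not the authors' implementation: it assumes occurrence dates are given as day-of-year values on a 365-day cycle, and the function name `circular_stats` and the sample days are hypothetical.

```python
import math

def circular_stats(days, period=365.0):
    """Circular mean day and mean resultant length of day-of-year values."""
    # Map each day onto an angle on the unit circle.
    angles = [2.0 * math.pi * d / period for d in days]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    mean_angle = math.atan2(s, c) % (2.0 * math.pi)  # map into [0, 2*pi)
    r = math.hypot(c, s)  # near 1: tightly concentrated, near 0: spread out
    return mean_angle * period / (2.0 * math.pi), r

# Hypothetical occurrence days that straddle the turn of the year.
mean_day, r = circular_stats([360, 364, 3, 7])
```

The arithmetic mean of these four days is 183.5, which falls in the middle of the year, whereas the circular mean stays close to day 1. That difference is why timing attributes are usually summarized on the circle before being combined with rainfall amounts for clustering.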

An Efficient Slant Correction for Handwritten Hangul Strings using Structural Properties (한글필기체의 구조적 특징을 이용한 효율적 기울기 보정)

  • 유대근;김경환
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.93-102 / 2003
  • This paper presents a slant correction method for handwritten Korean strings based on analysis of the stroke distribution, which effectively reflects the structural properties of Korean characters. The method addresses typical problems frequently observed when conventional approaches developed for English and other European languages are used to correct the slant of handwritten Korean strings. Strokes extracted from a line of a text image are classified into two clusters by K-means clustering. Gaussian modeling is applied to each cluster, and the slant angle is estimated from the model that represents the vertical strokes. Experimental results support the effectiveness of the proposed method. For performance comparison, 1,300 handwritten address string images were used, and the results show that the proposed method outperforms conventional approaches.
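
The clustering and estimation steps in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: stroke angles are assumed to be measured in degrees from the horizontal axis, the "vertical" cluster is assumed to be the one whose mean is closest to 90 degrees, and the Gaussian model is reduced to a mean and standard deviation. The names `two_means_1d` and `estimate_slant` and the sample angles are hypothetical.

```python
import statistics

def two_means_1d(values, max_iter=100):
    """Split 1-D values into two clusters with K-means (k=2)."""
    if min(values) == max(values):
        raise ValueError("need at least two distinct values")
    c0, c1 = min(values), max(values)  # deterministic initial centroids
    for _ in range(max_iter):
        low = [v for v in values if abs(v - c0) <= abs(v - c1)]
        high = [v for v in values if abs(v - c0) > abs(v - c1)]
        new_c0, new_c1 = statistics.fmean(low), statistics.fmean(high)
        if (new_c0, new_c1) == (c0, c1):
            break  # centroids stopped moving
        c0, c1 = new_c0, new_c1
    return low, high

def estimate_slant(stroke_angles):
    """Return (slant, spread): deviation of the near-vertical cluster from 90."""
    low, high = two_means_1d(stroke_angles)
    vertical = min((low, high), key=lambda c: abs(statistics.fmean(c) - 90.0))
    mu, sigma = statistics.fmean(vertical), statistics.pstdev(vertical)
    return 90.0 - mu, sigma

# Hypothetical stroke angles in degrees from the horizontal axis.
slant, spread = estimate_slant([4.0, 8.0, 12.0, 74.0, 76.0, 78.0, 80.0])
```

Separating the near-horizontal strokes first keeps them from biasing the estimate, which is the motivation the abstract gives for clustering before fitting.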

A Study on the Relationship between Skill and Competition Score Factors of KLPGA Players Using Canonical Correlation Biplot and Cluster Analysis (정준상관 행렬도와 군집분석을 응용한 KLPGA 선수의 기술과 경기성적요인에 대한 연관성 분석)

  • Choi, Tae-Hoon;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.429-439 / 2008
  • The canonical correlation biplot is a two-dimensional plot for graphically investigating the relationship between two sets of variables, and between observations and variables, in canonical correlation analysis. In general, a biplot is useful for giving a graphical description of the data. However, neither the general biplot nor the canonical correlation biplot gives a concise interpretation of the relationship between variables and observations when the number of observations is large. To overcome this problem, Choi and Kim (2008) recently suggested a method for interpreting the biplot by applying K-means clustering analysis. In this study, we apply their method to investigate the relationship between skill and competition score factors of KLPGA players using the canonical correlation biplot and cluster analysis.

A Major DNA Marker Mining of microsatellite loci in Hanwoo Chromosome 17

  • Lee, Yong-Won;Lee, Je-Yeong
    • Proceedings of the Korean Data and Information Science Society Conference / 2005.04a / pp.54-58 / 2005
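  • QTL (quantitative trait loci) analysis was performed on the genetic map of Hanwoo chromosome 17, and the significance of the selected loci was tested using a permutation test. In addition, DNA markers associated with superior economic traits were identified by K-means clustering. Confidence intervals for the DNA markers of the selected locus were obtained using the bootstrap method. Through QTL analysis, the K-means method, and the bootstrap method, superior DNA markers 85 and 105 of BMS941 on Hanwoo chromosome 17 were selected.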

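Selection of Superior DNA Markers for ILSTS035 on Hanwoo Chromosome 6 by Permutation Test and Bootstrap Method (순열검정과 부스트랩 방법에 의한 한우 6번 염색체의 ILSTS035에 대한 우수 DNA Marker 선별)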

  • Lee, Yong-Won;Lee, Je-Yeong;Kim, Mun-Jeong;Han, Cho-Hui
    • Proceedings of the Korean Statistical Society Conference / 2003.10a / pp.325-329 / 2003
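  • QTL (quantitative trait loci) analysis was performed on the genetic map of Hanwoo chromosome 6, and the significance of the selected locus was tested using a permutation test. In addition, DNA markers associated with superior economic traits were identified by K-means clustering. Through QTL analysis and the K-means method, superior DNA marker 235 of ILSTS035 on Hanwoo chromosome 6 was selected. The selected DNA marker 235 was then applied to exhibition cattle, and its confidence interval was obtained with the bootstrap method for verification.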
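
The two resampling techniques named in this abstract can be sketched in a few lines. This is an illustration under assumptions, not the authors' procedure: it tests a difference in group means with a Monte Carlo permutation test and builds a percentile bootstrap interval for a mean. The function names and the trait values are hypothetical, and the seeds are fixed only for repeatability.

```python
import random
import statistics

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def bootstrap_ci(sample, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the sample mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical trait values for animals with and without a given marker.
with_marker = [412.0, 405.0, 420.0, 418.0, 409.0, 415.0]
without_marker = [388.0, 392.0, 385.0, 395.0, 390.0, 387.0]
p_value = permutation_p_value(with_marker, without_marker)
ci_low, ci_high = bootstrap_ci(with_marker)
```

Both procedures avoid distributional assumptions, which suits the small samples typical of marker studies.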

A Study on Research Paper Classification Using Keyword Clustering (키워드 군집화를 이용한 연구 논문 분류에 관한 연구)

  • Lee, Yun-Soo;Pheaktra, They;Lee, JongHyuk;Gil, Joon-Min
    • KIPS Transactions on Software and Data Engineering / v.7 no.12 / pp.477-484 / 2018
  • Due to the advancement of computer and information technologies, numerous papers have been published. As new research fields continue to be created, users have considerable trouble finding and categorizing papers of interest. To alleviate this difficulty, this paper presents a method for grouping and clustering similar papers. The method extracts primary keywords from the abstract of each paper using TF-IDF. It then applies the K-means clustering algorithm to the extracted TF-IDF values to cluster papers with similar content. To demonstrate the practicality of the proposed method, we use paper data from the FGCS journal as real data. Based on these data, we derive the number of clusters using the Elbow scheme and evaluate clustering performance using the Silhouette scheme.
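
The keyword-extraction step can be sketched as below. It assumes whitespace tokenization and the plain TF-IDF variant (term frequency times log(N / document frequency)); libraries often use smoothed variants, and the paper does not state which one it uses. The function names and the toy abstracts are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: tf-idf} dict per whitespace-tokenized document."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(tokenized)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return scores

def top_keywords(score, k=2):
    """Highest-scoring terms, with ties broken alphabetically."""
    return sorted(score, key=lambda t: (-score[t], t))[:k]

# Toy abstracts, invented for illustration.
abstracts = [
    "cloud scheduling cloud resource",
    "grid scheduling workflow",
    "neural network training network",
]
keywords = [top_keywords(s, 1)[0] for s in tfidf(abstracts)]
```

The resulting per-document score vectors are the kind of input a K-means implementation would then cluster. Terms shared across documents, such as "scheduling" here, receive lower weights than terms specific to one document.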

A New Similarity Measure for Categorical Attribute-Based Clustering (범주형 속성 기반 군집화를 위한 새로운 유사 측도)

  • Kim, Min;Jeon, Joo-Hyuk;Woo, Kyung-Gu;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.37 no.2 / pp.71-81 / 2010
  • Clustering is widely used in numerous applications, such as pattern recognition, image analysis, and market analysis. The important factors that determine cluster quality are the similarity measure and the number of attributes. Similarity measures should be defined with respect to the data types. Existing similarity measures work well for numerical attribute values. However, they do not work well when the data are described by categorical attributes, that is, when there is no inherent similarity measure between values. In high-dimensional spaces, conventional clustering algorithms tend to break down because the data points are sparse. To overcome this difficulty, a subspace clustering approach has been proposed, based on the observation that different clusters may exist in different subspaces. In this paper, we propose a new similarity measure for clustering high-dimensional categorical data. The measure is based on the idea that, in a good clustering, each cluster carries information that distinguishes it from the other clusters. We also try to capture attribute dependencies. This study is meaningful because there has been no method that uses both. Experimental results on real datasets show that the clusters obtained with the proposed similarity measure are good with respect to clustering accuracy.

Analysis of Relative Settlement Behavior of Retaining Wall Backside Ground Using Clustering (군집분류를 이용한 흙막이 벽체 배면 지반의 상대적 침하거동 분석)

  • Young-Jun Kwack;Heui-Soo Han
    • The Journal of Engineering Geology / v.33 no.1 / pp.189-200 / 2023
  • As urbanization and industrialization drive development in downtown areas, damage due to ground settlement continues to occur. Building collapse in urban areas carries a high risk of large-scale loss of life and property. However, methods for analyzing measurement data have rarely been studied for cases where uneven loads are applied to the excavated ground and there is no prior knowledge of the ground. Accordingly, this study analyzes relative settlement behavior and correlation by processing time-series surface settlement data from urban construction sites. The average index of difference in settlement and the average of relative difference in settlement are defined and calculated, then plotted in a coordinate system to analyze the relative settlement behavior over time. In addition, since there was no prior knowledge of the ground, a standard for classifying clusters was needed, and the observation points were classified using k-means clustering and the Dunn index. The analysis confirmed that all clusters moved to the stable region as the settlement amount converged. The clusters were segmented. Based on the analysis results, it was possible to distinguish between independent displacement areas and same-behavior areas by analyzing the correlation between measurement points. If the relative settlement behavior between stations can be analyzed and the behavior areas classified, this can help in settlement and stability management, such as uplift of the surrounding area, prediction of ground failure areas, and prevention of activity failure.
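
The Dunn index mentioned in this abstract scores a partition by how well separated and how compact its clusters are. The sketch below uses one common definition (smallest distance between points in different clusters divided by the largest within-cluster diameter) with Euclidean distance. The abstract does not say which variant the authors used, and the point coordinates are hypothetical.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest cluster diameter.

    Each cluster is a list of coordinate tuples. At least one cluster must
    contain two or more distinct points.
    """
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        math.dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_separation / max_diameter

# Two candidate partitions of the same four hypothetical points.
compact = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
stretched = [[(0.0, 0.0), (5.0, 0.0)], [(0.0, 1.0), (5.0, 1.0)]]
score_compact = dunn_index(compact)
score_stretched = dunn_index(stretched)
```

A higher value indicates a better partition, so comparing the index across candidate values of k is one way to choose the number of clusters when no prior knowledge of the ground is available.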

Exploratory Analysis of Gene Expression Data Using Biplot (행렬도를 이용한 유전자발현자료의 탐색적 분석)

  • Park, Mi-Ra
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.355-369 / 2005
  • Genome sequencing and microarray technology produce ever-increasing amounts of complex data that need statistical analysis. Visualization is an effective analytic technique that exploits the ability of the human brain to process large amounts of data. In this study, a biplot approach is applied to microarray data to examine the relationship between genes and samples. A supplementary data method for classifying a new sample into a known category is suggested. The methods are validated by applying them to well-known microarray data sets such as Golub et al. (1999), Alizadeh et al. (2000), and Ross et al. (2000). The results are compared with those of several clustering methods. A modified graph that combines a partitioning method and the biplot is also suggested.

Clustering analysis of Korea's meteorological data (우리나라 기상자료에 대한 군집분석)

  • Yeo, In-Kwon
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.941-949 / 2011
  • In this paper, 72 weather stations in Korea are clustered by a hierarchical agglomerative procedure based on the average linkage method. We compare our clusters with the station groups divided by mountain chains, which have been applied in the impact analysis of foodborne disease outbreaks due to climate change.
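
Average-linkage agglomerative clustering, the procedure named in this abstract, can be sketched naively as follows. The sketch assumes Euclidean distance between station feature vectors and stops at a requested number of clusters. The paper's actual features and stopping rule are not given here, and the coordinates are hypothetical.

```python
import math
from itertools import combinations

def average_linkage(points, n_clusters):
    """Merge the two clusters with the smallest mean pairwise distance
    until n_clusters remain. Returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # combinations yields index pairs with i < j, so deleting j is safe.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical feature vectors for five stations.
stations = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 20.0)]
groups = average_linkage(stations, 3)
```

This version recomputes every pairwise average at each merge, which is adequate for a few dozen stations. Larger data sets would normally use a library routine that updates the distance matrix incrementally.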