• Title/Summary/Keyword: 군집분석자료 (cluster analysis data)


Application of Multivariate Statistical Analysis Technique in Landfill Investigation (매립물 특성 조사를 위한 다변량 통계분석 기법의 응용)

  • Kwon, Byung-Doo;Kim, Cha-Soup
    • Journal of the Korean earth science society / v.18 no.6 / pp.515-521 / 1997
  • To investigate the nature of the waste materials in the Nanjido Landfill, we conducted a multivariate statistical analysis of a geophysical data set comprising magnetic, gravity, Landsat TM thermal band, and surface depression measurements. Because these data respond differently to depth, we transformed the observed total-field magnetic data and gravity data into residual reduced-to-pole (RTP) magnetic anomalies and three-dimensional density anomalies, respectively, and used only the information about the shallow upper part of the landfill in the subsequent analysis. For the statistical analysis at the depression measurement points, the magnetic, density, and Landsat values at these points were determined by interpolation. The multivariate technique classifies the data with a clustering algorithm, and dissimilarity between objects was measured by Euclidean distance, so the data were standardized before the distance calculation to eliminate scaling effects caused by the different measurement units of each data set. A hierarchical grouping technique was used to construct the dendrogram. The optimum number of statistical groups (clusters), classified on the basis of geophysical and geotechnical characteristics, appeared to be six on the resulting dendrogram. The results suggest that the dimensions and nature of multicomponent waste landfills can be identified by applying multivariate statistical analysis to integrated geophysical data sets.
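
The standardization step described in this abstract can be illustrated with a minimal sketch. The variable names and the numbers are hypothetical (they are not data from the paper), and z-scoring with the population standard deviation is an assumption about how the standardization was done.

```python
import math

def standardize(columns):
    """Z-score each variable so that measurement units do not dominate distances."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out.append([(v - mean) / std for v in col])
    return out

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical values at four measurement points (illustrative only).
magnetic_nT = [120.0, 135.0, 410.0, 395.0]  # residual RTP magnetic anomaly
density_gcc = [0.10, 0.12, 0.45, 0.43]      # density anomaly

columns = standardize([magnetic_nT, density_gcc])
points = list(zip(*columns))  # one standardized row per measurement point

# Pairwise dissimilarity matrix that a hierarchical grouping would start from.
dist = [[euclidean(p, q) for q in points] for p in points]
```

Without the standardization, the magnetic values (hundreds of nT) would swamp the density values (tenths of g/cc) in every distance.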


Classification of Terrestrial LiDAR Data Using Factor and Cluster Analysis (요인 및 군집분석을 이용한 지상 라이다 자료의 분류)

  • Choi, Seung-Pil;Cho, Ji-Hyun;Kim, Yeol;Kim, Jun-Seong
    • Journal of Korean Society for Geospatial Information Science / v.19 no.4 / pp.139-144 / 2011
  • This study proposes a method for classifying LiDAR data that simultaneously uses the color information (R, G, B) and the reflection intensity (I) obtained from terrestrial LiDAR and analyzes the association among these variables with statistical classification methods. First, the factors that maximize variance were computed from the variables R, G, B, and I, giving the factor matrix between the principal factors and each variable. Although the factor matrix summarizes the data by reducing them, it does not show clearly which variables are strongly associated with which factors; therefore, Varimax orthogonal rotation was applied to obtain the rotated factor matrix, and the factor scores were then calculated. Next, the K-means method, a non-hierarchical clustering method, was applied to the factor scores obtained from the factor analysis, and the classification accuracy of the terrestrial LiDAR data was evaluated.
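
The final clustering step can be sketched as plain Lloyd-style K-means applied to factor scores. The scores below are hypothetical two-factor values, the choice of initial centers is an assumption, and the factor analysis and Varimax rotation are not reproduced here.

```python
import math

def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm: assign each point to its nearest center, then update centers."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # converged: centers did not move
        centers = new_centers
    return labels, centers

# Hypothetical factor scores (factor 1, factor 2) for six LiDAR points.
factor_scores = [(-1.2, 0.3), (-1.0, 0.5), (-0.9, 0.1),
                 (1.1, -0.4), (0.9, -0.2), (1.3, -0.6)]

# Assumed initialization: one seed point from each visually separate group.
labels, centers = kmeans(factor_scores, [factor_scores[0], factor_scores[3]])
```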

Application of Beta Diversity to Analysis the Fish Community Structure in Stream (베타다양성 개념의 적용을 통한 청계천 어류 군집 특성 분석)

  • Kim, Dong-Hwan;Lee, Wan-Ok;Hong, Yang-Ki;Jeon, Hyoung-Joo;Kim, Kyung-Hwan;Kang, Hyejin;Song, Mi-Young
    • Korean Journal of Ecology and Environment / v.52 no.3 / pp.274-283 / 2019
  • Beta diversity is an efficient means of assessing spatial variation in community composition among sites. To describe fish community variation and LCBD (Local Contribution to Beta Diversity) among sites in a stream, six sampling sites were selected in Cheonggye Stream. Fish community data and environmental and habitat variables were collected at the sites from April 2014 to October 2015. We used the total variance of the fish community data table (site-by-species table) in three forms, presence-absence, abundance, and Hellinger-transformed, to estimate and compare beta diversity and LCBD. The Hellinger-transformed table gave higher beta diversity values than the presence-absence and abundance tables. Similar LCBD patterns were observed for the presence-absence and Hellinger-transformed tables. The low beta diversity calculated from the abundance table was due to the non-normality of the fish assemblage data. Additionally, correlation coefficients were calculated to evaluate the relationships among LCBD, community indices, and physicochemical variables. LCBD was negatively correlated with Shannon diversity. Overall, beta diversity analysis is an efficient method for addressing the spatial variation of fish communities and the ecological uniqueness of sites in a stream.
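
A minimal sketch of beta diversity as the total variance of a site-by-species table, with LCBD as each site's share of the total sum of squares. The abundance counts are hypothetical, and the code assumes every site has at least one individual so that the Hellinger transformation is defined.

```python
import math

def hellinger(table):
    """Hellinger transformation: square root of each site's relative abundances."""
    return [[math.sqrt(v / sum(row)) for v in row] for row in table]

def beta_diversity(table):
    """Return (total beta diversity, list of LCBD values) for a site-by-species table."""
    n = len(table)
    col_means = [sum(col) / n for col in zip(*table)]
    # Squared deviation of each site from the species means, summed over species.
    site_ss = [sum((v - m) ** 2 for v, m in zip(row, col_means)) for row in table]
    ss_total = sum(site_ss)
    return ss_total / (n - 1), [s / ss_total for s in site_ss]

# Hypothetical abundance counts: 3 sites (rows) by 3 species (columns).
abundance = [
    [10, 5, 0],
    [8, 6, 1],
    [0, 2, 12],
]

bd_total, lcbd = beta_diversity(hellinger(abundance))
```

In this toy table the third site has the most distinctive composition, so it receives the largest LCBD value.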

A Comparative Study on Statistical Clustering Methods and Kohonen Self-Organizing Maps for Highway Characteristic Classification of National Highway (일반국도 도로특성분류를 위한 통계적 군집분석과 Kohonen Self-Organizing Maps의 비교연구)

  • Cho, Jun Han;Kim, Seong Ho
    • KSCE Journal of Civil and Environmental Engineering Research / v.29 no.3D / pp.347-356 / 2009
  • This paper describes a cluster analysis that classifies highways by their traffic characteristics, as an alternative to existing functional classification methodologies. The research focuses on comparing the performance of clustering techniques based on the total within-group error and on deriving the optimal number of clusters. It analyzes statistical clustering methods (the hierarchical Ward's minimum-variance method and the non-hierarchical K-means method) and Kohonen self-organizing maps for highway characteristic classification. The outcomes of the clustering techniques were compared in terms of the number of samples and the traffic characteristics of the subsets derived for the optimal number of clusters. Overall, the K-means method outperformed the other methods for fewer than 12 clusters, and Kohonen self-organizing maps gave the best results for more than 20 clusters. The main contribution of this research is that the resulting highway characteristic classification is expected to serve as important basic road attribute information.
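
The comparison criterion named in this abstract, total within-group error, can be sketched as the sum of squared distances from each observation to its group centroid. The road sections, their two traffic variables, and the two candidate labelings are hypothetical, and the paper's exact error definition may differ.

```python
def within_group_error(points, labels):
    """Total within-group sum of squared distances to each group's centroid."""
    total = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        centroid = [sum(c) / len(members) for c in zip(*members)]
        total += sum((v - m) ** 2 for p in members for v, m in zip(p, centroid))
    return total

# Hypothetical road sections: (daily volume in thousands, heavy-vehicle ratio).
sections = [(12.0, 0.10), (13.0, 0.12), (40.0, 0.30), (42.0, 0.28)]

# Two candidate groupings, standing in for the output of two clustering methods.
method_a = [0, 0, 1, 1]
method_b = [0, 1, 1, 1]

error_a = within_group_error(sections, method_a)
error_b = within_group_error(sections, method_b)
```

For a fixed number of clusters, the grouping with the smaller total error is the tighter one, which is the sense in which the methods can be ranked.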

Cluster and Factor Analyses Using Water Quality Data in the Sapkyo Reservoir Watershed (삽교호유역의 수질자료를 이용한 군집분석 및 요인분석)

  • Rim, Chang-Soo;Shin, Jae-Ki
    • Proceedings of the Korea Water Resources Association Conference / 2004.05b / pp.937-941 / 2004
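  • Using monthly water quality data measured at 19 monitoring stations in the Sapkyo Reservoir watershed, the stations were classified into two to seven water quality groups, and a factor analysis of water quality was carried out accordingly. The cluster analysis showed that each stream in the watershed has its own water quality characteristics and that the stations can be divided into four groups: Sapkyo Reservoir, Sapkyo Stream, Muhan Stream, and Gokgyo Stream. According to the water quality analysis, the suspended solids concentration in Sapkyo Reservoir was higher than in the streams, which is thought to result from increased phytoplankton biomass driven by abundant nutrients flowing in from the streams. In addition, compared with the other streams, Gokgyo Stream had a biochemical oxygen demand 3.5 to 4.8 times higher and a chemical oxygen demand 1.7 to 2.5 times higher, and overall the water quality of the Sapkyo Reservoir watershed far exceeded the eutrophic level. The factor analysis indicated that water quality factors associated with farmland and residential areas were dominant in Sapkyo Stream and Muhan Stream, while Gokgyo Stream appears to be affected jointly by excessive organic matter inflow from the Cheonan urban area and by a sewage treatment plant located upstream. For Sapkyo Reservoir, the factors with high loadings in Sapkyo Stream, Muhan Stream, and Gokgyo Stream emerged as the main pollution factors.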


Categorical time series clustering: Case study of Korean pro-baseball data (범주형 시계열 자료의 군집화: 프로야구 자료의 사례 연구)

  • Pak, Ro Jin
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.621-627 / 2016
  • A professional baseball team sometimes tends to be very weak against one particular opponent. For example, team S, the strongest team in Korea, is relatively weak against team H. In this paper, we cluster the Korean baseball teams based on their records against team S to investigate whether the pattern of team H's record differs from those of the other teams. The technique employed is 'time series clustering', or more specifically 'categorical time series clustering'. Three methods are considered: (i) a distance-based method, (ii) a genetic sequencing method, and (iii) a periodogram method. Each method has its own advantages and disadvantages in handling categorical time series, so we recommend drawing conclusions by considering the results of all three methods together.
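
The abstract does not say which distance the distance-based method uses. As one simple possibility, the sketch below measures the share of games in which two teams' outcomes differ (a normalized Hamming distance); the team names and records are hypothetical.

```python
def mismatch_distance(a, b):
    """Share of positions at which two equal-length categorical series differ."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical win (W) / loss (L) / draw (D) records against one common opponent.
records = {
    "team_A": "LLWLLLWL",
    "team_B": "LLWLLLLL",
    "team_H": "WWLWWDWW",
}

names = sorted(records)
# Pairwise dissimilarities that a clustering algorithm could take as input.
dist = {(p, q): mismatch_distance(records[p], records[q])
        for p in names for q in names}
```

A team whose record pattern is unusual shows up as a row of large distances to all the other teams.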

Spatial analysis of water shortage areas in South Korea considering spatial clustering characteristics (공간군집특성을 고려한 우리나라 물부족 핫스팟 지역 분석)

  • Lee, Dong Jin;Kim, Tae-Woong
    • Journal of Korea Water Resources Association / v.57 no.2 / pp.87-97 / 2024
  • This study analyzed water shortage hotspot areas in South Korea by applying spatial clustering analysis to the water shortage estimates for 2030 in the Master Plans for National Water Management. To identify water shortage cluster areas, we used water shortage data for the maximum past drought (about a 50-year return period) and performed spatial clustering analysis using Local Moran's I and Getis-Ord Gi*. Areas belonging to spatial clusters of water shortage were selected from the cluster map, and the spatial characteristics of the water shortage areas were verified using the p-value and the Moran scatter plot. The results indicated that one cluster in the Han River basin (lower Imjin River (#1023) and its neighbors) and two clusters in the Nakdong River basin (Daejeongcheon (#2403) and its neighbors, and Gahwacheon (#2501) and its neighbors) were water shortage hotspots, whereas one cluster in the Han River basin (lower Namhan River (#1007) and its neighbors) and one cluster in the Nakdong River basin (Byeongseongcheon (#2006) and its neighbors) were HL areas, meaning that the area itself has a high water shortage while its neighbors have low water shortages. When spatial clustering was analyzed by standard watershed unit, the entire spatial clustering area satisfied 100% of the statistical criteria, leading to statistically significant results. Overall, the results indicate that spatial clustering analysis using standard watersheds can resolve the variable spatial unit problem to some extent, which relatively increases the accuracy of spatial analysis.
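
A minimal sketch of Local Moran's I and the Moran scatter plot quadrants (HH, LL, HL, LH) may help in reading the HL result. It assumes row-standardized binary contiguity weights, uses hypothetical shortage values for five watersheds arranged in a line, and omits the permutation-based p-values the study relies on.

```python
def local_moran(values, neighbors):
    """Local Moran's I with row-standardized binary weights.

    values: list of observations.
    neighbors: dict mapping each index to the list of its neighbor indices.
    Returns a list of (I_i, quadrant) pairs; no significance test is performed.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    results = []
    for i in range(n):
        # Spatial lag: average deviation of the neighbors of area i.
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        quadrant = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((z[i] * lag / m2, quadrant))
    return results

# Hypothetical water shortage values for five watersheds along a river.
shortage = [9.0, 8.0, 1.0, 7.0, 1.5]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

result = local_moran(shortage, neighbors)
```

A positive I_i marks an area that resembles its neighbors (HH or LL), while a negative I_i marks a spatial outlier such as a high-shortage area surrounded by low-shortage neighbors (HL).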

A Comparison of cluster analysis based on profile of LPGA player profile in 2009 (2009년 여자프로골프선수 프로파일을 이용한 군집방법비교)

  • Min, Dae-Kee
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.471-480 / 2010
  • Cluster analysis is a useful method for finding the number of groups and the group membership of each observation. With the rapid development of computer applications in statistics, a variety of new clustering methods have been studied, such as the EM algorithm and self-organizing maps. The goal of cluster analysis is to find a number of groupings that is meaningful to the analyst. If the data are analyzed perfectly with cluster analysis, discriminant analysis yields the same results.

Bayesian Methodology for Machine Learning: Focusing on Cluster Analysis (머신러닝을 위한 베이지안 방법론: 군집분석을 중심으로)

  • Kim, Yong-Dae;Jeong, Gu-Hwan
    • Information and Communications Magazine / v.33 no.10 / pp.60-64 / 2016
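  • This article briefly reviews Bayesian machine learning methodology. In particular, it explains how Bayesian methods are used in cluster analysis, a branch of unsupervised learning whose aim is to uncover the relationships among complex data. Parametric Bayesian methods, used when the number of clusters is known in advance, are described briefly, and nonparametric Bayesian methods, which can also infer the number of clusters, are covered in detail.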
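
One widely used nonparametric Bayesian prior for clustering is the Dirichlet process, whose induced partition is the Chinese restaurant process. Whether the article uses exactly this construction is an assumption; the sketch below only shows how such a prior leaves the number of clusters open instead of fixing it in advance. The function name and the `alpha` and `seed` parameters are illustrative.

```python
import random

def chinese_restaurant_process(n, alpha, seed=0):
    """Draw one random partition of n items and return the list of cluster sizes."""
    rng = random.Random(seed)
    sizes = []
    for i in range(n):
        # Item i joins existing cluster k with probability sizes[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                sizes[k] += 1
                break
        else:
            sizes.append(1)  # no existing cluster chosen: start a new one
    return sizes

# The number of clusters is an outcome of the draw, not an input.
sizes = chinese_restaurant_process(n=100, alpha=1.0)
```

Larger values of `alpha` tend to produce more clusters. A full clustering model would combine this prior with a likelihood for the data in each cluster.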

A Divisive Clustering for Mixed Feature-Type Symbolic Data (혼합형태 심볼릭 데이터의 군집분석방법)

  • Kim, Jaejik
    • The Korean Journal of Applied Statistics / v.28 no.6 / pp.1147-1161 / 2015
  • Nowadays we consider and analyze not only classical data expressed as points in p-dimensional Euclidean space but also new types of data such as signals, functions, images, and shapes. Symbolic data can be considered one of these new types. Symbolic data can take various formats, such as intervals, histograms, lists, tables, distributions, and models. To date, symbolic data studies have mainly focused on individual formats. This study extends them to data sets containing both histogram and multimodal-valued data, introduces a divisive clustering method for such mixed feature-type symbolic data, and applies it to the analysis of industrial accident data.