• Title/Summary/Keyword: 군집 자료 (cluster data)

A Comparative Study of Determining the Number of Clusters with a Method Proposed (군집수의 예측에 관한 방법의 제안 및 비교)

  • Chae, Seong-San;Lim, Nam-Kyoo
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.329-341 / 2005
  • A method for determining the number of clusters is proposed, based on asymptotic results for Rand's (1971) statistic $C_k$, $k = 2, 3, \ldots, N - 1$. A simulation study compares the proposed method with those of Chae and Warde (1991) and Huh and Lee (2004).
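As a rough illustration of the quantity behind $C_k$, the sketch below computes Rand's (1971) pairwise agreement between two partitions of the same objects. It assumes that $C_k$ compares two partitions that each have $k$ clusters; the abstract does not spell this out, the proposed asymptotic results are not reproduced, and the function name and labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs on which two partitions agree (Rand, 1971)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # A pair agrees when both partitions join it or both separate it.
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical labels from two clustering methods, each with k = 3 clusters.
method_1 = [0, 0, 1, 1, 2, 2]
method_2 = [0, 0, 1, 2, 2, 2]
print(rand_index(method_1, method_2))
```

A value of 1 means the two partitions are identical up to relabeling; lower values mean more pairs are treated differently.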

Tree-structured Clustering for Continuous Data (연속형 자료에 대한 나무형 군집화)

  • Huh Myung-Hoe;Yang Kyung-Sook
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.661-671 / 2005
  • This study proposes a clustering method, called tree-structured clustering, that recursively partitions continuous multivariate data based on an overall $R^2$ criterion with a practical node-splitting decision rule. The method produces easily interpretable, tree-type clustering rules and performs variable selection. In numerical examples (Fisher's iris data and a Telecom case), we note several differences between tree-structured clustering and K-means clustering.
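One way to picture a single partitioning step is sketched below: for one node, try every variable and every midpoint cut, and keep the binary split with the largest $R^2$, taken here as one minus the within-group sum of squares over the total sum of squares. This is only an assumed reading of the $R^2$ criterion. The authors' node-splitting decision rule is not described in the abstract and is not implemented, and the function names and data are hypothetical.

```python
def total_ss(rows):
    """Sum of squared deviations from the group mean, over all variables."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return sum((r[j] - means[j]) ** 2 for r in rows for j in range(p))


def best_split(rows):
    """Return (r_squared, variable index, cut point) of the best binary split."""
    sst = total_ss(rows)
    best = (0.0, None, None)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate cuts are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [r for r in rows if r[j] <= cut]
            right = [r for r in rows if r[j] > cut]
            r2 = 1 - (total_ss(left) + total_ss(right)) / sst
            if r2 > best[0]:
                best = (r2, j, cut)
    return best


# Hypothetical two-variable data; variable 0 separates two groups.
rows = [(1.0, 5.0), (1.2, 9.0), (0.8, 7.0), (5.0, 6.0), (5.2, 8.0), (4.8, 7.0)]
r2, variable, cut = best_split(rows)
print(variable, cut, round(r2, 3))
```

Because each split names one variable and one cut point, the resulting clusters can be read off as simple rules, which is the interpretability the abstract refers to.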

Cluster Analysis of Snowfall Observatory Using K-means Algorithm (K-평균 알고리즘을 이용한 적설관측소 군집분석)

  • Lee, Munseok;Chung, Gunhui
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / pp.412-412 / 2018
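  • Winter cold waves have recently become more frequent under the influence of global warming, and Korea has likewise experienced cold waves every winter. Recording winter snowfall and preparing for sudden disasters has therefore become another challenge posed by global warming. Because Korea has traditionally suffered little damage from heavy snow, it has far fewer snowfall observatories than rainfall observatories. Additional snowfall observatories are judged to be necessary, but as a first step this study analyzed the current distribution of snowfall observatories in Korea. The maximum daily fresh snow depth for January, February, and December and the observatory elevation were used as the four variables of the K-means algorithm, and cluster analysis was performed on a total of 94 snowfall observatories nationwide, classified by length of data record. The resulting clusters were the west coast region; the inland mountainous region along the Taebaek and Sobaek mountain ranges; Gyeongsang-do, the south coast, and Jeju Island; and Ulleungdo and Daegwallyeong. In addition, because the snowfall observatories on Jeju Island are installed mainly along the coast, additional observatories should be considered for the mountainous area of Hallasan, which receives relatively heavy snow.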

A comparison analysis of factors to affect pedestrian volumes by land-use type using Seoul Pedestrian Survey data (토지이용유형별 보행량 영향 요인 비교·분석 - 서울시 유동인구 조사자료를 바탕으로)

  • Jang, Jin-Young;Choi, Sung-Taek;Lee, Hyang-Sook;Kim, Su-Jae;Choo, Sang-Ho
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.14 no.2 / pp.39-53 / 2015
  • This paper analyzes factors affecting pedestrian volumes by land-use type using the 2012 Seoul Pedestrian Survey. First, survey points were classified into five groups by k-means cluster analysis based on the surrounding land-use types, such as residential, commercial, industrial, and green uses. Then, differences in average pedestrian volumes among the groups were compared by day and by time of day. In addition, multiple regression analysis was used to identify factors affecting pedestrian volumes, with physical features, land-use types, public transportation accessibility, and socio-economic indices as independent variables by spatial hierarchy. The model results show that walkway width positively influenced pedestrian volumes in all groups, whereas the effects of the other variables differed by group. The results can be used as basic data for establishing policies on pedestrian road design and improvement, and for estimating pedestrian demand by land-use type.
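The grouping step can be sketched with a plain Lloyd-style k-means, shown below in standard-library Python. Each survey point is assumed to be described by its surrounding land-use shares (residential, commercial, industrial, green). The data, the choice of two clusters instead of the paper's five, and the explicit starting centers are all made up for illustration and are not taken from the survey.

```python
def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, centers, iters=100):
    """Lloyd-style k-means from explicit starting centers (k = len(centers))."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical land-use shares: (residential, commercial, industrial, green).
points = [
    (0.7, 0.2, 0.0, 0.1), (0.8, 0.1, 0.0, 0.1), (0.6, 0.2, 0.1, 0.1),
    (0.1, 0.8, 0.0, 0.1), (0.2, 0.7, 0.1, 0.0), (0.1, 0.7, 0.1, 0.1),
]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)
```

Passing the starting centers explicitly keeps the sketch deterministic; in practice k-means is usually run from several random starts because the result depends on initialization.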

Cure Rate Model with Clustered Interval Censored Data (군집화된 구간 중도절단자료에 대한 치유율 모형의 적용)

  • Kim, Yang-Jin
    • The Korean Journal of Applied Statistics / v.27 no.1 / pp.21-30 / 2014
  • Ordinary survival analysis cannot be applied when a significant fraction of patients may be cured. A cure rate model combines a cure fraction with a survival model and can be applied to several types of cancer. In this article, a cure rate model is considered for interval-censored data with a cluster effect. A shared frailty model is introduced to characterize the cluster effect, and an EM algorithm is used to estimate the parameters. A simulation study evaluates the performance of the estimates. The proposed approach is applied to a smoking cessation study in which the event of interest is smoking relapse. Several covariates (including intensive care) are found to affect both the occurrence of relapse and the duration of smoking cessation.
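For orientation, one common way to write "cure fraction plus survival model" is the mixture form below. It is given only as an assumed illustration: the abstract does not state the exact specification, and the symbols $p$, $S_u$, $w_i$, $\lambda_0$, and $\beta$ are introduced here, not taken from the paper.

$$S_{\mathrm{pop}}(t) = (1 - p) + p\, S_u(t)$$

Here $1 - p$ is the cure fraction and $S_u(t)$ is the survival function of subjects who are not cured. A shared frailty $w_i$ for cluster $i$ is often introduced multiplicatively in the hazard of the uncured subjects, for example

$$\lambda_{ij}(t \mid w_i) = w_i\, \lambda_0(t)\, \exp(\beta^{\top} x_{ij}),$$

so that subjects $j$ in the same cluster share the same unobserved factor $w_i$. Letting covariates enter both $p$ and the hazard is what allows a covariate to affect both whether the event occurs and when it occurs.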

Kernel Pattern Recognition using K-means Clustering Method (K-평균 군집방법을 이요한 가중커널분류기)

  • 백장선;심정욱
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.447-455 / 2000
  • We propose a weighted kernel pattern recognition method that uses the K-means clustering algorithm to reduce the computation and storage required by the full kernel classifier. This technique finds a set of reference vectors and weights that are used to approximate the kernel classifier. Since the hierarchical clustering method implemented in the Weighted Parzen Window (WPW) classifier is not able to rearrange the clusters properly, we adopt the K-means algorithm to find reference vectors and weights from the more properly rearranged clusters. We find that the proposed method outperforms the WPW method in the representativeness of the reference vectors and in data reduction.

The Analysis of the Forest Community Structure of Mt. Minjuji (민주지산의 산림군집구조분석)

  • 최송현;조현서;이경재
    • Korean Journal of Environment and Ecology / v.11 no.1 / pp.111-125 / 1997
  • To investigate the climax forest structure and to build basic ecological data, forty-nine plots were set up and surveyed on Mt. Minjuji, Chungchongpukdo. Classification by TWINSPAN divided the community into seven groups: Pinus densiflora-Carpinus laxiflora-Quercus serrata (community I), Q. mongolica-Q. serrata-Platycarya strobilacea (community II), Q. mongolica (community III), Fraxinus mandshurica-Acer mono (community IV), Cornus controversa-F. mandshurica (community V), F. mandshurica-Carpinus cordata (community VI), and F. mandshurica-C. laxiflora (community VII). The analyses of species structure, similarity, diversity, and DBH indicated that all communities except I through III were mixed broadleaf climax forest. The basic data constructed here can be applied to sustainable development such as ecotourism and nature trails.

Adjustment of the Mean Field Rainfall Bias by Clustering Technique (레이더 자료의 군집화를 통한 Mean Field Rainfall Bias의 보정)

  • Kim, Young-Il;Kim, Tae-Soon;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association / v.42 no.8 / pp.659-671 / 2009
  • The fuzzy c-means clustering technique is applied to improve the accuracy of the G/R ratio used for rainfall estimation from radar reflectivity. The G/R ratio is computed as the ratio of the ground rainfall recorded at AWS (Automatic Weather System) sites to the radar-estimated rainfall derived from the reflectivity of the Kwangduck Mt. radar station, which has a 100 km effective range. The G/R ratio is calculated by two methods: the first uses a single G/R ratio for the entire effective range, and the second uses two different G/R ratios for two regions formed by cluster analysis. Absolute relative error and root mean squared error are used to evaluate the accuracy of the radar rainfall estimates from the two approaches. As a result, the radar rainfall estimated with two different G/R ratios from cluster analysis is more accurate than that estimated with a single G/R ratio for the entire range.
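The comparison of a single G/R ratio with region-wise ratios can be sketched as follows. The sketch assumes the ratio is total gauge rainfall divided by total radar-estimated rainfall, and it takes the region labels as given instead of deriving them with fuzzy c-means. All numbers and function names are invented for illustration and say nothing about the paper's data or results.

```python
from math import sqrt


def gr_ratio(gauge, radar):
    """Assumed form: total gauge rainfall over total radar-estimated rainfall."""
    return sum(gauge) / sum(radar)


def regional_ratios(gauge, radar, region):
    """One G/R ratio per region label."""
    ratios = {}
    for label in set(region):
        g = [x for x, lab in zip(gauge, region) if lab == label]
        r = [x for x, lab in zip(radar, region) if lab == label]
        ratios[label] = gr_ratio(g, r)
    return ratios


def rmse(observed, estimated):
    """Root mean squared error between gauge values and adjusted radar values."""
    return sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed))


# Hypothetical rainfall (mm) at six gauge sites and their region labels.
gauge = [10.0, 12.0, 8.0, 30.0, 28.0, 32.0]
radar = [5.0, 6.0, 4.0, 30.0, 28.0, 32.0]
region = [0, 0, 0, 1, 1, 1]

single = gr_ratio(gauge, radar)
adjusted_single = [single * r for r in radar]

ratios = regional_ratios(gauge, radar, region)
adjusted_regional = [ratios[lab] * r for r, lab in zip(radar, region)]

print(rmse(gauge, adjusted_single), rmse(gauge, adjusted_regional))
```

The made-up data were chosen so that the radar bias differs between the two regions, which is the situation in which separate ratios can be expected to help.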

An Analysis of Food Consumption Patterns of the Elderly from the Korea National Health and Nutrition Examination Survey (KNHANES Ⅴ-1) (2010년 국민건강영양조사(제5기 1차년도) 자료를 이용한 노인들의 식품섭취 패턴 분석)

  • Kim, Eun Mi;Choi, Mi-Kyung
    • Journal of the Korean Society of Food Science and Nutrition / v.42 no.5 / pp.818-827 / 2013
  • The purpose of this study was to identify the food consumption patterns of the elderly, and the factors affecting them, in order to improve their dietary health. Data from 1,172 elderly subjects (over 65 years old) from the fifth Korea National Health and Nutrition Examination Survey (KNHANES V-1) were used in our analysis. Validity and reliability analyses of food consumption frequency identified seven factors: fruits, foods for Korean-style meals, instant foods, alcohols, carbohydrate-rich snacks, vegetables, and legumes/mixed grains. Food consumption patterns were classified into four groups according to food consumption frequency using cluster analysis. Cluster 4 showed a significantly higher food consumption frequency, and Cluster 3 had a relatively high overall food consumption frequency but a lower alcohol consumption frequency than the other clusters. Cluster 2 was characterized by a generally low food consumption frequency but a significantly higher alcohol consumption frequency. Cluster 1 showed a generally low food consumption frequency; however, its consumption frequency of legumes/mixed grains was higher than that of Cluster 2. Further analysis showed that the food consumption patterns of the elderly were affected by variables such as gender, age, town, economic status, education level, family type, and frequency of eating out. We conclude that a proper nutritional education program should be conducted to address the specific dietary problems of each elderly segment.

A Study on the Relationship between Skill and Competition Score Factors of KLPGA Players Using Canonical Correlation Biplot and Cluster Analysis (정준상관 행렬도와 군집분석을 응용한 KLPGA 선수의 기술과 경기성적요인에 대한 연관성 분석)

  • Choi, Tae-Hoon;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.429-439 / 2008
  • The canonical correlation biplot is a two-dimensional plot for graphically investigating, in canonical correlation analysis, the relationship between two sets of variables and the relationship between observations and variables. In general, a biplot is useful for giving a graphical description of the data. However, neither the general biplot nor the canonical correlation biplot gives a concise interpretation of the relationship between variables and observations when the number of observations is large. To overcome this problem, Choi and Kim (2008) recently suggested a method of interpreting the biplot by applying K-means cluster analysis. In this study, we apply their method to investigate the relationship between the skill and competition score factors of KLPGA players using the canonical correlation biplot and cluster analysis.