• Title/Summary/Keyword: cluster analysis (군집 분석)

Clustering Validity Assessment Using Relative Criteria for finding Optimal Clusters (최적의 군집을 찾기 위한 상대적 군집 평가 방법)

  • 김영옥;이수원
    • Proceedings of the Korean Information Science Society Conference / 2002.10d / pp.334-336 / 2002
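  • Cluster analysis is a method that analyzes the attributes of data and groups together data with similar patterns. Although cluster analysis is used in many application areas, it is difficult to assess whether a clustering result is actually correct and meaningful. This paper proposes a relative clustering validity assessment method that evaluates clustering results by analyzing the clustered data. We define an index for the relative assessment method and show that, when applied to clustering results, it can find the optimal, meaningful clusters. Experiments demonstrate the suitability of the proposed index and show that it finds optimal, meaningful clusters better than existing indices.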
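
The abstract does not define the proposed index, so the sketch below uses a well-known relative criterion as a stand-in: the Davies-Bouldin index (not the authors' index). A relative criterion scores each candidate partition (for example, one per value of k) and the partition with the best score is preferred; for Davies-Bouldin, lower is better. The function name and toy data are illustrative assumptions.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower values mean compact, well-separated clusters.

    Assumes at least two clusters and distinct centroids (otherwise the
    separation term below would divide by zero).
    """
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    keys = sorted(groups)
    # Centroid of each cluster
    cents = {k: tuple(sum(c) / len(groups[k]) for c in zip(*groups[k])) for k in keys}
    # Scatter s_i: mean distance of the members to their own centroid
    scatter = {k: sum(math.dist(p, cents[k]) for p in groups[k]) / len(groups[k])
               for k in keys}
    total = 0.0
    for i in keys:
        # Worst-case ratio of combined scatter to centroid separation
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in keys if j != i)
    return total / len(keys)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(davies_bouldin(pts, [0, 0, 1, 1]))  # 0.1  (left/right split)
    print(davies_bouldin(pts, [0, 1, 0, 1]))  # 10.0 (bottom/top split)
```

Comparing the two labelings of the same points shows how a relative index ranks alternative partitions: the left/right split scores far better than the bottom/top one.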

Automated K-Means Clustering and R Implementation (자동화 K-평균 군집방법 및 R 구현)

  • Kim, Sung-Soo
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.723-733 / 2009
  • The crucial problems of K-means clustering are deciding the number of clusters and the initial cluster centroids. Hence, K-means clustering is generally carried out as a two-stage procedure: the first stage runs hierarchical clustering to obtain the number of clusters and the cluster centroids, and the second stage runs nonhierarchical K-means clustering using the results of the first stage. Here we provide an automated K-means clustering procedure for obtaining the initial cluster centroids that is also useful for large data sets, together with a software program implemented in R.
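
The two-stage procedure described above can be sketched as follows. This is a minimal illustration, not the author's R program: it assumes the number of clusters k is already known, and it uses centroid linkage for the hierarchical stage (Ward's method is another common choice). The function names are hypothetical. The naive pairwise search is cubic in the number of points, so this sketch only suits small data.

```python
import math


def centroid(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def hierarchical_seeds(points, k):
    """Stage 1: agglomerative clustering (centroid linkage) down to k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters whose centroids are closest
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [centroid(c) for c in clusters]


def kmeans(points, centers, max_iter=100):
    """Stage 2: Lloyd's K-means iterations started from the given centers."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the previous center if a cluster becomes empty
        new = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(pts, hierarchical_seeds(pts, 2)))
```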

The Analysis of the Forest Community Structure of Mt. Minjuji (민주지산의 산림군집구조분석)

  • 최송현;조현서;이경재
    • Korean Journal of Environment and Ecology / v.11 no.1 / pp.111-125 / 1997
  • To investigate the climax forest structure and to construct basic ecological data, forty-nine plots were set up and surveyed in Mt. Minjuji, Chungchongpukdo. TWINSPAN classification divided the community into seven groups: Pinus densiflora-Carpinus laxiflora-Quercus serrata (community I), Q. mongolica-Q. serrata-Platycarya strobilacea (community II), Q. mongolica (community III), Fraxinus mandshurica-Acer mono (community IV), Cornus controversa-F. mandshurica (community V), F. mandshurica-Carpinus cordata (community VI), and F. mandshurica-C. laxiflora (community VII). Analyses of species structure, similarity, diversity, and DBH showed that all communities except I to III were mixed broadleaved climax forests. The basic data constructed here can be applied to sustainable development such as ecotourism and nature trails.

Market Segmentation on Recreational Forest Visitors by Cluster Analysis (군집분석을 통한 자연휴양림 이용객의 시장세분화)

  • Shin, Hyun-Kyu;Shin, Hong-Chul
    • The Journal of the Korea Contents Association / v.10 no.3 / pp.364-372 / 2010
  • The purpose of this study is to segment recreational forest visitors for marketing based on their purpose of visit. Factor analysis, cluster analysis, cross-tabulation, and t-tests were used to find differences in behavioral intention across clusters, and the results yielded several implications. Two clusters were found, and they differ in behavioral intentions. Cluster 1 (married, income of 2 to 3 million won) showed higher satisfaction, revisit intention, and recommendation intention. The results suggest that marketers of recreational forests should apply differentiated marketing strategies and provide varied facilities and active programs. Future research should survey a broader region to generalize the results.

Probabilistic reduced K-means cluster analysis (확률적 reduced K-means 군집분석)

  • Lee, Seunghoon;Song, Juwon
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.905-922 / 2021
  • Cluster analysis is an unsupervised learning technique used to discover clusters when there is no prior knowledge of group membership. K-means, one of the most commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, that is, K-means cluster analysis after reducing the number of variables with a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, an extension of reduced K-means (De Soete and Carroll, 1994) to a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. In an application to a real data set, it revealed a cluster structure similar to or better than those found by other methods.
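
The masking problem that motivates this work can be shown in a few lines. The sketch below does not implement probabilistic reduced K-means; it only illustrates the first step of tandem analysis on hypothetical data where the cluster signal sits in a low-variance variable (x1 = -1 or +1) and an unrelated variable (x2) has a large variance. The leading principal component, found here by power iteration on the sample covariance, points along the noise variable, so projecting onto it discards the cluster structure.

```python
import math


def first_pc(rows, iters=200):
    """Leading principal component of the sample covariance via power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalize to unit length
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    # Hypothetical data: x1 carries the two clusters, x2 is high-variance noise
    rows = [(c, e) for c in (-1, 1) for e in (-10, -5, 0, 5, 10)]
    v = first_pc(rows)
    scores = [sum(a * b for a, b in zip(r, v)) for r in rows]
    # Both clusters receive the same spread of PC1 scores, so they overlap
    print(v)
    print(scores[:5], scores[5:])
```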

A Comparative Study of Clustering Algorithms in Data Mining (데이터 마이닝에서의 군집분석 알고리즘 비교 연구)

  • Lee, Yeong-Seop;An, Mi-Yeong
    • Proceedings of the Korean Data and Information Science Society Conference / 2003.05a / pp.19-25 / 2003
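  • Merely describing the patterns or relationships inherent in a database can provide the information needed for decision-making. Finding patterns by dividing such data, based on their variables, into small groups with similar characteristics is called cluster analysis. Cluster analysis includes partitioning methods and hierarchical methods; here we compare several algorithms of the partitioning approach, which allows observations to be reassigned. Partitioning algorithms include the k-means algorithm, which uses the mean as the cluster center, and the PAM, CLARA, and CLARANS algorithms, which use a medoid as the center. We describe the theory, advantages, and disadvantages of these algorithms and compare them by variance and by the average distance between centers.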
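
The difference between a mean-based and a medoid-based center can be sketched briefly. The code below is a simplified alternating k-medoids (assign each point to its nearest medoid, then re-pick each medoid), not PAM's BUILD/SWAP search and not CLARA or CLARANS; the initialization from the first k points and the function names are assumptions for illustration. A medoid is always an actual data point, which makes it less sensitive to outliers than the mean.

```python
import math


def medoid(cluster):
    """Member with the smallest total distance to all other members."""
    return min(cluster, key=lambda c: sum(math.dist(c, p) for p in cluster))


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids, starting from the first k points (an assumption)."""
    medoids = list(points[:k])
    for _ in range(max_iter):
        groups = [[] for _ in medoids]
        for p in points:
            # Assign each point to its nearest current medoid
            groups[min(range(k), key=lambda i: math.dist(p, medoids[i]))].append(p)
        # Re-pick the medoid of each group (keep the old one if a group is empty)
        new = [medoid(g) if g else medoids[i] for i, g in enumerate(groups)]
        if new == medoids:
            break
        medoids = new
    return medoids


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (-1, 0), (0, 1), (50, 50)]
    mean = tuple(sum(c) / len(pts) for c in zip(*pts))
    print(mean, medoid(pts))  # (10.0, 10.2) (0, 0): the outlier pulls only the mean
    print(k_medoids([(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)], 2))
```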

A Study on the Comparison of Cluster Analysis Programs Using Artificial Data (인위적 데이터를 이용한 군집분석 프로그램간의 비교에 대한 연구)

  • 김성호;백승익
    • Journal of Intelligence and Information Systems / v.7 no.2 / pp.35-49 / 2001
  • Over the years, cluster analysis has become a popular tool for marketing and segmentation researchers. There are various methods for cluster analysis, among which K-means partitioning cluster analysis is the most popular segmentation method. However, because cluster analysis is very sensitive to the initial configuration of the data set at hand, it is important to select an appropriate starting configuration that is comparable with the clustering of the whole data, so as to improve the reliability of the clustering results. Many programs for K-means cluster analysis employ different methods to choose the initial seeds and compute the cluster centroids. In this paper, we suggest a methodology to evaluate various clustering programs, and, to explore its usability, we use it to evaluate four clustering programs.
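
The sensitivity to initial seeds described above can be demonstrated with plain Lloyd's K-means. In this sketch (not one of the programs evaluated in the paper) the same four hypothetical points, the corners of a 4 x 1 rectangle, are clustered from two starting configurations. Both runs converge immediately, but to different partitions with very different total within-cluster sums of squares (SSE).

```python
import math


def kmeans_sse(points, centers, max_iter=100):
    """Run Lloyd's K-means from the given initial centers; return (centers, SSE)."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            groups[min(range(len(centers)),
                       key=lambda i: math.dist(p, centers[i]))].append(p)
        # Recompute each center as the mean of its group (keep it if empty)
        new = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
               for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    sse = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    return centers, sse


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (4, 0), (4, 1)]
    print(kmeans_sse(pts, [(0, 0.5), (4, 0.5)]))  # left/right split, SSE 1.0
    print(kmeans_sse(pts, [(2, 0), (2, 1)]))      # stuck in bottom/top split, SSE 16.0
```

The second start is a fixed point of the algorithm, so K-means never escapes it; this is why programs differ in how they choose initial seeds.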

Plant Community Structure Analysis in Jujeongol Valley of Soraksan National Park (설악산 국립공원 주전골계곡 식물군집구조분석)

  • 이경재;민성환;한봉호
    • Korean Journal of Environment and Ecology / v.10 no.2 / pp.283-296 / 1997
  • To investigate the plant community structure in the valley and to suggest management directions for the National Park, fifty plots were set up and surveyed in Jujeongol Valley, Soraksan National Park. TWINSPAN classification and DCA ordination were applied to classify the plots into several groups based on woody plants. The resulting groups were the Quercus mongolica - Q. variabilis - Pinus densiflora community, the P. densiflora community, the Carpinus laxiflora community, and the Q. serrata community. DCA ordination and DBH class distribution analysis of the ecological trends of tree species indicated that the Q. mongolica - Q. variabilis - P. densiflora community and the P. densiflora community appear to be succeeding from P. densiflora toward Q. mongolica, that the Q. serrata community appears to be succeeding toward C. laxiflora, and that the C. laxiflora community will be maintained in a stable state.

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.395-406 / 2017
  • The purpose of this study is to identify patterns of daily electricity demand through clustering and classification. Hourly data were collected by KPX (Korea Power Exchange) between 2008 and 2012. Because electricity demand data are time series data, the time trend was removed before identifying the daily demand patterns. K-means clustering, Gaussian mixture model clustering, and functional clustering were considered in order to find the optimal clustering method. Classification analysis was then conducted to understand the relationship between the clusters and external factors (day of the week, holidays, and weather). The data were divided into training and test data: the training data consisted of the external factors and cluster labels from 2008 to 2011, and the test data consisted of the daily external factors in 2012. Decision trees, random forests, support vector machines, and naive Bayes were used. As a result, Gaussian mixture model-based clustering and random forests showed the best prediction performance when the number of clusters was 8.
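
The abstract says the time trend was removed before clustering daily patterns but does not say how. The sketch below assumes the simplest option, subtracting an ordinary least-squares linear trend, and then cuts the hourly series into 24-value daily profiles that could be passed to K-means or a Gaussian mixture model. Both the detrending choice and the function names are assumptions, not the authors' procedure.

```python
def remove_linear_trend(series):
    """Subtract an ordinary least-squares line a + b*t from the series."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(series)]


def daily_profiles(hourly, hours_per_day=24):
    """Split an hourly series into one profile per day (a partial last day is dropped)."""
    days = len(hourly) // hours_per_day
    return [hourly[d * hours_per_day:(d + 1) * hours_per_day] for d in range(days)]


if __name__ == "__main__":
    # Hypothetical two days of hourly demand: a rising trend plus a daily shape
    shape = [0, 0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 3, 3, 3, 4, 4, 5, 6, 6, 5, 4, 2, 1, 0]
    hourly = [100 + 0.5 * t + shape[t % 24] for t in range(48)]
    for profile in daily_profiles(remove_linear_trend(hourly)):
        print([round(x, 1) for x in profile])
```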

A Comparative Study on Statistical Clustering Methods and Kohonen Self-Organizing Maps for Highway Characteristic Classification of National Highway (일반국도 도로특성분류를 위한 통계적 군집분석과 Kohonen Self-Organizing Maps의 비교연구)

  • Cho, Jun Han;Kim, Seong Ho
    • KSCE Journal of Civil and Environmental Engineering Research / v.29 no.3D / pp.347-356 / 2009
  • This paper describes a cluster analysis of highway classification based on traffic characteristics, in order to depart from existing highway functional classification methodologies. The research focuses on comparing the performance of clustering techniques by total within-group error and on deriving the optimal number of clusters. Statistical clustering methods (hierarchical Ward's minimum-variance method and nonhierarchical K-means method) and Kohonen self-organizing map clustering were analyzed for highway characteristic classification. The outcomes of the clustering techniques were compared in terms of the number of samples and the traffic characteristics of the subsets obtained at the optimal number of clusters. Overall, the K-means method gave better results than the other methods for fewer than 12 clusters, while for more than 20 clusters Kohonen self-organizing maps gave the best results. The main expected contribution of this research is that the resulting highway characteristic classification can serve as important basic road attribute information.
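
Ward's minimum-variance method and the total within-group error used above are directly linked: at each step Ward merges the pair of clusters a and b whose merge increases the total within-group sum of squares the least, by n_a n_b / (n_a + n_b) times the squared distance between their centroids. The sketch below records the total within-group SSE at every number of clusters k, which is the kind of curve one can inspect when choosing k. It is a naive illustration on hypothetical data, not the paper's procedure for choosing the optimal number of clusters.

```python
import math


def ward_sse_path(points):
    """Agglomerative Ward clustering; returns {k: total within-group SSE}."""
    # Each cluster is stored as (size, centroid)
    clusters = [(1, tuple(map(float, p))) for p in points]
    sse = 0.0
    path = {len(clusters): sse}
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (na, ca), (nb, cb) = clusters[i], clusters[j]
                # Ward's merging cost = increase in total within-group SSE
                cost = na * nb / (na + nb) * math.dist(ca, cb) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (na, ca), (nb, cb) = clusters[i], clusters.pop(j)  # j > i keeps i valid
        merged = tuple((na * a + nb * b) / (na + nb) for a, b in zip(ca, cb))
        clusters[i] = (na + nb, merged)
        sse += cost
        path[len(clusters)] = sse
    return path


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(ward_sse_path(pts))  # {4: 0.0, 3: 0.5, 2: 1.0, 1: 101.0}
```

The jump from 1.0 at k = 2 to 101.0 at k = 1 signals that merging the last two groups destroys real structure.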