• Title/Summary/Keyword: tree-structured clustering (나무형 군집화)

Tree-structured Clustering for Continuous Data (연속형 자료에 대한 나무형 군집화)

  • Heo, Myeong-Hui;Yang, Gyeong-Suk
    • Proceedings of the Korean Statistical Society Conference / 2005.05a / pp.49-51 / 2005
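  • This study proposes a clustering method based on recursive partitioning and presents application examples. The method yields simple, easily interpretable rules in the form of a tree and also performs variable selection.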

Tree-structured Clustering for Continuous Data (연속형 자료에 대한 나무형 군집화)

  • Huh Myung-Hoe;Yang Kyung-Sook
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.661-671 / 2005
  • The aim of this study is to propose a clustering method, called tree-structured clustering, that recursively partitions continuous multivariate data based on an overall $R^2$ criterion with a practical node-splitting decision rule. The method produces easily interpretable, tree-type clustering rules and performs variable selection. In numerical examples (Fisher's iris data and a Telecom case), we note several differences between tree-structured clustering and K-means clustering.
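
As a rough illustration of the recursive partitioning idea in this abstract, the following Python sketch grows a tree of axis-aligned binary splits, each time taking the split that most increases the overall $R^2$ (one minus the within-cluster sum of squares divided by the total sum of squares). The function names, the `max_leaves` cap, and the `min_gain_ratio` stopping threshold are assumptions made for this example. The abstract does not spell out the paper's node-splitting decision rule, so this should not be read as the authors' exact procedure.

```python
def total_ss(rows):
    """Sum of squared deviations from the column means, over all variables."""
    if not rows:
        return 0.0
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return sum((r[j] - means[j]) ** 2 for r in rows for j in range(p))


def best_split(rows):
    """Return (variable index, cut point, reduction in within-SS), or None."""
    parent = total_ss(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate cuts are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [r for r in rows if r[j] <= cut]
            right = [r for r in rows if r[j] > cut]
            gain = parent - total_ss(left) - total_ss(right)
            if best is None or gain > best[2]:
                best = (j, cut, gain)
    return best


def tree_cluster(rows, max_leaves=3, min_gain_ratio=0.05):
    """Greedy tree-structured clustering (illustrative stopping rule).

    min_gain_ratio is a hypothetical threshold: splitting stops when the
    best available split raises the overall R^2 by less than this amount.
    """
    tss = total_ss(rows)
    leaves, rules = [list(rows)], []
    while len(leaves) < max_leaves and tss > 0:
        candidates = []
        for i, leaf in enumerate(leaves):
            split = best_split(leaf) if len(leaf) > 1 else None
            if split is not None:
                candidates.append((split, i))
        if not candidates:
            break
        (j, cut, gain), i = max(candidates, key=lambda c: c[0][2])
        if gain / tss < min_gain_ratio:
            break
        leaf = leaves.pop(i)
        leaves.append([r for r in leaf if r[j] <= cut])
        leaves.append([r for r in leaf if r[j] > cut])
        rules.append((j, cut))
    r2 = 1 - sum(total_ss(l) for l in leaves) / tss if tss > 0 else 0.0
    return leaves, rules, r2


# Made-up toy data: two groups that differ only on variable 0.
toy = [(1.0, 5.0), (1.2, 5.1), (0.9, 4.9), (8.0, 5.0), (8.2, 5.2), (7.9, 4.8)]
leaves, rules, r2 = tree_cluster(toy)
```

Because every split is a rule of the form "variable j <= cut", the list `rules` doubles as a readable description of the clusters, and variables that never appear in it have in effect been deselected.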

Tree-structured Clustering for Mixed Data (혼합형 데이터에 대한 나무형 군집화)

  • Yang Kyung-Sook;Huh Myung-Hoe
    • The Korean Journal of Applied Statistics / v.19 no.2 / pp.271-282 / 2006
  • The aim of this study is to propose a tree-structured clustering method for mixed data. We suggest a scaling method to reduce the variable selection bias among categorical variables. In numerical examples, such as credit data and German credit data, we note several differences between tree-structured clustering and K-means clustering.

Categorical time series clustering: Case study of Korean pro-baseball data (범주형 시계열 자료의 군집화: 프로야구 자료의 사례 연구)

  • Pak, Ro Jin
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.621-627 / 2016
  • A professional baseball team may tend to be very weak against one particular team. For example, team S, the strongest team in Korea, is relatively weak against team H. In this paper, we cluster the Korean baseball teams based on their records against team S to investigate whether the pattern of team H's record differs from those of the other teams. The technique we employ is 'time series clustering', or more specifically 'categorical time series clustering'. Three methods are considered: (i) a distance-based method, (ii) a genetic sequencing method, and (iii) a periodogram method. Each method has its own advantages and disadvantages in handling categorical time series, so we recommend drawing conclusions by considering the results of all three methods together.
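
To make the distance-based method (i) more concrete, the Python sketch below compares win/loss sequences with a normalized Hamming distance, that is, the fraction of games whose outcomes differ. The choice of Hamming distance, the function names, and the team records are all assumptions for illustration; the abstract does not say which distance the paper uses, and the records here are made up.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_distances(series):
    """Distance for every unordered pair of named sequences."""
    return {
        (p, q): hamming(series[p], series[q])
        for p, q in combinations(sorted(series), 2)
    }


# Hypothetical records (W = win, L = loss) of three teams against one opponent.
records = {"A": "WLLWL", "B": "WLLWW", "C": "LWWLW"}
distances = pairwise_distances(records)
```

A matrix of such distances could then be passed to any standard hierarchical clustering routine to group teams with similar records.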

Discretization of continuous-valued attributes considering data distribution (데이터 분포를 고려한 연속 값 속성의 이산화)

  • 이상훈;박정은;오경환
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.217-220 / 2003
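  • This paper proposes a new method that converts continuous values into categorical form, without requiring any input parameters, by considering how the values of the target attribute (class) are distributed over each attribute. For each attribute, the distribution of the target attribute is mapped onto a one-dimensional space, and intervals are clustered according to criteria such as the density of each target attribute value and its degree of overlap with other target attribute values. The resulting clusters are based on probabilistic values that can predict the target attribute, and their discretization boundaries minimize the loss of information provided by each attribute. The improved performance of the proposed discretization method is confirmed using the C4.5 algorithm and data from the UCI Machine Learning Data Repository.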

Hwasan Wetland Vegetation in Gunwi, South Korea: with a Phytosociological Focus on Alder (Alnus japonica (Thunb.) Steud.) Forests (군위군 화산습지의 식생: 오리나무림을 중심으로)

  • Kim, Jong-Won;Lee, Seung-eun;Lee, Jung-a
    • Korean Journal of Ecology and Environment / v.50 no.1 / pp.70-78 / 2017
  • The Hwasan wetland vegetation occurs in a mountain basin (644~780 m a.s.l.) that has been potential land for indigenous people since the prehistoric period. We phytosociologically investigated old-growth alder (Alnus japonica) forests using the Zürich-Montpellier School's method and analyzed their spatial distribution pattern using an actual vegetation map. Species performance was determined using coverage and r-NCD. The Viburnum opulus var. calvescens-Alnus japonica community, syntaxonomically belonging to the Alnetea japonicae, was described for the first time and is composed of three subunits: Salix koreensis subcommunity, typical subcommunity, and Pyrus ussuriensis subcommunity. The present plant community was compared with vicariant syntaxa such as the Molinia japonica-Alnus japonica community, Rhamno nipponicae-Alnetum japonicae, and Aceri-Salicetum koreensis. Hwasan's alder forest, an alluvial terrace vegetation type on a valley fan in the montane zone, is evaluated as vegetation class [I], a kind of benchmark plant community for mountain wetlands in the southeastern part of the Korean Peninsula. We also suggest establishing a national strategy for habitat conservation that protects the site from radical hydrological transformation due to military use.

Community Structure and Ecological Characteristics of Berchemia berchemiaefolia Stands at Mt. Naeyon (내연산 망개나무 임분의 군집구조와 생태적 특성)

  • Yong Sik, Hong;I-Seul, Yun;Dong Pil, Jin;Chan Beom, Kim;Hak Koo, Kim;Jin Woo, Lee;Shin Koo, Kang
    • Journal of Korean Society of Forest Science / v.111 no.4 / pp.538-547 / 2022
  • In this study, the population and community structure of Berchemia berchemiaefolia stands located at Mt. Naeyon (Gyeongbuk, Korea) were quantified, and multivariate analysis was performed to determine the correlations between vegetation group types and environmental factors and to provide reference data for the conservation and restoration of this species. In total, there were 164 B. berchemiaefolia trees at Mt. Naeyon. The average DBH of the trees was 24.5 cm, forming a normal distribution. It rarely appeared in an understory vegetation height of 3 m. About 37.1% of the trees were branched. B. berchemiaefolia stands were classified into two groups: the B. berchemiaefolia-Quercus serrata community and the B. berchemiaefolia-Carpinus laxiflora community. Canopy gap, organic matter, exchangeable Ca, and cation exchange capacity were the major site characteristics affecting the distribution pattern of the stands. Currently, B. berchemiaefolia trees dominate at Mt. Naeyon, but depending on habitat position, the species was in a natural successional stage to C. laxiflora or C. cordata, which is a shade-tolerant species.

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important sources for target marketing or personalized advertisements on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are expensive, slow, and likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data allows us to see which pages users visited, how long they stayed there, how often they visited, when they usually visited, which sites they prefer, what keywords they used to find the site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics. The variables include search keywords, frequency and intensity by time, day, and month, variety of websites visited, text information from web pages visited, etc. The demographic attributes to predict also vary from paper to paper, and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, have been used to build prediction models. However, this research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model.
The objective of this study is to choose the clickstream attributes most likely to be correlated with the demographics from the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. From the results of previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building is composed of 4 steps. In the first step, we create user profiles that include 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction of the clickstream variables to address the curse of dimensionality and the overfitting problem. We utilize three approaches, based on decision tree, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable. SVM, neural network, and logistic regression are used for modeling. The last step evaluates the alternative models in terms of model accuracy and selects the best model. For the experiments, we used clickstream data representing 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for our prediction process, and 5-fold cross validation was conducted to enhance the reliability of our experiments. The experimental results verify that there is a specific data mining method well suited to each demographic variable. For example, age prediction performs best when using decision tree based dimension reduction with a neural network, whereas the prediction of gender and marital status is most accurate when applying SVM without dimension reduction.
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and thereby support digital marketing.
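
The final step of the process described in this abstract, evaluating alternative models with 5-fold cross validation and keeping the most accurate one, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the study itself used IBM SPSS Modeler 17.0, and the two toy models below (`fit_majority` and `fit_threshold`) are hypothetical stand-ins for the SVM, neural network, and logistic regression models named in the abstract. The data is made up.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy of fit(X_train, y_train) -> predict(x)."""
    scores = []
    for test in k_fold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


def fit_majority(X, y):
    """Baseline stand-in: always predict the most frequent training label."""
    top = max(sorted(set(y)), key=y.count)
    return lambda x: top


def fit_threshold(X, y):
    """Stand-in binary model (labels 0 and 1, both present in training):
    cut feature 0 halfway between the two class means."""
    m0 = sum(x[0] for x, t in zip(X, y) if t == 0) / y.count(0)
    m1 = sum(x[0] for x, t in zip(X, y) if t == 1) / y.count(1)
    cut = (m0 + m1) / 2
    return lambda x: int((x[0] > cut) == (m1 > m0))


# Made-up data: one feature that separates the two classes.
X = [[float(v)] for v in range(20)]
y = [0] * 10 + [1] * 10

models = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: cross_val_accuracy(fit, X, y) for name, fit in models.items()}
best = max(scores, key=scores.get)
```

Repeating this comparison once per demographic attribute, and once per dimension reduction approach, gives the kind of model-by-attribute accuracy table the abstract summarizes.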