• Title/Summary/Keyword: CHAID


CHAID Algorithm by Cube-based Sampling

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Proceedings of the Korean Data and Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.239-247
    • /
    • 2003
  • Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction, and variable screening. CHAID (Chi-square Automatic Interaction Detector) is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. In this paper we propose a CHAID algorithm based on cube-based sampling and examine its accuracy and speed as the number of variables changes.


Selecting variables for evidence-diagnosis of paralysis disease using CHAID algorithm

  • Shin, Yan-Kyu
    • Proceedings of the Korean Data and Information Science Society Conference
    • /
    • 2001.10a
    • /
    • pp.76-78
    • /
    • 2001
  • Variable selection in oriental medical research is considered. Decision tree algorithms such as CHAID, CART, C4.5 and QUEST have been successfully applied to medical research. Paralysis is a highly dangerous, potentially fatal disease that is accompanied by severe physical handicap. In this paper, we explore the use of the CHAID algorithm for selecting variables for evidence-diagnosis of paralysis. Empirical results comparing our proposed method with the method using Wilks' $\lambda$ are given.


Exploration of CHAID Algorithm by Sampling Proportion

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Proceedings of the Korean Data and Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.215-228
    • /
    • 2003
  • Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction and variable screening, interaction effect identification, category merging, and discretization of continuous variables. CHAID (Chi-square Automatic Interaction Detector) is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. CHAID modeling selects a set of predictors and their interactions that optimally predict the dependent measure. In this paper we examine the accuracy and speed of the CHAID algorithm as the sampling proportion changes.


CHAID Algorithm by Cube-based Proportional Sampling

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Proceedings of the Korean Data and Information Science Society Conference
    • /
    • 2004.04a
    • /
    • pp.39-50
    • /
    • 2004
  • The decision tree approach is most useful in classification problems, where it divides the search space into rectangular regions. Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction and variable screening, and category merging. CHAID (Chi-square Automatic Interaction Detector) uses the chi-squared statistic to determine splits and is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. In this paper we propose a CHAID algorithm based on cube-based proportional sampling and examine its accuracy and speed as the number of variables changes.

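The chi-square-based split selection mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not cover cube-based sampling. The function names `chi_square` and `best_split` and the toy columns `region`, `channel`, and `bought` are hypothetical. For simplicity the sketch ranks predictors by the raw Pearson chi-square statistic, which is only a fair comparison when the predictors have the same degrees of freedom; full CHAID also merges categories and compares Bonferroni-adjusted p-values.

```python
from collections import Counter


def chi_square(xs, ys):
    """Return (Pearson chi-square statistic, degrees of freedom)
    for two equally long lists of categorical values."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n  # expected count under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def best_split(predictors, target):
    """Pick the predictor with the largest chi-square statistic.
    Simplification: assumes all predictors share the same degrees of freedom."""
    scores = {name: chi_square(values, target)[0]
              for name, values in predictors.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: 'region' separates the target perfectly, 'channel' only weakly.
bought = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
predictors = {
    "region":  ["a", "a", "b", "b", "a", "b", "a", "b"],
    "channel": ["x", "y", "x", "y", "x", "y", "x", "y"],
}
winner, scores = best_split(predictors, bought)
print(winner, scores)
```

On this toy data the sketch selects `region` as the first split variable, since its contingency table with `bought` departs furthest from independence.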

CHAID Algorithm by Cube-based Proportional Sampling

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society
    • /
    • v.15 no.4
    • /
    • pp.803-816
    • /
    • 2004
  • The decision tree approach is most useful in classification problems, where it divides the search space into rectangular regions. Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction and variable screening, and category merging. CHAID uses the chi-squared statistic to determine splits and is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. In this paper we propose a CHAID algorithm based on cube-based proportional sampling and examine its accuracy and speed as the number of variables changes.


A Feature Analysis of Industrial Accidents Using CHAID Algorithm (CHAID 알고리즘을 이용한 산업재해 특성분석)

  • Leem Young-Moon;Hwang Young-Seob
    • Journal of the Korea Safety Management & Science
    • /
    • v.7 no.5
    • /
    • pp.59-67
    • /
    • 2005
  • The main objective of statistical analysis of industrial accidents is to identify the dangerous factors in each industrial field, so that accidents can be prevented or reduced by educating field workers on safety tools. However, so far there has been no technique for the quantitative evaluation of danger. Almost all previous research on industrial accidents has relied only on frequency analysis, such as analysis of the constituent ratio of accidents. As an application of data mining techniques, this paper analyzes the efficiency of the CHAID algorithm in classifying types of industrial accident data and thereby identifies potential weak points in accident risk grouping.

An introductory study on the urban functions using CHAID technique (CHAID 技法에 의한 都市機能의 試論的 硏究)

  • ;Yang, Soon-Jeong
    • Journal of the Korean Geographical Society
    • /
    • v.29 no.3
    • /
    • pp.360-368
    • /
    • 1994
  • To this day, a number of quantitative analytical methods have been employed to clarify regional characteristics in the discipline of geography. As an application of such quantitative analysis, this paper attempts to clarify urban functions, and consequently urban characteristics, statistically by adopting the newly introduced CHAID, a kind of discriminant analysis technique. The data were processed in two phases. First, the urban functions were classified by designating twenty cities (each with a population of 250,000 or more) as the predictor variable and four major urban functions (administration, marketing, finance and production) as the response variable. Then, the preeminent functions of each region were discriminated and classified by treating the remaining traffic, education, medical care, culture and transportation functions as predictor variables and the following five regions as the response variable: Metropolitan Seoul area, Pusan region, Taegu region, Kwangju region and Chungcheong region. According to the results of the first analysis, marketing and administration emerged as meaningful functions in Seoul and Taegu respectively. For the finance function, only Pusan and Pucheon can be discriminated. Seoul, Pusan and Seongnam show dominance in the production function. In the results of the second analysis, the Metropolitan Seoul area shows, among other functions, strong traffic and finance functions. In the Pusan region, administration, education and finance are recorded as the leading functions, and the Taegu region is strong in the education, medical care and transportation functions. In the Kwangju region, the administration, production and education functions are discriminated from the other functions. The Chungcheong region shows a similar pattern, with the traffic function replacing the production function of the Kwangju region.
Based on the aforementioned analysis, it can be said that the CHAID technique, which is capable of processing large amounts of categorical data and facilitates interpretation by presenting its outcome in the form of a dendrogram, is an effective and meaningful means of classifying and discriminating geographical regions and their characteristics.


Development of Selection Model of Interchange Influence Area in Seoul Belt Expressway Using Chi-square Automatic Interaction Detection (CHAID) (CHAID분석을 이용한 나들목 주변 지가의 공간분포 영향모형 개발 - 서울외곽순환고속도로를 중심으로 -)

  • Kim, Tae Ho;Park, Je Jin;Kim, Young Il;Rho, Jeong Hyun
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.29 no.6D
    • /
    • pp.711-717
    • /
    • 2009
  • This study develops a model for analyzing the relationship between a major node (an expressway interchange) and the land price formation of apartments along the Seoul Belt Expressway by using CHAID analysis. The results show that, first, the regions along the Seoul Belt Expressway (outer side: Gyeonggi-do, inner side: Seoul) differ, and the relationship between land price and the traffic node, which is generally expected to be linear, is not linear; second, CHAID analysis shows two distinct spatial distributions separated at 2.6 km on the outer side, but three distinct spatial distributions separated at 1.4 km and 3.8 km on the inner side. In other words, traffic access does not necessarily guarantee high housing prices, since the graphs show that land price follows a composite spatial distribution. This implies that the residential environment (highway noise and regional discontinuity) and traffic accessibility interact to generate this phenomenon. Therefore, the highway IC land price model will be beneficial for calculating land prices in the New Towns that are constantly being built along the highway.

A Study on Exploration of the Recommended Model of Decision Tree to Predict a Hard-to-Measure Measurement in Anthropometric Survey (인체측정조사에서 측정곤란부위 예측을 위한 의사결정나무 추천 모형 탐지에 관한 연구)

  • Choi, J.H.;Kim, S.K.
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.5
    • /
    • pp.923-935
    • /
    • 2009
  • This study aims to explore a recommended decision tree model for predicting a hard-to-measure measurement in an anthropometric survey. We carry out a cross-validation experiment to obtain a recommended decision tree model. We use three decision tree split rules: CHAID, Exhaustive CHAID, and CART. CART gives the best result on real-world data.

Development of Selection Model of Subway Station Influence Area (SIA) in Seoul City using Chi-square Automatic Interaction Detection (CHAID) (CHAID분석을 이용한 서울시 지하철 역세권 지가 영향모형 개발)

  • Choi, Yu-Ran;Kim, Tae-Ho;Park, Jung-Soo
    • Journal of the Korean Society for Railway
    • /
    • v.11 no.5
    • /
    • pp.504-512
    • /
    • 2008
  • In general, based on the criteria of subway law, a radius of 500 m from a subway station is defined as the SIA (Subway Station Influence Area). In this paper, selection models of SIA are developed, based on CHAID analysis, to identify the appropriate SIA for specific regions in the Seoul metropolitan city. The following results are obtained: (1) walking distance from the subway station is the most influential factor in defining the SIA; (2) SIAs vary by region (i.e., Gangnam area: 767 m, Gangbuk area: 452 m); and (3) walking distance from the subway station influences the land price of the SIA. In addition, in Gangnam, the land price structure of the closest section follows a polynomial trend curve rather than a linear one, in contrast to the other sections. Therefore, it is desirable for the current definition of SIA (a radius of 500 m from a subway station) to be redefined to reflect the land use characteristics and walking distance of each region.