• Title/Summary/Keyword: Chi-Squared Automatic Interaction Detection (CHAID)

A Combinatorial Optimization for Influential Factor Analysis: a Case Study of Political Preference in Korea

  • Yun, Sung Bum;Yoon, Sanghyun;Heo, Joon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.5 / pp.415-422 / 2017
  • Finding the influential factors behind a given clustering result is a typical data science problem. A Genetic Algorithm-based method is proposed to derive influential factors, and its performance is compared with two conventional methods, Classification and Regression Tree (CART) and Chi-Squared Automatic Interaction Detection (CHAID), using Dunn's index. To extract the factors that influence preference for political parties in South Korea, the voting results of the 18th presidential election were used together with 'Demographic', 'Health and Welfare', 'Economic' and 'Business' related data. Based on the analysis, reverse engineering was implemented. A reverse engineering-based approach to influential factor analysis can provide a new set of influential variables, which can offer new insight to the data mining field.
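Dunn's index, the measure this abstract uses to compare the three methods, is the ratio of the smallest between-cluster distance to the largest within-cluster diameter; higher values indicate compact, well-separated clusters. The following is a minimal sketch, assuming Euclidean distance, nearest-pair separation and farthest-pair diameter (other variants of the index exist). The function name `dunn_index` and the toy points are illustrative and do not come from the paper.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn's index for a list of clusters, each a list of coordinate tuples.

    Assumes at least two clusters and at least one cluster with two or
    more distinct points (otherwise the diameter is undefined or zero).
    """
    # Largest distance between two points of the same cluster (diameter).
    max_diameter = max(
        dist(p, q)
        for cluster in clusters
        for p, q in combinations(cluster, 2)
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter


# Toy data: two tight clusters that lie far apart.
toy_clusters = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(5.0, 0.0), (5.0, 1.0)],
]
toy_score = dunn_index(toy_clusters)
print(toy_score)  # 5.0: separation 5 divided by diameter 1
```

When several candidate variable sets each produce a clustering, the set whose clustering gives the larger index would be preferred under this measure.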

A Study on Travel Pattern Analysis and Political Application using Transportation Card Data: In Gyeonggi-Do Case (교통카드자료를 이용한 통행패턴분석과 정책활용방안 연구 -경기도를 중심으로-)

  • Bin, Miyoung;Moon, Juback;Joh, Chang-Hyeon
    • Journal of the Economic Geographical Society of Korea / v.15 no.4 / pp.615-627 / 2012
  • This study analyzed travel patterns in public transportation use with transportation card data and presented measures that can be used in traffic policy. The transportation card data covered the Gyeonggi-Do area. As a utilization plan, the study set up and analyzed a scenario in which a traffic policy decision maker who is improving bus stop facilities selects target sites using several variables obtainable from transportation card data. K-means cluster analysis, a decision-making methodology, and CHAID (Chi-squared Automatic Interaction Detection) were used, and the results showed that the approach can be applied usefully to policy at a significance level of p < 0.01. Based on these results, the study also presented policy implications, namely the improvements needed before transportation card data can actually be used in policy.

The Prediction Model for Self-Reported Voice Problem Using a Decision Tree Model (의사결정나무 모형을 이용한 주관적 음성장애 예측모형)

  • Byeon, Haewon
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.7 / pp.3368-3373 / 2013
  • The purpose of this study was to analyze the risk factors for self-reported voice problems. Data were from the Korea National Health and Nutrition Examination Survey 2008. Subjects were 3,600 persons (1,501 men, 2,099 women) aged 19 years and older. A prediction model was developed using the exhaustive CHAID (Chi-squared Automatic Interaction Detection) algorithm of the decision tree model. In the decision tree analysis, pain and discomfort during the last 2 weeks, age, the longest occupation and thyroid disorders were significantly associated with self-reported voice problems. The associated factors suggest potential ways of targeting counseling and prevention efforts to control self-reported voice problems.
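At each node, CHAID cross-tabulates every candidate predictor against the target and splits on the predictor whose chi-squared test is most significant. The sketch below illustrates only that selection step under strong simplifying assumptions: every predictor and the target are binary (so each test has one degree of freedom), and the category merging and Bonferroni adjustment that CHAID and exhaustive CHAID perform are omitted. The variable names and all counts are invented for illustration and are not the survey data.

```python
from math import erfc, sqrt


def chi2_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table.

    Assumes no empty row or column, so every expected count is positive.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, k in enumerate(col_totals):
            expected = r * k / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


def best_split(rows, predictors, target):
    """Return the predictor with the smallest p-value, plus all results."""
    results = {}
    for name in predictors:
        table = [[0, 0], [0, 0]]
        for row in rows:
            table[row[name]][row[target]] += 1
        results[name] = chi2_2x2(table)
    best = min(results, key=lambda name: results[name][1])
    return best, results


# Made-up counts: (pain, thyroid, voice_problem) -> number of respondents.
counts = {
    (0, 0, 0): 8, (0, 1, 0): 7, (1, 0, 0): 3, (1, 1, 0): 2,
    (0, 0, 1): 2, (0, 1, 1): 3, (1, 0, 1): 7, (1, 1, 1): 8,
}
rows = [
    {"pain": p, "thyroid": t, "voice_problem": v}
    for (p, t, v), n in counts.items()
    for _ in range(n)
]

best, results = best_split(rows, ["pain", "thyroid"], "voice_problem")
print(best)  # the predictor chosen for the first split
for name, (stat, p_value) in results.items():
    print(f"{name}: chi2={stat:.2f}, p={p_value:.4f}")
```

A full implementation would repeat this step recursively inside each child node and stop when no predictor reaches the chosen significance level.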

Data Mining for Knowledge Management in a Health Insurance Domain

  • Chae, Young-Moon;Ho, Seung-Hee;Cho, Kyoung-Won;Lee, Dong-Ha;Ji, Sun-Ha
    • Journal of Intelligence and Information Systems / v.6 no.1 / pp.73-82 / 2000
  • This study examined the characteristics of knowledge discovery and data mining algorithms to demonstrate how they can be used to predict health outcomes and provide policy information for hypertension management using the Korea Medical Insurance Corporation database. Specifically, this study validated the predictive power of data mining algorithms by comparing the performance of logistic regression and two decision tree algorithms, CHAID (Chi-squared Automatic Interaction Detection) and C5.0 (a variant of C4.5). Logistic regression was chosen as the baseline because it has assumed a major position in the healthcare field as a method for predicting or classifying health outcomes based on the specific characteristics of each individual case. The comparison was performed using a test set of 4,588 beneficiaries and the training set of 13,689 beneficiaries that was used to develop the models. Contrary to the previous study, the CHAID algorithm performed better than logistic regression in predicting hypertension, while C5.0 had the lowest predictive power. In addition, the CHAID algorithm and association rules provided segment characteristics for the risk factors that may be used in developing hypertension management programs. This showed that the data mining approach can be a useful analytic tool for predicting and classifying health outcomes data.

Data Mining Approach to Clinical Decision Support System for Hypertension Management (고혈압관리를 위한 의사지원결정시스템의 데이터마이닝 접근)

  • 김태수;채영문;조승연;윤진희;김도마
    • Proceedings of the Korea Intelligent Information System Society Conference / 2002.11a / pp.203-212 / 2002
  • This study examined the predictive power of data mining algorithms by comparing the performance of logistic regression and a decision tree algorithm called CHAID (Chi-squared Automatic Interaction Detection). Contrary to previous studies, the decision tree performed better than logistic regression. We have also developed a CDSS (Clinical Decision Support System) with three modules (doctor, nurse, and patient) based on a data warehouse architecture. The data warehouse collects and integrates relevant information from various databases of the hospital information system (HIS). This system can help improve the decision-making capability of doctors and the accessibility of educational material for patients.

Classification Tree Analysis to Assess Contributing Factors Influencing Biosecurity Level on Farrow-to-Finish Pig Farms in Korea (분류 트리 기법을 이용한 국내 일괄사육 양돈장의 차단방역 수준에 영향을 미치는 기여 요인 평가)

  • Kim, Kyu-Wook;Pak, Son-Il
    • Journal of Veterinary Clinics / v.33 no.2 / pp.107-112 / 2016
  • The objective of this study was to determine potential contributing factors associated with the biosecurity level of farrow-to-finish pig farms and to develop a classification tree model to explore how these factors relate to each other. To this end, the authors analyzed data (n = 193) extracted from a cross-sectional study of 344 farrow-to-finish farms, conducted between March and September 2014 to explore swine disease status at farm level. Standardized questionnaires covering basic demographic data and management practices were collected on each farm during on-site visits by trained veterinarians. With biosecurity level as the dependent variable, the Chi-squared Automatic Interaction Detection (CHAID) algorithm was applied to build the classification tree. The misclassification risk statistic was used to evaluate the fit of the model in terms of prediction results. Categorical multivariate input data (40 variables) were used to construct the classification tree, and the target variable was biosecurity level dichotomized into low versus high. In general, the level of biosecurity was low in the majority of farms studied, mainly due to the limited implementation of basic on-farm biosecurity measures aimed at controlling the potential introduction and transmission of swine diseases. The CHAID model illustrated the relative importance of significant predictors in explaining the level of biosecurity: maintenance of medical records of treatment and vaccination, use of dedicated clothing to enter the farm, installation of a fence surrounding the farm perimeter, and periodic monitoring of the herd using a written biosecurity plan. The misclassification risk estimate of the prediction model was 0.145 with a standard error of 0.025, indicating that 85.5% of the cases could be classified correctly by using the decision rule based on the current tree. Although the CHAID approach could provide detailed information and insight about interactions among factors associated with biosecurity level, further evaluation of potential bias introduced during data collection should be included in future studies. In addition, the findings still need to be validated on an external dataset with a larger sample size to improve the external validity of the current model.
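The reported figures are mutually consistent, which a short calculation can confirm. The sketch below assumes that the standard error is the usual binomial standard error of a proportion, sqrt(r(1 - r) / n), computed on the 193 analyzed cases; the abstract does not state the formula, so this is an assumption rather than the authors' documented method.

```python
from math import sqrt

n_cases = 193   # cases analyzed, as stated in the abstract
risk = 0.145    # reported misclassification risk estimate

# Share of cases classified correctly by the tree.
accuracy = 1 - risk

# Assumed binomial standard error of the risk estimate.
std_error = sqrt(risk * (1 - risk) / n_cases)

# Approximate number of misclassified farms implied by the risk.
misclassified = round(risk * n_cases)

print(f"accuracy: {accuracy:.1%}")
print(f"standard error: {std_error:.3f}")
print(f"misclassified cases: about {misclassified} of {n_cases}")
```

Under this assumption the computed values round to the 85.5% and 0.025 quoted above.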

Time, Money and Health Promoting Behavior of Aged Men: Looking Through the Lens of Capability Theory (중고령 남성의 시간-소득자원 확보와 건강증진행동의 관련성: 가용이론의 적용)

  • Cha, Seung-Eun
    • Journal of Family Resource Management and Policy Review / v.17 no.2 / pp.173-194 / 2013
  • The purpose of this study was to examine the association between time-income availability and health-promoting behavior (physical activity, smoking, alcohol consumption) of older males (aged 55-69). This study attempted to shed light on health-behavior changes during the transition period of male retirement. The availability of time resources was measured by the number of weekly paid labor hours. The availability of financial resources was calculated using the debt-income ratio. The study sample comprised 1,372 male respondents (aged 55-69) of the 2006 Korean Longitudinal Study of Aging (2006 KLOSA wave 1). The results of the CHAID (Chi-squared Automatic Interaction Detection) analysis uncovered four distinctive combinations of resource types: time-money poor, time rich, money rich, and time-money rich. According to the logit results, these four groups had different socio-demographic profiles and different health-behavior risks. The time-money poor males were unlikely to perform the physical activities needed to improve their health or to quit smoking or alcohol consumption. This group was also more likely to consume alcohol compared to the time-money resource types. In contrast, the time-money rich group was more likely to exercise longer and more frequently than the reference group (time and money poor). The time-rich types, those who have time-only resources and less money, were likely to be smokers and to have problems with alcohol consumption.
