• Title/Summary/Keyword: CHAID

Data Mining Approach to Clinical Decision Support System for Hypertension Management (고혈압관리를 위한 의사지원결정시스템의 데이터마이닝 접근)

  • 김태수;채영문;조승연;윤진희;김도마
    • Proceedings of the Korea Intelligent Information System Society Conference / 2002.11a / pp.203-212 / 2002
  • This study examined the predictive power of data mining algorithms by comparing the performance of logistic regression with that of a decision tree algorithm, CHAID (Chi-squared Automatic Interaction Detection). Contrary to previous studies, the decision tree performed better than logistic regression. We also developed a CDSS (Clinical Decision Support System) with three modules (doctor, nurse, and patient) based on a data warehouse architecture. The data warehouse collects and integrates relevant information from the various databases of the hospital information system (HIS). The system can help improve doctors' decision-making capability and patients' access to educational material.

Study on the Application of Decision Trees for Personalization based on e-CRM (e-CRM에서 개인화 향상을 위한 의사결정나무 사용에 관한 연구)

  • 양정희;한서정
    • Journal of the Korea Safety Management & Science / v.5 no.3 / pp.107-119 / 2003
  • Expectations for and interest in e-CRM are rising as a means of more efficient online customer management, including in electronic commerce. Decision trees can serve as a useful data mining technique for e-CRM. This paper analyzes, through an experiment on actual customer data, how the representative decision tree techniques CART, C4.5, and CHAID differ from a personalization point of view. Based on this analysis, it proposes a new decision tree system with a strong advantage in personalization. The new system offers the following advantages. First, it can build a qualitatively superior personalization model by adding individual weight values. Second, it can supply more personalized information to customers. Third, it can achieve higher customer loyalty than other sites in similar types of business. Fourth, it can reduce marketing and decision-making costs. Fifth, smooth communication with customers who use the personalized service makes it possible to learn what they want and to raise the quality and value of goods and services accordingly.

A Study on Analyzing Children's Crossing Behaviors on Non-signalized Crosswalk (비신호 횡단보도에서의 어린이 횡단행태 분석 연구)

  • Lee, Deok Whan;Lee, Yun Suk;Kim, Won Ho;Lee, Back Jin
    • Journal of Korean Society of Transportation / v.31 no.3 / pp.19-32 / 2013
  • This study aims to identify the characteristics of children's crossing behavior at crosswalks in school zones, taking into account accident occurrence and the physical form of the school zones. Seven elementary school zones were investigated. Using data collected by field observation and video recording, statistical analysis, CHAID algorithm analysis, and pattern analysis were performed. The results showed that children's waiting, attention, and distraction were related to accident occurrence. While 69.1% of children waited before crossing at low-accident crosswalks, 83.6% of children did not wait before crossing at high-accident crosswalks. Moreover, the ratio of waiting and attention behavior was higher when the crosswalk was wider and the distance from the school entrance to the crosswalk was longer. These findings suggest that a behavior-oriented approach is required to improve children's safety in school zones.

The prediction Models for Clearance Times for the unexpected Incidences According to Traffic Accident Classifications in Highway (고속도로 사고등급별 돌발상황 처리시간 예측모형 및 의사결정나무 개발)

  • Ha, Oh-Keun;Park, Dong-Joo;Won, Jai-Mu;Jung, Chul-Ho
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.9 no.1 / pp.101-110 / 2010
  • In this study, a prediction model for incident reaction time was developed to meet the increasing demand for information on accident reaction time. To this end, the time taken to deal with accidents, the dependent variable, was classified by incident grade (A, B, and C). Fifteen independent variables were then used, including traffic volume, the number of vehicles involved in the accident, and the time period of the accident. Traffic volume, the involvement of heavy vehicles, and the time period of the accident were found to be important variables, and the model showed some degree of explanatory power. In addition, the CHAID technique was applied to construct an Answer Tree based on the variables included in the prediction model. In the Answer Tree model, accidents were first classified into grades A, B, and C and then, in the secondary classification, grouped according to traffic volume. The prediction model and the Answer Tree are expected to help provide expressway users with quicker and more effective traffic information when incidents occur on expressways.

Convergence Analysis of Risk factors for Readmission in Cardiovascular Disease: A Machine Learning Approach (의사결정나무분석을 이용한 심혈관질환자의 재입원 위험 요인에 대한 융합적 분석)

  • Kim, Hyun-Su
    • Journal of Convergence for Information Technology / v.9 no.12 / pp.115-123 / 2019
  • This descriptive study is a secondary analysis of KNHANES IV-VI data on risk factors for readmission among patients with cardiovascular disease. Of a total of 65,973 adults, 1,037 with angina or myocardial infarction were analyzed. The analysis was conducted with the SPSS for Windows 21 program, and a CHAID decision tree was used for the classification analysis. The root node was economic activity (χ2=12.063, p=.001). The child nodes were personal income (χ2=6.575, p=.031), weight change (χ2=12.758, p=.001), residential area (χ2=4.025, p=.045), direct smoking (χ2=3.884, p=.049), and level of education (χ2=9.630, p=.024). The terminal nodes were hypertension (χ2=3.854, p=.050), diabetes mellitus (χ2=6.056, p=.014), and occupation type (χ2=7.799, p=.037). We suggest that developing and operating programs that take an integrated approach to these factors is necessary for managing readmission in cardiovascular patients.
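
The chi-square statistic reported for each node in the abstract above is the kind of quantity CHAID evaluates when it tests a candidate split. The following is a minimal sketch of that test, not the study's SPSS procedure. It assumes a plain Pearson chi-square on a contingency table and, for the p-value, exactly 1 degree of freedom with no Bonferroni adjustment. The counts are hypothetical and are not taken from the paper.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic, assuming 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts: rows = predictor category, columns = readmitted yes / no.
table = [[30, 70], [50, 50]]
stat = pearson_chi2(table)
print(round(stat, 3), round(p_value_1df(stat), 4))
```

With more than two categories the degrees of freedom change, so `p_value_1df` would no longer apply.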

A Study on Decision Rules for Qi·Blood·Yin·Yang Deficiency Pathogenic Factor Based on Clinical Data of Diagnosis System of Oriental Medicine (한방진단설문지 임상자료에 근거한 기혈음양 허증병기 의사결정규칙 연구)

  • Soo Hyung Jeon;In Seon Lee;Gyoo yong Chi;Jong Won Kim;Chang Wan Kang;Yong Tae Lee
    • Journal of Physiology & Pathology in Korean Medicine / v.37 no.6 / pp.172-177 / 2023
  • To deduce the pathogenic factor (PF) diagnosis logic underlying pattern identification in Korean medicine, 2,072 cases of DSOM (Diagnosis System of Oriental Medicine) data collected from May 2005 to April 2022 were analyzed with a decision tree model (DTM). The data were divided into training and validation sets at a ratio of 7:3. The CHAID algorithm was used to build the DTM, and its validity was then tested on the validation data. The decision rules, items, and pathways derived from the DSOM diagnosis data for the Qi Deficiency, Blood Deficiency, Yin Deficiency, and Yang Deficiency pathogenic factors were as follows. Qi Deficiency PF had 7 decision rules and used 5 questions: Q124, Q116a, Q119, Q119a, and Q55. The primary indicators (PI) were 'lack of energy' and 'weary of talking'. Blood Deficiency PF had 7 decision rules and used 6 questions: Q113, Q84, Q85, Q114, Q129, and Q130. The PI were 'numbness in the limbs', 'dizziness when standing up', and 'frequent cramps'. Yin Deficiency PF had 3 decision rules and used 2 questions: Q144 and Q56. The PI were 'subjective heat sensation from the afternoon to night' and 'heat sensation in the limbs'. Yang Deficiency PF had 3 decision rules and used 3 questions: Q55, Q10, and Q102. The PI were 'sweating even with small movements' and 'lack of energy'. In conclusion, these rules and the symptom information used to decide the Qi·Blood·Yin·Yang Deficiency PF should be helpful for Korean medicine diagnostics.
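
The 7:3 division into training and validation data described above can be sketched as follows. This is an illustration only. The abstract does not say how cases were assigned, so the random shuffle, the seed, and the function name `split_train_validation` are assumptions.

```python
import random

def split_train_validation(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and validation parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in case identifiers; 2,072 is the number of cases given in the abstract.
cases = list(range(2072))
train, validation = split_train_validation(cases)
print(len(train), len(validation))  # 1450 622
```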

Screening Vital Few Variables and Development of Logistic Regression Model on a Large Data Set (대용량 자료에서 핵심적인 소수의 변수들의 선별과 로지스틱 회귀 모형의 전개)

  • Lim, Yong-B.;Cho, J.;Um, Kyung-A;Lee, Sun-Ah
    • Journal of Korean Society for Quality Management / v.34 no.2 / pp.129-135 / 2006
  • With advances in computer technology, it is now possible to keep in a database all the related information for monitoring equipment in control, along with huge amounts of real-time manufacturing data. Thus, statistical analysis of large data sets with hundreds of thousands of observations and hundreds of independent variables, some of whose values are missing for many observations, is needed even though it is a formidable computational task. A tree-structured approach to classification is capable of screening important independent variables and their interactions. In a Six Sigma project handling a large amount of manufacturing data, one of the goals is to screen the vital few variables from the trivial many. In this paper we review and summarize the CART, C4.5, and CHAID algorithms and propose a simple method of screening the vital few variables by selecting the variables common to all three algorithms. We also discuss how to develop a logistic regression model on a large data set and illustrate it with a large finance data set collected by a credit bureau for the purpose of predicting company bankruptcy.
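
The screening rule described above, keeping only the variables selected by all three tree algorithms, amounts to a set intersection. The sketch below assumes each algorithm's selected variables are already available as a list of names. The function name and the variable names are hypothetical and do not come from the paper.

```python
def screen_vital_few(selected_by_algorithm):
    """Return the variables selected by every algorithm, in sorted order."""
    selections = [set(names) for names in selected_by_algorithm.values()]
    if not selections:
        return []
    return sorted(set.intersection(*selections))

# Hypothetical variable names, for illustration only.
selected = {
    "CART": ["debt_ratio", "sales_growth", "firm_age"],
    "C4.5": ["debt_ratio", "firm_age", "cash_flow"],
    "CHAID": ["firm_age", "debt_ratio", "industry"],
}
print(screen_vital_few(selected))  # ['debt_ratio', 'firm_age']
```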

Pattern Classification Using Hybrid Monte Carlo Neural Networks (변종 몬테 칼로 신경망을 이용한 패턴 분류)

  • Jeon, Seong-Hae;Choe, Seong-Yong;O, Im-Geol;Lee, Sang-Ho;Jeon, Hong-Seok
    • The KIPS Transactions:PartB / v.8B no.3 / pp.231-236 / 2001
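  • The error backpropagation method used as the weight update algorithm in ordinary multilayer neural networks determines the result of each weight update as a single fixed value. Because this fixes the many possible updates to one value, it cannot accommodate the full range of possibilities. Introducing an update algorithm that expresses all possibilities as a probability distribution resolves this problem. A Bayesian neural network model that uses such an algorithm computes, for a given input, the output obtained by passing through each layer of a black-box-like network structure. The predicted value for the given input data can then be computed as the expectation (mean) of the posterior distribution. Because the posterior probability function computed from the given prior distribution and the likelihood function of the training data has a very complex structure, the integral required for the expectation is difficult to evaluate. Therefore, rather than numerical-analysis methods, Monte Carlo simulation, an approximation method based on probabilistic estimation, can be used. Among such methods, the Hybrid Monte Carlo algorithm gives good results (Neal 1996). This paper shows that a neural network using the Hybrid Monte Carlo algorithm gives better results than existing classification algorithms such as CHAID, CART, and QUEST.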

Evaluation on Performance for Classification of Students Leaving Their Majors Using Data Mining Technique (데이터마이닝 기법을 이용한 전공이탈자 분류를 위한 성능평가)

  • Leem, Young-Moon;Ryu, Chang-Hyun
    • Proceedings of the Safety Management and Science Conference / 2006.11a / pp.293-297 / 2006
  • Many universities are currently struggling with students leaving their majors and are looking for countermeasures to reduce the major separation rate. As one such effort, this paper uses the decision tree algorithm, a data mining technique that groups a population of interest into several sub-groups or makes predictions about them. This technique can analyze the characteristics of students who leave their majors. The dataset consists of 5,115 features obtained through data selection from a total of 13,346 collected from a university in Kangwon-Do over seven years (2000.3.1 ~ 2006.6.30). The main objective of this study is to evaluate the performance of algorithms including CHAID, CART, and C4.5 in classifying students leaving their majors, using the ROC chart, lift chart, and gains chart. The study also reports accuracy, sensitivity, and specificity based on the classification table. According to the analysis, CART showed the best performance in classifying students leaving their majors.
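
The accuracy, sensitivity, and specificity mentioned above follow directly from the four cells of a two-class classification table. The sketch below uses the standard definitions. The counts are hypothetical and do not come from the paper, and treating "left the major" as the positive class is an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from a 2 x 2 classification table.

    tp: leavers predicted as leavers      fn: leavers predicted as stayers
    fp: stayers predicted as leavers      tn: stayers predicted as stayers
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of actual leavers detected
        "specificity": tn / (tn + fp),  # share of actual stayers detected
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=130)
print({name: round(value, 3) for name, value in metrics.items()})
```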

Predicting Model of Students Leaving Their Majors Using Data Mining Technique (데이터마이닝 기법을 이용한 전공이탈자 예측모형)

  • Leem, Young-Moon;Ryu, Chang-Hyun
    • Journal of the Korea Safety Management & Science / v.8 no.5 / pp.17-25 / 2006
  • Many colleges now face a serious problem because many students are leaving their majors, and universities are looking for countermeasures to reduce the major separation rate. As one such effort, the objective of this paper is to find a model for predicting which students will leave their majors. The sample for this study was drawn from a university in Kangwon-Do over seven years (2000.3.1 ~ 2006.6.30). The partitioned data were split into training and testing samples at a ratio of 50% : 50% for validation. The study also reports accuracy, sensitivity, and specificity for three algorithms: CHAID, CART, and C4.5. In addition, the ROC chart and gains chart were used to assess the classification of students leaving their majors. The results were informative because they identify the most important factors affecting whether students leave their majors: semester of enrollment, grades in liberal arts subjects, scholarship, grades in major subjects, and total completed courses.