• Title/Summary/Keyword: Decision trees

Interpretability Comparison of Popular Decision Tree Algorithms (대표적인 의사결정나무 알고리즘의 해석력 비교)

  • Hong, Jung-Sik;Hwang, Geun-Seong
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.2 / pp.15-23 / 2021
  • Most open-source decision tree algorithms are based on one of three splitting criteria: Entropy, Gini Index, or Gain Ratio. The advantages and disadvantages of these three popular algorithms therefore need to be studied more thoroughly. Previous comparisons of the three algorithms have focused mainly on predictive performance. In this work, we conducted a comparative experiment on the three splitting criteria, focusing on interpretability. Depth, homogeneity, coverage, lift, and stability were used as indicators of interpretability. To measure the stability of decision trees, we present measures of the stability of the root node and of the dominating rules, based on a measure of the similarity of trees. Using 10 datasets collected from UCI and Kaggle, we compare the interpretability of DT (Decision Tree) algorithms based on the three splitting criteria. The results show that the DT algorithm based on GR (Gain Ratio) splitting performs well in terms of lift and homogeneity, while the DT algorithms based on GINI (Gini Index) and ENT (Entropy) splitting perform well in terms of coverage. With respect to stability, whether measured by the similarity of the dominating rules or by the similarity of the root node, the DT algorithm using the ENT splitting criterion shows the best results.
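
The three splitting criteria named in this abstract can be illustrated with a minimal Python sketch. It uses the textbook definitions of entropy, Gini index, and gain ratio, not the authors' experimental code; the function names and the toy labels in the usage note are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(parent, children):
    """Score one candidate split under the three criteria.

    parent   -- class labels at the node being split
    children -- list of label lists, one per branch (hypothetical layout)
    """
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    # Entropy criterion: reduction in entropy (information gain).
    info_gain = entropy(parent) - sum(
        w * entropy(ch) for w, ch in zip(weights, children)
    )
    # Gini criterion: reduction in Gini index.
    gini_gain = gini(parent) - sum(
        w * gini(ch) for w, ch in zip(weights, children)
    )
    # Gain ratio: information gain normalised by the entropy of the
    # branch sizes, which penalises splits into many small branches.
    split_info = -sum(w * log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return {"info_gain": info_gain, "gini_gain": gini_gain, "gain_ratio": gain_ratio}
```

For example, `split_scores(["a"] * 4 + ["b"] * 4, [["a"] * 4, ["b"] * 4])` scores a split that separates two balanced classes perfectly.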

Zero-suppressed ternary decision diagram algorithm for solving noncoherent fault trees in probabilistic safety assessment of nuclear power plants

  • Woo Sik Jung
    • Nuclear Engineering and Technology / v.56 no.6 / pp.2092-2098 / 2024
  • Probabilistic safety assessment (PSA) plays a critical role in ensuring the safe operation of nuclear power plants. In PSA, event trees are developed to identify accident sequences that could lead to core damage. These event trees are then transformed into a core-damage fault tree, wherein the accident sequences are represented by usual and complemented logic gates representing failed and successful operations of safety systems, respectively. The core damage frequency (CDF) is estimated by calculating the minimal cut sets (MCSs) of the core-damage fault tree. Delete-term approximation (DTA) is commonly employed to approximately solve MCSs representing accident sequence logics from noncoherent core-damage fault trees. However, DTA can lead to an overestimation of CDF, particularly when fault trees contain many nonrare events. To address this issue, the present study introduces a new zero-suppressed ternary decision diagram (ZTDD) algorithm that averts the CDF overestimation caused by DTA. This ZTDD algorithm can optionally calculate MCSs with DTA or prime implicants (PIs) without any approximation from the core-damage fault tree. By calculating PIs, an accurate CDF can be obtained. The present study provides a comprehensive explanation of the ZTDD structure, the formula of the ZTDD algorithm, ZTDD minimization, probability calculation from ZTDD, the strengths of the ZTDD algorithm, and ZTDD application results. Results reveal that the ZTDD algorithm is a powerful tool that can quickly and accurately calculate CDF and drastically improve the safety of nuclear power plants.

A Split Criterion for Binary Decision Trees

  • Choi, Hyun Jip;Oh, Myong Rok
    • Communications for Statistical Applications and Methods / v.9 no.2 / pp.411-423 / 2002
  • In this paper, we propose a split criterion for binary decision trees. The proposed criterion selects the optimal split by measuring the prediction success of the candidate splits at a given node. The criterion is shown to have the property of exclusive preference. Examples are given to demonstrate the properties of the criterion.

Fault Diagnosis of Induction Motors using Decision Trees (결정목을 이용한 유도전동기 결함진단)

  • Tran Van Tung;Yang Bo-Suk;Oh Myung-Suck
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2006.11a / pp.407-410 / 2006
  • The decision tree is one of the most effective and widely used methods for building classification models. Researchers from various disciplines such as statistics, machine learning, pattern recognition, and data mining have considered the decision tree method an effective solution to problems in their fields. In this paper, an application of the decision tree method to classifying the faults of induction motors is proposed. Features are calculated from the original experimental data to obtain useful information as attributes. These data are then assigned classes, based on our experience, before being used as inputs to the decision tree. A total of 9 classes are defined. An implementation of the decision tree written in Matlab is applied to four data sets with good performance results.

New Splitting Criteria for Classification Trees

  • Lee, Yung-Seop
    • Communications for Statistical Applications and Methods / v.8 no.3 / pp.885-894 / 2001
  • Decision tree methods are among the data mining techniques. Classification trees are used to predict a class label. When a tree grows, the conventional splitting criteria use the weighted average of the impurities of the left and right child nodes to measure node impurity. In this paper, new splitting criteria for classification trees are proposed which improve the interpretability of trees compared to the conventional methods. The criteria search only for interesting subsets of the data, as opposed to modeling all of the data equally well. As a result, the tree is very unbalanced but extremely interpretable.
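
The conventional criterion described in this abstract, a weighted average of the left and right child impurities, can be sketched as follows. This illustrates only the conventional baseline, with the Gini index assumed as the impurity measure; the abstract does not specify the proposed new criteria, so they are not shown. All function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_impurity(left, right):
    """Size-weighted average of the two child impurities."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(xs, ys):
    """Return (threshold, score) minimising the weighted child impurity.

    xs -- numeric feature values; ys -- class labels (same length).
    Candidate thresholds are midpoints between distinct sorted values.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = weighted_impurity(left, right)
        if score < best[1]:
            best = (t, score)
    return best
```

Because both children contribute in proportion to their size, this criterion rewards splits that are reasonably pure on both sides, which is the behaviour the abstract contrasts with criteria that look only for interesting subsets.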

Integrity Assessment Models for Bridge Structures Using Fuzzy Decision-Making (퍼지의사결정을 이용한 교량 구조물의 건전성평가 모델)

  • 안영기;김성칠
    • Journal of the Korea Concrete Institute / v.14 no.6 / pp.1022-1031 / 2002
  • This paper presents efficient models for bridge structures using CART-ANFIS (classification and regression tree-adaptive neuro fuzzy inference system). A fuzzy decision tree partitions the input space of a data set into mutually exclusive regions; each region is assigned a label, a value, or an action to characterize its data points. Fuzzy decision trees used for classification problems are often called fuzzy classification trees, and each terminal node contains a label that indicates the predicted class of a given feature vector. In the same vein, decision trees used for regression problems are often called fuzzy regression trees, and the terminal node labels may be constants or equations that specify the predicted output value of a given input vector. Note that CART can select relevant inputs and partition the input space as a tree, while ANFIS refines the regression and makes it continuous and smooth everywhere. CART and ANFIS are thus complementary, and their combination constitutes a solid approach to fuzzy modeling.

The Decision Tree to Analyze the Cases' Ordinary Symptoms Prescribed Yeoldahanso-tang and Taeeumjowi-tang·Choweseuncheng-tang (열다한소탕과 태음조위탕·조위승청탕의 소증 분석을 위한 의사결정나무 구성)

  • Kim, Sang-Hyuk;Park, Man Young;Lee, Siwoo
    • Journal of Sasang Constitutional Medicine / v.29 no.3 / pp.248-261 / 2017
  • Objectives: The purpose of this study is to analyze the decision-making process of prescribing Yeoldahanso-tang and Taeeumjowi-tang·Choweseuncheng-tang using a decision tree. Methods: We used prospective clinical data of TE type collected from September 2012 to July 2015. In this study, we used gender, BMI, blood pressure, pulse, and clinical symptoms (digestion, sweat, defecation, urination, sleep, physical status, emotion, heat-coldness, water consumption, facial color) as variables. Decision trees were analyzed using open source R version 3.3.2. Results & Conclusions: We found that the decision trees differed among institutions. However, in all institutions, stool type (ordinary symptom), urine frequency (ordinary and present symptom), and anxiety (ordinary symptom) were found to be important in the prescription decision. In addition, clinical information such as sex, Body Mass Index, and blood pressure affected the prescription decision.

Study on the Characteristics of Bus Traffic Accidents by Types Using the Decision Tree (의사결정나무를 활용한 업종별 버스 교통사고 특성 연구)

  • Park, Wonil;Kim, Kyung Hyun;Han, Eum;Park, Sangmin;Yun, Ilsoo
    • International Journal of Highway Engineering / v.18 no.5 / pp.105-115 / 2016
  • PURPOSES : This study was initiated to analyze the characteristics of bus traffic accidents, by bus types, using the decision tree in order to establish customized safety alternatives by bus types, including the intra-city bus, rural area bus, and inter-city bus. METHODS : In this study, the major elements involved in bus traffic accidents were identified using decision trees and the CHAID algorithm. The decision tree was used to identify the characteristics of major elements influencing bus traffic accidents. In addition, the CHAID algorithm was applied to branch the decision trees. RESULTS : The numbers of casualties and severe injuries are high in bus accidents involving pedestrians, bicycles, motorcycles, etc. In the case of light injury caused by bus accidents, different results are found. In the case of intra-city bus accidents, the probability of light injury is 77.2% when boarding a non-owned car and breaching of duty to drive safely are involved. In the case of rural area bus accidents, the elements showing the highest probability of light injury are boarding an owned car, vehicle-to-vehicle accidents, and breaching of duty to drive safely. In the case of intra-city bus accidents, boarding an owned car, streets, and vehicle-to-vehicle accidents work as the critical elements. CONCLUSIONS : In this study, the bus accident data were categorized by bus types, and then the influential elements were identified using decision trees. As a result, the characteristics of bus accidents were found to be different depending on bus types. The findings in this study are expected to be utilized in establishing effective alternatives to reduce bus accidents.
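
CHAID chooses its branches using chi-square tests of independence between a candidate predictor and the target. The sketch below shows only that core statistic in plain Python. It is not the study's implementation, it omits CHAID's category merging and p-value adjustment, and the contingency table in the usage note is made up for illustration.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table.

    table -- list of rows; table[i][j] is the count of cases with
             predictor category i and target category j.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if predictor and target were independent.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat
```

With a hypothetical table such as `chi_square([[10, 20], [20, 10]])`, a larger statistic indicates a stronger association between the predictor and the target, and so a more promising variable to branch on.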

Machine Learning Based Keyphrase Extraction: Comparing Decision Trees, Naïve Bayes, and Artificial Neural Networks

  • Sarkar, Kamal;Nasipuri, Mita;Ghose, Suranjan
    • Journal of Information Processing Systems / v.8 no.4 / pp.693-712 / 2012
  • The paper presents three machine learning based keyphrase extraction methods that respectively use Decision Trees, Naïve Bayes, and Artificial Neural Networks for keyphrase extraction. We consider keyphrases as being phrases that consist of one or more words and as representing the important concepts in a text document. The three machine learning based keyphrase extraction methods that we use for experimentation have been compared with a publicly available keyphrase extraction system called KEA. The experimental results show that the Neural Network based keyphrase extraction method outperforms two other keyphrase extraction methods that use the Decision Tree and Naïve Bayes. The results also show that the Neural Network based method performs better than KEA.

Analyses of the Non-Examinees' Characteristics for the Effective Health Screening Management (효율적 건강검진관리를 위한 미수검자의 특성 분석 - 건강보험 지역 가입자 중심으로 -)

  • Lee, Ae-Kyung;Lee, Sun-Mi;Park, Il-Su
    • Health Policy and Management / v.16 no.1 / pp.54-72 / 2006
  • This study was conducted as preliminary work toward developing a customer relationship management (CRM) system to improve the performance of health screening programs. The specific aims of the study were to identify and classify the characteristics of people who did not receive their health screening using decision trees, and to propose management strategies according to the characteristics identified. Data on a total of 5,102,761 health screening subjects, provided by the National Health Insurance Program for the year 2002, were used. The target variable was whether they underwent their health screening. A total of 27 input variables were included. SAS version 9.1 was used for data preprocessing and statistical analyses. SAS Enterprise Miner was used to develop the decision tree model. The decision trees identified the factors greatly affecting health screening. In the non-disease group, the subgroup with the highest rate of non-examinees was characterized by: no experience of receiving a health screening, household's age, non-insured episode for the last one year, and patients' age. In the disease group, the subgroup with the highest rate of non-examinees was characterized by: no experience of receiving a health screening, no experience of going to a public health center or midwife clinic for the last one year, and examinees' age. Developing CRM systems for health screening management that take individual characteristics into account would considerably help to increase the rate of health screening.