• Title/Abstract/Keywords: Decision-trees

307 search results, processing time 0.025 seconds

Interpretability Comparison of Popular Decision Tree Algorithms

  • 홍정식;황근성
    • 산업경영시스템학회지 / Vol. 44, No. 2 / pp.15-23 / 2021
  • Most open-source decision tree algorithms are based on one of three splitting criteria (Entropy, Gini Index, and Gain Ratio). Therefore, the advantages and disadvantages of these three popular algorithms need to be studied more thoroughly. Previous comparisons of the three algorithms focused mainly on predictive performance. In this work, we conducted a comparative experiment on the three splitting criteria, focusing on the interpretability of the resulting decision trees. Depth, homogeneity, coverage, lift, and stability were used as indicators of interpretability. To measure the stability of decision trees, we present measures of the stability of the root node and of the dominating rules, both based on a measure of tree similarity. Using 10 datasets collected from UCI and Kaggle, we compare the interpretability of DT (Decision Tree) algorithms based on the three splitting criteria. The results show that the GR (Gain Ratio) branch-based DT algorithm performs well in terms of lift and homogeneity, while the GINI (Gini Index) and ENT (Entropy) branch-based DT algorithms perform well in terms of coverage. With respect to stability, whether measured by the similarity of the dominating rules or by the similarity of the root node, the DT algorithm using the ENT splitting criterion shows the best results.
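
The three splitting criteria named in this abstract can be illustrated with a minimal sketch. It uses the textbook definitions of entropy, Gini index, and gain ratio; the function names and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(parent, children):
    """Score one candidate split under the three criteria (textbook forms)."""
    children = [ch for ch in children if ch]  # ignore empty branches
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    info_gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    gini_gain = gini(parent) - sum(w * gini(ch) for w, ch in zip(weights, children))
    # Split information penalizes splits with many small branches.
    split_info = -sum(w * math.log2(w) for w in weights)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return {"info_gain": info_gain, "gini_gain": gini_gain, "gain_ratio": gain_ratio}


# Toy data (illustrative only): 4 "yes" and 4 "no" labels.
parent = ["yes"] * 4 + ["no"] * 4
binary = split_scores(parent, [["yes"] * 4, ["no"] * 4])
many_way = split_scores(parent, [[label] for label in parent])
print(binary)
print(many_way)
```

Both toy splits are perfectly pure, so entropy and Gini reward them equally, while gain ratio scores the eight-way split lower because its split information is larger.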

Zero-suppressed ternary decision diagram algorithm for solving noncoherent fault trees in probabilistic safety assessment of nuclear power plants

  • Woo Sik Jung
    • Nuclear Engineering and Technology / Vol. 56, No. 6 / pp.2092-2098 / 2024
  • Probabilistic safety assessment (PSA) plays a critical role in ensuring the safe operation of nuclear power plants. In PSA, event trees are developed to identify accident sequences that could lead to core damage. These event trees are then transformed into a core-damage fault tree, wherein the accident sequences are represented by usual and complemented logic gates representing failed and successful operations of safety systems, respectively. The core damage frequency (CDF) is estimated by calculating the minimal cut sets (MCSs) of the core-damage fault tree. Delete-term approximation (DTA) is commonly employed to approximately solve MCSs representing accident sequence logics from noncoherent core-damage fault trees. However, DTA can lead to an overestimation of CDF, particularly when fault trees contain many nonrare events. To address this issue, the present study introduces a new zero-suppressed ternary decision diagram (ZTDD) algorithm that averts the CDF overestimation caused by DTA. This ZTDD algorithm can optionally calculate MCSs with DTA or prime implicants (PIs) without any approximation from the core-damage fault tree. By calculating PIs, accurate CDF can be calculated. The present study provides a comprehensive explanation of the ZTDD structure, formula of the ZTDD algorithm, ZTDD minimization, probability calculation from ZTDD, strength of the ZTDD algorithm, and ZTDD application results. Results reveal that the ZTDD algorithm is a powerful tool that can quickly and accurately calculate CDF and drastically improve the safety of nuclear power plants.
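
The overestimation described in this abstract can be reproduced on a toy problem. The sketch below is not the ZTDD algorithm; it is a brute-force illustration under a simplified reading of delete-term approximation (cut sets of the failed system that contain a cut set of the successful system are deleted, and the success logic is then ignored). The event names, cut sets, and probabilities are made up for illustration.

```python
from itertools import product


def probability(formula, probs):
    """Exact probability of a Boolean formula over independent basic events,
    computed by enumerating every truth assignment (toy sizes only)."""
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        state = dict(zip(names, values))
        if formula(state):
            weight = 1.0
            for name in names:
                weight *= probs[name] if state[name] else 1.0 - probs[name]
            total += weight
    return total


def cut_set_formula(cut_sets):
    """Boolean formula that is true when every event of some cut set occurs."""
    return lambda state: any(all(state[e] for e in cs) for cs in cut_sets)


def delete_term(failed_mcs, success_mcs):
    """Simplified DTA: drop failed-system cut sets that contain a success-system cut set."""
    return [cs for cs in failed_mcs if not any(s <= cs for s in success_mcs)]


# Hypothetical accident sequence: system F fails AND system S succeeds.
probs = {"a": 0.1, "b": 0.5, "c": 0.5}          # nonrare events b and c
failed_mcs = [frozenset({"a"}), frozenset({"b"})]
success_mcs = [frozenset({"b", "c"})]

failed = cut_set_formula(failed_mcs)
success_fails = cut_set_formula(success_mcs)

p_dta = probability(cut_set_formula(delete_term(failed_mcs, success_mcs)), probs)
p_exact = probability(lambda s: failed(s) and not success_fails(s), probs)
print(p_dta, p_exact)
```

Here neither cut set is deleted, so the approximation keeps the full probability of `a OR b`, while the exact noncoherent logic excludes the states where both `b` and `c` occur. The gap grows with the probabilities of the events in the negated logic.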

A Split Criterion for Binary Decision Trees

  • Choi, Hyun Jip;Oh, Myong Rok
    • Communications for Statistical Applications and Methods / Vol. 9, No. 2 / pp.411-423 / 2002
  • In this paper, we propose a split criterion for binary decision trees. The proposed criterion selects the optimal split by measuring the prediction success of the candidate splits at a given node. The criterion is shown to have the property of exclusive preference. Examples are given to demonstrate the properties of the criterion.

Fault Diagnosis of Induction Motors using Decision Trees

  • Tran Van Tung;Yang Bo-Suk;Oh Myung-Suck
    • 한국소음진동공학회:학술대회논문집 / 한국소음진동공학회 2006년도 추계학술대회논문집 / pp.407-410 / 2006
  • The decision tree is one of the most effective and widely used methods for building classification models. Researchers from various disciplines such as statistics, machine learning, pattern recognition, and data mining have considered the decision tree method an effective solution to problems in their fields. In this paper, an application of the decision tree method to classifying the faults of induction motors is proposed. Features are calculated from the original experimental data to obtain useful information as attributes. These data are then assigned classes based on our experience before being used as inputs to the decision tree. A total of nine classes are defined. An implementation of the decision tree written in Matlab is applied to four data sets, with good performance results.

New Splitting Criteria for Classification Trees

  • Lee, Yung-Seop
    • Communications for Statistical Applications and Methods / Vol. 8, No. 3 / pp.885-894 / 2001
  • Decision tree methods are among the most common data mining techniques. Classification trees are used to predict a class label. When a tree grows, the conventional splitting criteria use the weighted average of the left and right child nodes to measure node impurity. In this paper, new splitting criteria for classification trees are proposed that improve the interpretability of trees compared to the conventional methods. The criteria search only for interesting subsets of the data, as opposed to modeling all of the data equally well. As a result, the tree is very unbalanced but extremely interpretable.
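
The contrast drawn in this abstract can be sketched as follows. The abstract does not give the formulas of the proposed criteria, so `one_sided_impurity` below is a hypothetical stand-in that scores a split by its purer child only. It is compared with the conventional weighted-average Gini impurity on made-up labels.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(left, right):
    """Conventional criterion: size-weighted average impurity of both children."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def one_sided_impurity(left, right):
    """Hypothetical one-sided criterion: impurity of the purer child only."""
    return min(gini(left), gini(right))


# Two candidate splits of the same parent (10 ones, 10 zeros); toy data only.
unbalanced = ([1] * 3, [1] * 7 + [0] * 10)            # small pure bucket + mixed rest
balanced = ([1] * 8 + [0] * 2, [1] * 2 + [0] * 8)     # two moderately pure halves

for name, (left, right) in {"unbalanced": unbalanced, "balanced": balanced}.items():
    print(name, weighted_impurity(left, right), one_sided_impurity(left, right))
```

The weighted average prefers the balanced split, while the one-sided score prefers the split that isolates a small pure subset. That preference produces an unbalanced tree whose leaves are easy to read as rules.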

Integrity Assessment Models for Bridge Structures Using Fuzzy Decision-Making

  • 안영기;김성칠
    • 콘크리트학회논문집 / Vol. 14, No. 6 / pp.1022-1031 / 2002
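  • This study presents a useful model for bridge structures using a classification and regression tree combined with an adaptive neuro-fuzzy inference system. A fuzzy decision tree partitions the input space of a data set into distinct regions, each of which is assigned a label, a value, or an action that characterizes its data points. Decision trees used for classification problems are often called fuzzy decision trees, and each terminal node represents the predicted class of a given feature vector. Decision trees used for regression problems are often called fuzzy regression trees, and the terminal node regions can express the predicted output value of a given input vector as a constant or an equation. It should be noted that the classification and regression tree can select the relevant inputs and partition the input space, whereas the adaptive neuro-fuzzy inference system refines the regression and makes it more continuous and compact. The classification and regression tree and the adaptive neuro-fuzzy inference system are therefore complementary, and their combination constitutes a practical approach to fuzzy modeling.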

The Decision Tree to Analyze the Cases' Ordinary Symptoms Prescribed Yeoldahanso-tang and Taeeumjowi-tang·Choweseuncheng-tang

  • 김상혁;박만영;이시우
    • 사상체질의학회지 / Vol. 29, No. 3 / pp.248-261 / 2017
  • Objectives: The purpose of this study is to analyze the decision-making process of prescribing Yeoldahanso-tang and Taeeumjowi-tang·Choweseuncheng-tang using decision trees. Methods: We used prospective clinical data of the TE type collected from September 2012 to July 2015. The variables were gender, BMI, blood pressure, pulse, and clinical symptoms (digestion, sweat, defecation, urination, sleep, physical status, emotion, heat-coldness, water consumption, facial color). Decision trees were analyzed using open-source R version 3.3.2. Results & Conclusions: We found that the decision trees differed among institutions. However, in all institutions, stool type (ordinary symptom), urine frequency (ordinary and present symptom), and anxiety (ordinary symptom) were important in the prescription decision. In addition, clinical information such as sex, body mass index, and blood pressure affected the prescription decision.

Study on the Characteristics of Bus Traffic Accidents by Types Using the Decision Tree

  • 박원일;김경현;한음;박상민;윤일수
    • 한국도로학회논문집 / Vol. 18, No. 5 / pp.105-115 / 2016
  • PURPOSES: This study analyzes the characteristics of bus traffic accidents by bus type using decision trees, in order to establish customized safety alternatives for each bus type, including the intra-city bus, rural area bus, and inter-city bus. METHODS: The major elements involved in bus traffic accidents were identified using decision trees, and the CHAID algorithm was applied to branch the trees. RESULTS: The numbers of casualties and severe injuries are high in bus accidents involving pedestrians, bicycles, motorcycles, etc. For light injuries caused by bus accidents, different results are found. In the case of intra-city bus accidents, the probability of light injury is 77.2% when boarding a non-owned car and breaching of duty to drive safely are involved. In the case of rural area bus accidents, the elements showing the highest probability of light injury are boarding an owned car, vehicle-to-vehicle accidents, and breaching of duty to drive safely. In the case of intra-city bus accidents, boarding an owned car, streets, and vehicle-to-vehicle accidents are the critical elements. CONCLUSIONS: The bus accident data were categorized by bus type, and the influential elements were then identified using decision trees. As a result, the characteristics of bus accidents were found to differ by bus type. The findings are expected to be useful in establishing effective alternatives to reduce bus accidents.
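
CHAID chooses the variable to branch on by chi-square tests of independence between each candidate predictor and the target. The sketch below shows only that core step for two-category predictors (one degree of freedom); the full algorithm also merges categories and applies a Bonferroni adjustment, which are omitted here. The predictor names and counts are made up for illustration and are not taken from the study.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical 2x2 tables: rows = predictor category, columns = (light, severe) injury.
candidates = {
    "accident_type": [[30, 10], [10, 30]],
    "road_type": [[22, 18], [18, 22]],
}

p_values = {name: p_value_df1(chi_square(t)) for name, t in candidates.items()}
best = min(p_values, key=p_values.get)  # most significant predictor is chosen to branch
print(p_values, best)
```

With these toy counts, `accident_type` has the larger statistic and the smaller p-value, so it would be selected for the split at this node.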

Machine Learning Based Keyphrase Extraction: Comparing Decision Trees, Naïve Bayes, and Artificial Neural Networks

  • Sarkar, Kamal;Nasipuri, Mita;Ghose, Suranjan
    • Journal of Information Processing Systems / Vol. 8, No. 4 / pp.693-712 / 2012
  • The paper presents three machine learning based keyphrase extraction methods that respectively use Decision Trees, Naïve Bayes, and Artificial Neural Networks for keyphrase extraction. We consider keyphrases as being phrases that consist of one or more words and as representing the important concepts in a text document. The three machine learning based keyphrase extraction methods that we use for experimentation have been compared with a publicly available keyphrase extraction system called KEA. The experimental results show that the Neural Network based keyphrase extraction method outperforms two other keyphrase extraction methods that use the Decision Tree and Naïve Bayes. The results also show that the Neural Network based method performs better than KEA.

Analyses of the Non-Examinees' Characteristics for the Effective Health Screening Management: Focusing on Regional Subscribers of the National Health Insurance

  • 이애경;이선미;박일수
    • 보건행정학회지 / Vol. 16, No. 1 / pp.54-72 / 2006
  • This study was conducted as the primary work to develop a customer relationship management (CRM) system to improve the performance of health screening programs. The specific aims of the study were to identify and classify the characteristics of the people who did not receive their health screening using decision trees, and to propose management strategies according to the characteristics identified. Data on a total of 5,102,761 health screening subjects provided by the National Health Insurance Program in 2002 were used. The target variable was whether they underwent their health screening. A total of 27 input variables were included. SAS version 9.1 was used for data preprocessing and statistical analyses. SAS Enterprise Miner was used to develop the decision tree model. The decision trees identified the factors greatly affecting health screening. In the non-disease group, the subgroup with the highest rate of non-examinees was characterized by: no experience of receiving a health screening, household's age, non-insured episode for the last one year, and patients' age. In the disease group, the subgroup with the highest rate of non-examinees was characterized by: no experience of receiving a health screening, no experience of going to a public health center or midwife clinic for the last one year, and examinees' age. Developing CRM systems for health screening management that take individual characteristics into account would help considerably to increase the rate of health screening.