• Title/Abstract/Keywords: Decision Tree (DT)

Interpretability Comparison of Popular Decision Tree Algorithms

  • 홍정식;황근성
    • 산업경영시스템학회지 (Journal of Society of Korea Industrial and Systems Engineering) / Vol. 44, No. 2 / pp. 15-23 / 2021
  • Most open-source decision tree algorithms are based on three splitting criteria (Entropy, Gini Index, and Gain Ratio), so the advantages and disadvantages of these three popular algorithms need to be studied more thoroughly. Previous comparisons of the three algorithms focused mainly on predictive performance. In this work, we conducted a comparative experiment on the splitting criteria of three decision trees, focusing on their interpretability. Depth, homogeneity, coverage, lift, and stability were used as indicators of interpretability. To measure the stability of decision trees, we present measures of the stability of the root node and of the dominating rules, based on a measure of tree similarity. Using 10 datasets collected from UCI and Kaggle, we compare the interpretability of DT (Decision Tree) algorithms based on the three splitting criteria. The results show that the DT algorithm based on the GR (Gain Ratio) splitting criterion performs well in terms of lift and homogeneity, while the DT algorithms based on GINI (Gini Index) and ENT (Entropy) perform well in terms of coverage. With respect to stability, the DT algorithm using the ENT splitting criterion shows the best results, whether stability is measured by the similarity of the dominating rules or by the similarity of the root node.
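
The three splitting criteria compared above can be illustrated with a minimal sketch that scores a single candidate split under each of them. This is not the authors' code. The function names and the toy labels are hypothetical, and GR is computed in the usual C4.5 style (information gain divided by split information).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(parent, children):
    """Score one candidate split of `parent` into `children` (lists of labels).

    ENT  = information gain (decrease in entropy)
    GINI = decrease in Gini impurity
    GR   = information gain divided by split information (C4.5-style gain ratio)
    """
    n = len(parent)
    weights = [len(child) / n for child in children]
    info_gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    gini_gain = gini(parent) - sum(w * gini(ch) for w, ch in zip(weights, children))
    # Split information grows with the number of (small) branches, so GR
    # penalises splits that fragment the data into many tiny subsets.
    split_info = -sum(w * log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return {"ENT": info_gain, "GINI": gini_gain, "GR": gain_ratio}


# Hypothetical toy split: 10 samples divided into two fairly pure branches.
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(split_scores(parent, children))
```

With two equal-sized branches the split information is exactly 1 bit, so GR equals ENT here; the two diverge when a split produces unbalanced or numerous branches, which is where the criteria start to grow different trees.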

An Attribute Weighting Approach for Naive Bayesian based on Very Fast Decision Tree

  • 김세준;유승언;이병준;김경태;윤희용
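    • 한국컴퓨터정보학회:학술대회논문집 (Proceedings of the Korean Society of Computer Information Conference) / Proceedings of the 58th Summer Conference of the Korean Society of Computer Information, 2018, Vol. 26, No. 2 / pp. 139-140 / 2018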
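  • This paper proposes a new technique that assigns weights to data attributes in order to improve the classification accuracy of the Naive Bayesian (NB) algorithm, a supervised machine learning algorithm. A method that assigns weights using the depth of a Decision Tree (DT) has been proposed previously, but because building the DT incurs a large overhead, it is difficult to apply to real-time data analysis or to resource-constrained environments. To address this, we propose a weighting technique based on the Very Fast Decision Tree (VFDT) algorithm, which builds a DT quickly from minimal data, thereby improving the accuracy of NB with little overhead.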
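
The general idea of depth-based attribute weighting for naive Bayes can be sketched as follows, assuming the minimum depth at which each attribute is tested has already been read off a tree (VFDT or otherwise; tree construction is not shown). The weighting rule w = 1/sqrt(d), with weight 0 for attributes the tree never tests, is an assumption chosen for illustration and is not confirmed to be the rule used in the paper; `depth_weights`, `WeightedNB`, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log, sqrt


def depth_weights(min_depth, attributes):
    """Hypothetical rule: an attribute first tested at tree depth d (root = 1)
    gets weight 1/sqrt(d); attributes the tree never tests get weight 0."""
    return {a: (1.0 / sqrt(min_depth[a]) if a in min_depth else 0.0) for a in attributes}


class WeightedNB:
    """Categorical naive Bayes with per-attribute log-likelihood weights."""

    def fit(self, rows, labels, weights, alpha=1.0):
        self.weights = weights
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (attr, class) -> value counts
        self.domains = defaultdict(set)           # attr -> distinct values seen
        for row, y in zip(rows, labels):
            for attr, value in row.items():
                self.value_counts[(attr, y)][value] += 1
                self.domains[attr].add(value)
        return self

    def predict(self, row):
        n = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for y, cy in self.class_counts.items():
            score = log(cy / n)  # log prior
            for attr, value in row.items():
                counts = self.value_counts[(attr, y)]
                k = max(len(self.domains[attr]), 1)
                p = (counts[value] + self.alpha) / (cy + self.alpha * k)
                # Weight 0 removes an attribute; weight 1 is plain naive Bayes.
                score += self.weights.get(attr, 1.0) * log(p)
            if score > best_score:
                best, best_score = y, score
        return best


rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "rain", "windy": "yes"},
        {"outlook": "sunny", "windy": "yes"}, {"outlook": "rain", "windy": "no"}]
labels = ["play", "stay", "play", "stay"]
# Hypothetical tree that tests only 'outlook', at the root.
w = depth_weights({"outlook": 1}, ["outlook", "windy"])
model = WeightedNB().fit(rows, labels, w)
print(w, model.predict({"outlook": "sunny", "windy": "yes"}))
```

Scaling each log-likelihood by a weight keeps prediction as cheap as ordinary naive Bayes; the cost the abstract targets is in obtaining the depths, which is why a fast tree learner such as VFDT matters.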

Prediction of karst sinkhole collapse using a decision-tree (DT) classifier

  • Boo Hyun Nam;Kyungwon Park;Yong Je Kim
    • Geomechanics and Engineering / Vol. 36, No. 5 / pp. 441-453 / 2024
  • Sinkhole subsidence and collapse are common geohazards in karst areas such as the state of Florida, United States of America. To predict sinkhole occurrence, we need to understand the formation mechanism of sinkholes and the associated karst hydrogeology. For this purpose, investigating the factors affecting sinkholes is an essential step. The main objectives of the present study are (1) the development of a machine learning (ML)-based model, namely a C5.0 decision tree (C5.0 DT), for predicting sinkhole susceptibility, which accounts for the sinkhole/subsidence inventory and sinkhole contributing factors (e.g., geological/hydrogeological), and (2) the construction of a regional-scale sinkhole susceptibility map. The study area is east central Florida (ECF), where the cover-collapse type is commonly reported. The C5.0 DT algorithm was used to account for twelve (12) identified hydrogeological factors. In this study, a total of 1,113 sinkholes in ECF were identified, and the dataset was then randomly divided into 70% and 30% subsets for training and testing, respectively. The performance of the sinkhole susceptibility model was evaluated using a receiver operating characteristic (ROC) curve, particularly the area under the curve (AUC). The C5.0 model showed a high prediction accuracy of 83.52%. It is concluded that a decision tree is a promising classifier for the spatial prediction of karst sinkholes and subsidence in the ECF area.
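
The evaluation protocol described above (a random 70/30 split, then ROC AUC on the test set) can be sketched without any library, assuming binary labels and a real-valued susceptibility score per test location. This is not the authors' C5.0 pipeline; `random_split`, `roc_auc`, and the toy scores are hypothetical. AUC is computed through its rank interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative.

```python
import random


def random_split(items, train_frac=0.7, seed=0):
    """Shuffle a copy of `items` and cut it into train/test parts (70/30 by default)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count as half a win).

    O(P * N) in the number of positives and negatives, fine for small test sets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Splitting 1,113 records 70/30 yields 779 training and 334 test records.
train, test = random_split(range(1113))
print(len(train), len(test))

# Hypothetical test-set labels (1 = sinkhole) and model scores.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Unlike accuracy, AUC does not depend on a decision threshold, which is why the two figures are reported separately.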

Application of Decision Tree to Classify Fall Risk Using Inertial Measurement Unit Sensor Data and Clinical Measurements

  • Junwoo Park;Jongwon Choi;Seyoung Lee;Kitaek Lim;Woochol Joseph Choi
    • 한국전문물리치료학회지 (Physical Therapy Korea) / Vol. 30, No. 2 / pp. 102-109 / 2023
  • Background: While efforts have been made to differentiate fall risk in older adults using wearable devices and clinical methodologies, these technologies are still in their infancy. We applied a decision tree (DT) algorithm to inertial measurement unit (IMU) sensor data and clinical measurements to generate high-performance classification models of fall risk in older adults. Objectives: This study aims to develop a classification model of fall risk in older adults using IMU data and clinical measurements. Methods: Twenty-six older adults were assessed and categorized into high and low fall risk groups. IMU sensor data were obtained from each group while walking, and the extracted features were used by a DT algorithm with the Gini index (DT1) and with the Entropy index (DT2) to generate classification models that differentiate the high and low fall risk groups. The models' performance was compared in terms of accuracy, sensitivity, and specificity. Results: Accuracy, sensitivity, and specificity were 77.8%, 80.0%, and 66.7%, respectively, for DT1, and 72.2%, 91.7%, and 33.3%, respectively, for DT2. Conclusion: Our results suggest that fall risk classification using IMU sensor data obtained during gait has the potential to be developed for practical use. Future research and development using different machine learning techniques and larger datasets is warranted.
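
The three reported measures come from the confusion matrix of a binary classifier. The sketch below assumes "high" fall risk is the positive class; the function name and the ten hypothetical predictions are illustrative only and do not reproduce the study's data.

```python
def classification_metrics(y_true, y_pred, positive="high"):
    """Accuracy, sensitivity (recall on the positive class) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true high-risk cases detected
        "specificity": tn / (tn + fp),  # share of true low-risk cases correctly cleared
    }


# Hypothetical labels: 5 high-risk and 5 low-risk participants.
y_true = ["high"] * 5 + ["low"] * 5
y_pred = ["high"] * 4 + ["low"] + ["high"] * 2 + ["low"] * 3
print(classification_metrics(y_true, y_pred))
```

Reporting all three matters because they can move in opposite directions: DT2 above reaches 91.7% sensitivity but only 33.3% specificity, a trade-off that accuracy alone would hide.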

Machine Learning Based Hybrid Approach to Detect Intrusion in Cyber Communication

  • Neha Pathak;Bobby Sharma
    • International Journal of Computer Science & Network Security / Vol. 23, No. 11 / pp. 190-194 / 2023
  • Given the importance of communication, data delivery, and data access in governmental, business, and individual settings, it has become essential to identify faults and flaws in cyber communication. Cyber security is needed to protect personal, governmental, and business data from misuse through numerous advanced attacks. Information security provides extensive protection to both the host machine and the network. Learning methods are used both to analyze and to prevent various attacks. Machine learning, a branch of Artificial Intelligence, offers promising learning techniques for detecting cyber-attacks. In the proposed methodology, the Decision Tree (DT), a supervised learning model, is combined with different cross-validation methods to determine the accuracy and execution time of identifying cyber-attacks in the UNSW-NB15 dataset, a recent dataset of network traffic covering different network attack activities. It is a hybrid method in which different DT splitting criteria, including the Gini Index and Entropy, are implemented separately to identify the most accurate intrusion detection procedure relative to execution time. The DT variants implemented include DT using the Gini Index, DT using the train-split method, and DT using information entropy, each with further subdivisions such as K-Fold validation and Stratified K-Fold validation.
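
The difference between the K-Fold and Stratified K-Fold validation schemes mentioned above can be shown with a dependency-free sketch of how each assigns sample indices to folds. In practice scikit-learn's `KFold` and `StratifiedKFold` are the usual tools; the paper's tooling is not stated, and the helper functions and the block-ordered labels below are hypothetical.

```python
from collections import Counter, defaultdict


def kfold_indices(n, k):
    """Plain K-Fold without shuffling: consecutive slices of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def stratified_kfold_indices(labels, k):
    """Stratified K-Fold: deal each class's indices round-robin over the folds
    (with a running counter across classes), so each fold keeps roughly the
    overall class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        for i in indices:
            folds[position % k].append(i)
            position += 1
    return [sorted(f) for f in folds]


# Hypothetical traffic labels where attack records appear in one block.
labels = ["normal"] * 8 + ["attack"] * 4
for name, folds in [("K-Fold", kfold_indices(len(labels), 4)),
                    ("Stratified", stratified_kfold_indices(labels, 4))]:
    print(name, [dict(Counter(labels[i] for i in f)) for f in folds])
```

With block-ordered data, plain K-Fold produces folds with no attack records at all, while every stratified fold keeps the overall 2:1 normal-to-attack ratio, which matters for imbalanced intrusion datasets.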

Axial capacity of FRP reinforced concrete columns: Empirical, neural and tree based methods

  • Saha Dauji
    • Structural Engineering and Mechanics / Vol. 89, No. 3 / pp. 283-300 / 2024
  • Machine learning (ML) models based on an artificial neural network (ANN) and a decision tree (DT) were developed to estimate the axial capacity of concrete columns reinforced with fiber reinforced polymer (FRP) bars. Among the design codes, the Canadian code provides a better formulation than the Australian or American code. Among empirical models based on the elastic modulus of FRP, the Hadhood et al. (2017) model performed best, whereas among empirical models based on the tensile strength of FRP, as well as among all empirical models, Raza et al. (2021) was judged superior. However, all ML models outperformed the empirical models on all five performance metrics considered. The performance of the ANN and DT models was generally comparable. Under the present setup, including transverse reinforcement information did not improve estimation accuracy with either ANN or DT. With selective use of inputs and a much simpler ANN architecture (4-3-1) than that reported in the literature (Raza et al. 2020: 6-11-11-1), a marginal improvement in correlation could be achieved. The best model from the study achieved a correlation of 0.94, absolute errors between 420 kN and 530 kN, and relative errors between 0.39 and 0.51. Although ANN/DT models performed much better than the empirical models, further work to improve estimation accuracy is needed before ML-based design of FRP reinforced concrete columns can be considered for design codes.
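
The abstract reports three kinds of metric: a correlation, absolute errors in kN, and dimensionless relative errors. It does not define all five metrics, so the sketch below assumes common definitions: the Pearson correlation, the mean absolute error, and the mean absolute relative error. The function names and the three hypothetical column capacities are illustrative only.

```python
from math import sqrt


def pearson_r(obs, pred):
    """Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = sqrt(sum((o - mo) ** 2 for o in obs))
    sp = sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def mean_absolute_error(obs, pred):
    """Average |observed - predicted|, in the units of the data (here kN)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mean_absolute_relative_error(obs, pred):
    """Average |observed - predicted| / |observed| (dimensionless)."""
    return sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)


# Hypothetical axial capacities in kN: test results vs. model estimates.
observed = [1000.0, 2000.0, 3000.0]
predicted = [1100.0, 1900.0, 3300.0]
print(pearson_r(observed, predicted),
      mean_absolute_error(observed, predicted),
      mean_absolute_relative_error(observed, predicted))
```

A high correlation can coexist with sizeable absolute and relative errors, because correlation rewards tracking the trend rather than matching magnitudes; this is consistent with the abstract's call for further accuracy improvements despite r = 0.94.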

Application and Performance Analysis of Machine Learning for GPS Jamming Detection

  • 정인환
    • 한국정보기술학회논문지 (Journal of Korean Institute of Information Technology) / Vol. 17, No. 5 / pp. 47-55 / 2019
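  • As damage caused by GPS jamming has recently increased, research on detecting and preparing for GPS jamming has been actively conducted. This paper presents a GPS jamming detection method that uses multiple GPS receiver channels and three machine learning methods. The proposed multi-channel setup consists of a commercial GPS receiver with no anti-jamming capability, a receiver with anti-noise-jamming capability only, and a receiver with both anti-noise and anti-spoofing jamming capability; by comparing the coordinates received by each receiver, the operator can identify the characteristics of the jamming signal. Five jamming signals with different signal characteristics were input, and jamming detection tests were performed using three machine learning methods (AB: Adaptive Boosting, SVM: Support Vector Machine, DT: Decision Tree). The test results showed that, when each machine learning technique was used alone, the DT technique performed best, with a 96.9% detection rate; compared with the binary classifier technique, it has lower ambiguity and simpler hardware, which confirms its effectiveness for GPS jamming detection. It was also confirmed that the SVM technique can be used when an additional technique that resolves its ambiguity is applied.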

Speech emotion recognition based on genetic algorithm-decision tree fusion of deep and acoustic features

  • Sun, Linhui;Li, Qiu;Fu, Sheng;Li, Pingan
    • ETRI Journal / Vol. 44, No. 3 / pp. 462-475 / 2022
  • Although researchers have proposed numerous techniques for speech emotion recognition, its performance remains unsatisfactory in many application scenarios. In this study, we propose a speech emotion recognition model based on a genetic algorithm (GA)-decision tree (DT) fusion of deep and acoustic features. To express the emotional information in speech more comprehensively, frame-level deep and acoustic features are first extracted from a speech signal. Next, five kinds of statistics of these features are calculated to obtain utterance-level features. The Fisher feature selection criterion is employed to select high-performance features and remove redundant information. In the feature fusion stage, the GA is used to adaptively search for the best feature fusion weight. Finally, the proposed speech emotion recognition model, based on a DT support vector machine model, is built on the fused features. Experimental results on the Berlin speech emotion database and the Chinese emotion speech database indicate that the proposed model outperforms an average weight fusion method.
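
The Fisher feature selection step mentioned above can be sketched with one common form of the Fisher score: for feature j, the between-class scatter sum_c n_c (mu_cj - mu_j)^2 divided by the within-class scatter sum_c n_c sigma_cj^2. Whether the paper uses exactly this form is an assumption, the GA fusion step is not shown, and the two-feature toy data and function names are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter / within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = fmean(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            between += len(vals) * (fmean(vals) - mu) ** 2
            within += len(vals) * pvariance(vals)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical utterance-level features: column 0 separates the classes well,
# column 1 barely does.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
y = ["angry", "angry", "calm", "calm"]
scores = fisher_scores(X, y)
print(scores, select_top_k(scores, 1))
```

A feature scores highly when its class means are far apart relative to the spread within each class, so low-scoring features can be dropped as carrying little discriminative information.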

Knowledge Extractions, Visualizations, and Inference from the big Data in Healthcare and Medical

  • Kim, Jin Sung
    • 한국지능시스템학회논문지 (Journal of Korean Institute of Intelligent Systems) / Vol. 23, No. 5 / pp. 400-405 / 2013
  • The purpose of this study is to develop a composite platform for knowledge extraction, visualization, and inference. Big data sets are frequently used in the healthcare and medical field. To help the knowledge managers and users working in this field, this study focuses on knowledge management (KM) based on Data Mining (DM), a Knowledge Distribution Map (KDM), Decision Tree (DT), RDBMS, and SQL-based inference. The proposed mechanism consists of five key processes. First, in Knowledge Parsing, logical rules are extracted from a big data set using DM technology and then transformed into RDB tables. Second, through Knowledge Maintenance, the knowledge is refined and managed so that it is ready for computing knowledge distributions. Third, in the Knowledge Distribution process, the knowledge distributions can be viewed using the DT mechanism. Fourth, in Knowledge Hierarchy, the platform shows the hierarchy of the knowledge. Finally, in Inference, the platform deduces conclusions from the given facts and data. This approach offers diverse knowledge representations and inference, which can improve the quality of computer-based medical diagnosis.
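
The idea of storing extracted IF-THEN rules as relational tables and running inference in SQL can be sketched with Python's built-in `sqlite3` module. The schema, function names, and medical rules below are hypothetical and are not the platform described in the paper. A rule fires when none of its conditions is missing from the facts table, expressed as a double `NOT EXISTS` (relational division).

```python
import sqlite3


def build_kb(conn, rules):
    """Store IF-THEN rules as tables: one row per conclusion, one per condition."""
    conn.executescript("""
        CREATE TABLE rules (rule_id INTEGER PRIMARY KEY, attr TEXT, value TEXT);
        CREATE TABLE conditions (rule_id INTEGER, attr TEXT, value TEXT);
        CREATE TABLE facts (attr TEXT, value TEXT);
    """)
    for rule_id, (conditions, (attr, value)) in enumerate(rules, start=1):
        conn.execute("INSERT INTO rules VALUES (?, ?, ?)", (rule_id, attr, value))
        conn.executemany("INSERT INTO conditions VALUES (?, ?, ?)",
                         [(rule_id, a, v) for a, v in conditions])


def infer(conn, facts):
    """Return the conclusions of every rule whose conditions all match the facts."""
    conn.execute("DELETE FROM facts")
    conn.executemany("INSERT INTO facts VALUES (?, ?)", facts)
    query = """
        SELECT r.attr, r.value FROM rules r
        WHERE NOT EXISTS (
            SELECT 1 FROM conditions c
            WHERE c.rule_id = r.rule_id
              AND NOT EXISTS (
                  SELECT 1 FROM facts f WHERE f.attr = c.attr AND f.value = c.value))
        ORDER BY r.rule_id
    """
    return conn.execute(query).fetchall()


# Hypothetical rules, e.g. root-to-leaf paths read off a decision tree.
rules = [
    ([("glucose", "high"), ("bmi", "high")], ("risk", "high")),
    ([("glucose", "normal")], ("risk", "low")),
]
conn = sqlite3.connect(":memory:")
build_kb(conn, rules)
print(infer(conn, [("glucose", "high"), ("bmi", "high")]))
```

Keeping one row per condition lets rules of any length share a single fixed schema, so new rules extracted from the data can be added with plain `INSERT` statements.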

A Survey of Applications of Artificial Intelligence Algorithms in Eco-environmental Modelling

  • Kim, Kang-Suk;Park, Joon-Hong
    • Environmental Engineering Research / Vol. 14, No. 2 / pp. 102-110 / 2009
  • The application of artificial intelligence (AI) approaches in eco-environmental modeling has gradually increased over the last decade, and a comprehensive understanding and evaluation of the applicability of these approaches are needed. In this study, we reviewed previous studies that used AI techniques in eco-environmental modeling. Decision Tree (DT) and Artificial Neural Network (ANN) were found to be the major AI algorithms preferred by researchers in ecological and environmental modeling. When the effect of training data size on model prediction accuracy was explored using data from the previous studies, prediction accuracy and training data size showed a nonlinear correlation, which was best described by a hyperbolic saturation function among the tested nonlinear functions, including power and logarithmic functions. The hyperbolic saturation equations were proposed as a guideline for optimizing the size of the training data set, which is critically important in designing the field experiments required for training AI-based eco-environmental models.
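
Using a hyperbolic saturation curve as a sizing guideline can be sketched as follows. The abstract does not give the equation, so the form accuracy = a * n / (b + n) is an assumption (a is the asymptotic accuracy, b the training-set size at which half of a is reached). The sketch fits it through the linearisation 1/accuracy = (b/a)(1/n) + 1/a with ordinary least squares and then inverts it to find the size needed for a target accuracy. Function names and data are hypothetical; the data are generated from a = 95 and b = 200.

```python
def fit_hyperbolic_saturation(sizes, accuracies):
    """Fit accuracy = a * n / (b + n) using the linearisation
    1/accuracy = (b/a) * (1/n) + 1/a and ordinary least squares."""
    xs = [1.0 / n for n in sizes]
    ys = [1.0 / acc for acc in accuracies]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    a = 1.0 / intercept  # asymptotic (saturation) accuracy
    b = slope * a        # size at which accuracy reaches a / 2
    return a, b


def size_for_target(a, b, target):
    """Training-set size needed to reach `target` accuracy (requires target < a)."""
    return b * target / (a - target)


# Hypothetical accuracies (%) generated exactly from a = 95, b = 200.
sizes = [50, 100, 200, 400, 800]
accuracies = [95 * n / (200 + n) for n in sizes]
a, b = fit_hyperbolic_saturation(sizes, accuracies)
print(a, b, size_for_target(a, b, 90))
```

The linearisation keeps the fit to a few lines, but it distorts the error weighting on noisy real data; there, a nonlinear least-squares fit such as `scipy.optimize.curve_fit` is usually preferable. The steep growth of `size_for_target` as the target approaches a is what makes the curve useful for deciding how much field data is worth collecting.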