• Title/Summary/Keyword: Tree-based algorithms

A Decision Tree Induction using Genetic Programming with Sequentially Selected Features

  • Kim Hyo-Jung; Park Chong-Sun
    • Korean Management Science Review / v.23 no.1 / pp.63-74 / 2006
  • Decision tree induction algorithms are among the most widely used methods for classification problems. However, when a tree algorithm uses a top-down search, it can become trapped in a local minimum with no reasonable means of escaping it. Further, if irrelevant or redundant features are included in the data set, tree algorithms produce trees that are less accurate than those built from a data set containing only relevant features. We propose a hybrid algorithm for generating decision trees that uses genetic programming with sequentially selected features. The Correlation-based Feature Selection (CFS) method is adopted to find relevant features, which are fed to genetic programming sequentially to find optimal trees at each iteration. The proposed algorithm produces simpler and more understandable decision trees than other decision tree methods, and it is also effective in producing trees with similar or better cross-validation accuracy using a relatively smaller set of features.
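
CFS scores a feature subset highly when its features correlate strongly with the class but weakly with each other. The sketch below computes that merit score and runs a greedy forward search; it is a minimal illustration assuming numeric features and absolute Pearson correlation, which may differ from the correlation measure the authors used. The order in which features are added is the kind of sequence that could then be handed to genetic programming. Function names and toy data are hypothetical, not the authors' code.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset: List[int], features: List[List[float]], target: List[float]) -> float:
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[i], target)) for i in subset) / k
    pairs = [(a, b) for idx, a in enumerate(subset) for b in subset[idx + 1:]]
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_cfs(features: List[List[float]], target: List[float], tol: float = 1e-9) -> List[int]:
    """Greedily add the feature that most improves merit; stop when merit stops improving."""
    selected: List[int] = []
    remaining = list(range(len(features)))
    best = 0.0
    while remaining:
        cand = max(remaining, key=lambda i: cfs_merit(selected + [i], features, target))
        merit = cfs_merit(selected + [cand], features, target)
        if merit <= best + tol:  # tolerance guards against floating-point noise
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected


# Hypothetical toy data: f0 tracks the target, f1 duplicates f0 (redundant),
# and f2 is only loosely related.
y = [1, 2, 3, 4, 5, 6]
f0 = [1, 2, 3, 4, 5, 6]
f1 = [1, 2, 3, 4, 5, 6]
f2 = [3, 1, 4, 1, 5, 9]
print(forward_cfs([f0, f1, f2], y))  # [0]
```

The redundant copy f1 is rejected because adding it raises the average feature-feature correlation as much as it raises feature-class correlation, so the merit does not improve.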

Machine learning-based prediction of wind forces on CAARC standard tall buildings

  • Yi Li; Jie-Ting Yin; Fu-Bin Chen; Qiu-Sheng Li
    • Wind and Structures / v.36 no.6 / pp.355-366 / 2023
  • Although machine learning (ML) techniques have been widely used in various fields of engineering practice, their applications in wind engineering are still at an early stage. To evaluate the feasibility of ML algorithms for predicting wind loads on high-rise buildings, this study took the exposure category, wind direction, and the height of the local wind force as input features and adopted four ML algorithms to predict the wind force coefficients of the CAARC standard tall building model: k-nearest neighbor (KNN), support vector machine (SVM), gradient boosting regression tree (GBRT), and extreme gradient boosting (XGBoost). The hyperparameters of all four ML algorithms were optimized using the tree-structured Parzen estimator (TPE). The results show that the mean drag force coefficients and RMS lift force coefficients are well predicted by the GBRT model, while the RMS drag force coefficients are best predicted by the XGBoost model. The proposed ML-based algorithms for wind load prediction can serve as an alternative to traditional wind tunnel tests and computational fluid dynamics simulations.
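
GBRT builds an additive model of shallow regression trees, each fitted to the residuals of the current ensemble. The following is a minimal sketch assuming squared-error loss and depth-1 trees (stumps); the tree depth, loss, and TPE-tuned hyperparameters used in the study are not stated here, and the toy inputs below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: one split on one feature."""
    feature: int
    threshold: float
    left: float   # prediction when x[feature] <= threshold
    right: float  # prediction when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Choose the split that minimizes squared error against the residuals r."""
    mean_r = sum(r) / len(r)
    best = Stump(0, float("inf"), mean_r, mean_r)  # fallback: constant prediction
    best_sse = sum((v - mean_r) ** 2 for v in r)
    for f in range(len(X[0])):
        thresholds = sorted({row[f] for row in X})[:-1]  # keep both sides non-empty
        for t in thresholds:
            left = [v for row, v in zip(X, r) if row[f] <= t]
            right = [v for row, v in zip(X, r) if row[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = Stump(f, t, lm, rm), sse
    return best


class GBRT:
    """Gradient boosting for squared loss: each stump fits the current residuals."""

    def __init__(self, n_rounds: int = 50, learning_rate: float = 0.1):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: List[List[float]], y: List[float]) -> "GBRT":
        self.base = sum(y) / len(y)
        pred = [self.base] * len(y)
        for _ in range(self.n_rounds):
            # For squared loss, the negative gradient is simply y - prediction.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            pred = [p + self.learning_rate * stump.predict(x) for p, x in zip(pred, X)]
        return self

    def predict(self, x: Sequence[float]) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)


# Hypothetical toy data: [wind direction (deg), relative height] -> force coefficient.
X = [[0, 0.2], [0, 0.8], [45, 0.2], [45, 0.8], [90, 0.2], [90, 0.8]]
y = [1.2, 1.5, 1.0, 1.3, 0.6, 0.9]
model = GBRT(n_rounds=100, learning_rate=0.1).fit(X, y)
mse = sum((model.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)
print(f"training MSE: {mse:.4f}")
```

The small learning rate shrinks each stump's contribution, which is the usual trade of more rounds for smoother fits; XGBoost adds regularization and second-order gradient information on top of this basic scheme.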

A Simple and Efficient One-to-Many Large File Distribution Method Exploiting Asynchronous Joins

  • Lee, Soo-Jeon; Kang, Kyung-Ran; Lee, Dong-Man; Kim, Jae-Hoon
    • ETRI Journal / v.28 no.6 / pp.709-720 / 2006
  • In this paper, we suggest a simple and efficient multiple-forwarder-based file distribution method that works with tree-based application layer multicast. Existing multiple-forwarder approaches incur high control overhead. The proposed method exploits the assumption that receivers join a session at different times. In tree-based application layer multicast, the set of data packets a receiver obtains from its parent after it joins but before the next receiver joins does not overlap with the corresponding sets of other receivers. The proposed method selects forwarders from among the preceding receivers, and each forwarder forwards data packets from its non-overlapping set. Three variants of the forwarder selection algorithm are proposed, and their impact is evaluated using numerical analysis. A performance evaluation on PlanetLab, a global overlay testbed, shows that the proposed method improves throughput while keeping the duplicate packet ratio and control overhead significantly lower than those of the existing method, Bullet.
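
The non-overlapping packet sets can be illustrated with a simplified model: each packet belongs to the most recent receiver that had joined when it was sent, and a late receiver can recover what it missed from the receivers that joined before it. This sketch assumes known join times and packet send times and instantaneous delivery; it does not implement the paper's three forwarder selection variants, and all names and data are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List


def partition_packets(join_times: List[float], packet_times: List[float]) -> Dict[int, List[int]]:
    """Map each receiver to the packets sent after it joined and before the next join.

    Packets sent before the first join are not assigned to any receiver
    (in this model they would have to come from the source).
    """
    order = sorted(range(len(join_times)), key=lambda i: join_times[i])
    sorted_joins = [join_times[i] for i in order]
    sets: Dict[int, List[int]] = {i: [] for i in range(len(join_times))}
    for seq, t in enumerate(packet_times):
        k = bisect_right(sorted_joins, t) - 1  # latest receiver that had joined by time t
        if k >= 0:
            sets[order[k]].append(seq)
    return sets


def recovery_plan(join_times: List[float], packet_times: List[float], receiver: int) -> Dict[int, List[int]]:
    """Packets a late receiver missed, grouped by the preceding receiver that can forward them."""
    sets = partition_packets(join_times, packet_times)
    return {i: pkts for i, pkts in sets.items()
            if join_times[i] < join_times[receiver] and pkts}


# Hypothetical session: receivers 0, 1, 2 join at t = 0, 10, 20; one packet every 5 time units.
joins = [0, 10, 20]
packets = [0, 5, 10, 15, 20, 25]
print(partition_packets(joins, packets))  # {0: [0, 1], 1: [2, 3], 2: [4, 5]}
print(recovery_plan(joins, packets, 2))   # {0: [0, 1], 1: [2, 3]}
```

Because the sets are disjoint, each forwarder sends a distinct range and no packet is delivered twice to the late receiver, which is why duplication stays low without extra coordination.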

Incorporating BERT-based NLP and Transformer for An Ensemble Model and its Application to Personal Credit Prediction

  • Sophot Ky; Ju-Hong Lee; Kwangtek Na
    • Smart Media Journal / v.13 no.4 / pp.9-15 / 2024
  • Tree-based algorithms have been the dominant methods for building prediction models on tabular data, including personal credit data. However, they are compatible only with categorical and numerical data, and they do not capture relationships between features. In this work, we propose an ensemble model based on the Transformer architecture that incorporates text features and harnesses the self-attention mechanism to address the feature-relationship limitation. We describe a text formatter module that converts the original tabular data into sentences, which are fed into FinBERT along with other text features. Furthermore, we employ an FT-Transformer trained on the original tabular data. We compare this multimodal approach against two popular tree-based algorithms, Random Forest and Extreme Gradient Boosting (XGBoost), as well as TabTransformer. Our proposed method achieves superior default recall, F1 score, and AUC on two public datasets. These results are significant for financial institutions seeking to reduce the risk of financial loss from defaulters.
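
The text formatter turns each tabular record into a sentence that a language model such as FinBERT can read. A minimal sketch, assuming a simple "column is value" template; the paper's actual sentence format and tokenization step are not described here, so the record, column names, and function name below are hypothetical.

```python
from typing import Mapping, Optional


def row_to_sentence(row: Mapping[str, object], target: Optional[str] = None) -> str:
    """Render one tabular record as a sentence, skipping the label column."""
    parts = []
    for column, value in row.items():
        if column == target:
            continue  # never leak the label into the model input
        parts.append(f"{column.replace('_', ' ')} is {value}")
    return ". ".join(parts) + "."


record = {"age": 35, "loan_amount": 5000, "purpose": "car", "default": 1}
print(row_to_sentence(record, target="default"))
# age is 35. loan amount is 5000. purpose is car.
```

Dropping the target column is the one detail that matters regardless of template: if the label reached the text encoder, the reported recall and AUC would be meaningless.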

Robust Variable Selection in Classification Tree

  • Jang Jeong Yee; Jeong Kwang Mo
    • Proceedings of the Korean Statistical Society Conference / 2001.11a / pp.89-94 / 2001
  • In this study, we focus on variable selection when growing decision trees. Several splitting rules and variable selection algorithms are discussed. We propose a competitive variable selection method based on the Kruskal-Wallis test, a nonparametric version of the ANOVA F-test. Through a Monte Carlo study, we observe that CART is seriously biased in variable selection toward categorical variables with many values, and that QUEST, which uses the F-test, has low power to select informative variables under heavy-tailed distributions.
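
Ranking candidate split variables by a Kruskal-Wallis test can be sketched as follows, assuming numeric predictors only; how the authors handle categorical predictors or adjust p-values is not shown. Because every predictor is tested against the same class grouping, all H statistics share the same degrees of freedom, so choosing the largest H is equivalent to choosing the smallest p-value. Function names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_h(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    n = len(values)
    if n < 2:
        return 0.0
    groups: Dict[Hashable, List[float]] = {}
    for r, g in zip(average_ranks(values), labels):
        groups.setdefault(g, []).append(r)
    h = 12 / (n * (n + 1)) * sum(sum(rs) ** 2 / len(rs) for rs in groups.values()) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def select_split_variable(columns: Dict[str, Sequence[float]], labels: Sequence[Hashable]) -> str:
    """Pick the numeric predictor whose rank distribution differs most across classes."""
    return max(columns, key=lambda name: kruskal_h(columns[name], labels))


# Hypothetical node data: x1 separates the classes, x2 does not.
labels = ["A", "A", "A", "B", "B", "B"]
columns = {"x1": [1, 2, 3, 10, 11, 12], "x2": [5, 1, 9, 2, 8, 3]}
print(select_split_variable(columns, labels))  # x1
```

Working on ranks rather than raw values is what makes the choice robust to heavy tails: an extreme outlier moves its rank by at most one position.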

A design of binary decision tree using genetic algorithms and its application to the alphabetic character

  • 정순원; 김경민; 박귀태
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1995.10b / pp.218-223 / 1995
  • A new design scheme for a binary decision tree is proposed. In this scheme, the binary decision tree is constructed using a genetic algorithm and the fuzzy c-means (FCM) algorithm. At each node, an optimal or near-optimal feature or feature subset is selected from all available features based on a genetic-algorithm fitness function that is inversely proportional to the classification error, the balance between clusters, and the number of features used. The proposed design scheme is applied to handwritten alphabetic characters, and experimental results show its usefulness.
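
The per-node feature search can be sketched as a standard bit-mask genetic algorithm with tournament selection, one-point crossover, bit-flip mutation, and elitism. This is a minimal sketch under stated assumptions: the paper's fitness combines classification error, cluster balance, and feature count, but its exact formula and the FCM clustering step are not given here, so `cost` is a hypothetical caller-supplied function and fitness is taken as 1 / (1 + cost).

```python
import random
from typing import Callable, List

Mask = List[int]  # mask[i] == 1 means feature i is used at this node


def ga_select_features(
    n_features: int,
    cost: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    p_mutation: float = 0.1,
    seed: int = 0,
) -> Mask:
    """Search feature masks with a GA (requires n_features >= 2; cost assumed non-negative)."""
    rng = random.Random(seed)

    def fitness(mask: Mask) -> float:
        return 1.0 / (1.0 + cost(mask))  # lower cost -> higher fitness

    def tournament(pop: List[Mask]) -> Mask:
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = [1 - bit if rng.random() < p_mutation else bit
                     for bit in p1[:cut] + p2[cut:]]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical cost: error is 0 only if feature 2 is used, plus a small penalty per feature.
def toy_cost(mask: Mask) -> float:
    error = 0.0 if mask[2] else 1.0
    return error + 0.1 * sum(mask)


best_mask = ga_select_features(5, toy_cost)
print(best_mask)
```

The per-feature penalty is what pushes the search toward small subsets, matching the abstract's preference for fewer features at each node.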

J48 and ADTree for forecast of leaving of hospitals

  • Halim, Faisal; Muttaqin, Rizal
    • Korean Journal of Artificial Intelligence / v.4 no.1 / pp.11-13 / 2016
  • Medical technology has developed rapidly to meet the desire for a healthy life, and as average lifespans have lengthened, people see doctors for many reasons. This study examines the rate at which patients leave hospitals, covering both the department of surgery and the department of internal medicine. Linear models, trees, classification rules, association rules, and other data mining algorithms were considered; in particular, the J48 and ADTree decision tree algorithms were applied to the data from both departments. The two algorithms showed similar performance, but they were not equivalent, so more detailed experiments are required. Future work should collect more experimental data so that the problem can be analyzed from various points of view. Advances in medical technology bring dreams, hope, and pleasure, and those who suffer from incurable diseases need such advances. Experimental environments that closely reflect reality should be built so that data can be investigated carefully, people of all ages are encouraged to visit hospitals, and survival rates increase.
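
J48 is Weka's implementation of C4.5, which chooses splits by gain ratio: information gain divided by the split information of the attribute. A minimal sketch of that criterion for a categorical attribute follows; the records are hypothetical and not from the study, and ADTree is not shown.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the value distribution."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5-style gain ratio: information gain divided by split information."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attribute)  # penalizes attributes with many distinct values
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical records: department vs. whether the patient left the hospital.
department = ["surgery", "surgery", "internal", "internal", "internal", "surgery"]
left = ["yes", "yes", "no", "no", "yes", "yes"]
print(round(gain_ratio(department, left), 3))  # 0.459
```

Dividing by split information is C4.5's fix for plain information gain favoring attributes with many values, the same bias the CART study above reports for categorical variables.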

Selection of an Optimal Algorithm among Decision Tree Techniques for Feature Analysis of Industrial Accidents in Construction Industries

  • Leem Young-Moon; Choi Yo-Han
    • Journal of the Korea Safety Management & Science / v.7 no.5 / pp.1-8 / 2005
  • Rapid industrial advancement, the diversification of business types, and unexpected industrial accidents have caused extensive human and material damage to many unspecified people. Although various previous studies have analyzed ways to prevent industrial accidents, they provide only managerial and educational policies based on frequency and comparative analyses of data from past accidents. The main objective of this study is to find an optimal algorithm for analyzing industrial accident data, and this paper provides a comparative analysis of four algorithms: CHAID, CART, C4.5, and QUEST. Decision tree algorithms, a typical data mining technique, are used to predict outcomes from objective, quantified data. SAS Enterprise Miner and SPSS AnswerTree are used to evaluate the validity of the results of the four algorithms. The sample for this work was drawn from 19,574 records related to the construction industry in Korea over three years (2002-2004).
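
CHAID, one of the four algorithms compared, ranks candidate predictors with chi-square tests of independence against the outcome. A minimal sketch of the Pearson chi-square statistic it builds on is shown below; full CHAID also merges similar categories and adjusts p-values, which is omitted, and the accident counts are hypothetical.

```python
from typing import List


def pearson_chi_square(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for a predictor-by-outcome contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows = accident type, columns = (minor, severe) outcome.
table = [[30, 10],   # fall from height
         [20, 40]]   # struck by object
print(round(pearson_chi_square(table), 3))  # 16.667
```

A larger statistic means the predictor's categories differ more in outcome mix, so it is a stronger split candidate; CHAID converts this to a p-value before comparing predictors with different numbers of categories.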

A Study on the Highly Parallel Multiple-Valued Logic Circuit Design with DTG Properties

  • Na, Gi-Su; Shin, Boo-Sik; Choi, Jai-Sok; Park, Chun-Myoung; Kim, Heung-Soo
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.6 / pp.27-36 / 1999
  • This paper proposes algorithms for designing highly parallel multiple-valued logic circuits from a DTG (Directed Tree Graph), which represents the relationships between the inputs and outputs of nodes as a tree structure. Because Nakajima's conventional algorithms have some problems, this paper introduces a mathematical analysis based on the tree structure to design optimized, locally computable circuits. With the proposed algorithms, circuits can be designed for a DTG with any number of nodes, which is not possible with Nakajima's algorithms. A comparison with circuit designs produced by Nakajima's algorithms shows that the proposed algorithms optimize the circuit design for all cases of DTG. Several examples demonstrate the usefulness of the proposed circuit design algorithms.

Economic Design of Tree Network Using Tabu List Coupled Genetic Algorithms

  • Lee, Seong-Hwan; Lee, Han-Jin; Yum, Chang-Sun
    • Journal of Korean Society of Industrial and Systems Engineering / v.35 no.1 / pp.10-15 / 2012
  • This paper considers the economic design of a tree-based network, a kind of computer network. The problem can be modeled as minimizing installation costs subject to a spanning-tree constraint and a maximum traffic capacity for each subtree. This problem is known to be NP-hard. To solve it efficiently, a genetic algorithm coupled with a tabu list is proposed. Two illustrative examples are used to explain and test the proposed approach. Experimental results show that the proposed approach finds good or near-optimal solutions more efficiently than a genetic algorithm approach.
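
Two ingredients named in the abstract can be sketched: a cost evaluation for a candidate tree under a subtree traffic-capacity constraint, and a tabu list that keeps the GA from re-admitting recently generated chromosomes. Assumptions (not from the paper): trees are encoded as parent arrays, the capacity limit applies to each subtree attached to the root, and offspring found in the tabu list are simply discarded. The paper's actual encoding and tabu rules may differ, and the network data are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple


def tree_cost(parent: Sequence[int], cost: Sequence[Sequence[float]],
              demand: Sequence[float], capacity: float, root: int = 0) -> Optional[float]:
    """Installation cost of a tree encoded as a parent array, or None if infeasible.

    Each subtree hanging off the root may carry at most `capacity` total demand.
    """
    n = len(parent)
    load = {}
    for i in range(n):
        if i == root:
            continue
        node, steps = i, 0
        while parent[node] != root:  # climb to the root's child on i's path
            node = parent[node]
            steps += 1
            if steps > n:
                return None  # cycle: not a spanning tree
        load[node] = load.get(node, 0) + demand[i]
    if any(v > capacity for v in load.values()):
        return None
    return sum(cost[i][parent[i]] for i in range(n) if i != root)


class TabuList:
    """Fixed-size memory of recently generated chromosomes (oldest entries drop out)."""

    def __init__(self, size: int):
        self.items: Deque[Tuple[int, ...]] = deque(maxlen=size)

    def is_tabu(self, chromosome: Sequence[int]) -> bool:
        return tuple(chromosome) in self.items

    def add(self, chromosome: Sequence[int]) -> None:
        self.items.append(tuple(chromosome))


def admit_offspring(children: List[List[int]], tabu: TabuList) -> List[List[int]]:
    """Keep only children not seen recently, recording each admitted one as tabu."""
    admitted = []
    for child in children:
        if not tabu.is_tabu(child):
            tabu.add(child)
            admitted.append(child)
    return admitted


# Hypothetical 4-node network with node 0 as the root (e.g. a central server).
costs = [[0, 4, 4, 4], [4, 0, 1, 5], [4, 1, 0, 1], [4, 5, 1, 0]]
demand = [0, 1, 1, 1]
chain = [0, 0, 1, 2]   # 0-1-2-3: cheap, but one subtree carries demand 3
split = [0, 0, 1, 0]   # 0-1-2 and 0-3
print(tree_cost(chain, costs, demand, capacity=2))  # None
print(tree_cost(split, costs, demand, capacity=2))  # 9
```

Rejecting recently seen chromosomes keeps the population from collapsing onto a few repeated trees, which is one plausible reason a tabu list helps a GA find good solutions faster.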