• Title/Summary/Keyword: Classification and Regression Trees

Note on classification and regression tree analysis (분류와 회귀나무분석에 관한 소고)

  • 임용빈;오만숙
    • Journal of Korean Society for Quality Management / v.30 no.1 / pp.152-161 / 2002
  • The analysis of large data sets with hundreds of thousands of observations and thousands of independent variables is a formidable computational task. A less parametric method, capable of identifying important independent variables and their interactions, is the tree-structured approach to regression and classification. It offers a graphical and often illuminating way of looking at data in classification and regression problems. In this paper, we review and summarize the methodology used to construct a tree, multiple trees, and the sequential strategy for identifying active compounds in large chemical databases.

A review of tree-based Bayesian methods

  • Linero, Antonio R.
    • Communications for Statistical Applications and Methods / v.24 no.6 / pp.543-559 / 2017
  • Tree-based regression and classification ensembles form a standard part of the data-science toolkit. Many commonly used methods take an algorithmic view, proposing greedy methods for constructing decision trees; examples include the classification and regression trees algorithm, boosted decision trees, and random forests. Recent history has seen a surge of interest in Bayesian techniques for constructing decision tree ensembles, with these methods frequently outperforming their algorithmic counterparts. The goal of this article is to survey the landscape surrounding Bayesian decision tree methods, and to discuss recent modeling and computational developments. We provide connections between Bayesian tree-based methods and existing machine learning techniques, and outline several recent theoretical developments establishing frequentist consistency and rates of convergence for the posterior distribution. The methodology we present is applicable to a wide variety of statistical tasks including regression, classification, modeling of count data, and many others. We illustrate the methodology on both simulated and real datasets.

Integrity Assessment for Reinforced Concrete Structures Using Fuzzy Decision Making (퍼지의사결정을 이용한 RC구조물의 건전성평가)

  • 박철수;손용우;이증빈
    • Proceedings of the Computational Structural Engineering Institute Conference / 2002.04a / pp.274-283 / 2002
  • This paper presents efficient models for reinforced concrete structures using CART-ANFIS (classification and regression tree-adaptive neuro fuzzy inference system). A fuzzy decision tree partitions the input space of a data set into mutually exclusive regions, each of which is assigned a label, a value, or an action to characterize its data points. Fuzzy decision trees used for classification problems are often called fuzzy classification trees, and each terminal node contains a label that indicates the predicted class of a given feature vector. In the same vein, fuzzy decision trees used for regression problems are often called fuzzy regression trees, and the terminal node labels may be constants or equations that specify the predicted output value of a given input vector. Note that CART can select relevant inputs and partition the input space, while ANFIS refines the regression and makes it continuous and smooth everywhere. CART and ANFIS are therefore complementary, and their combination constitutes a solid approach to fuzzy modeling.
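
The abstract contrasts the crisp partitions that CART produces with the continuous, smooth output that ANFIS adds. The sketch below is not ANFIS, because it has no rule base and no parameter training. It only shows, for one hypothetical split with hypothetical leaf values, how replacing a hard threshold with a sigmoid membership function turns a step-shaped prediction into a continuous one. The function names and the `width` parameter are illustrative assumptions.

```python
import math

def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def crisp_tree(x: float, threshold: float, left_value: float, right_value: float) -> float:
    """Crisp regression-tree prediction: a hard split on one input (step function)."""
    return left_value if x <= threshold else right_value

def fuzzy_tree(x: float, threshold: float, left_value: float, right_value: float,
               width: float = 1.0) -> float:
    """Soft version of the same split (hypothetical illustration, not ANFIS).

    The membership of x in the right-hand region rises smoothly from 0 to 1
    around the threshold, and the prediction is the membership-weighted
    average of the two leaf values. `width` must be positive.
    """
    mu_right = _sigmoid((x - threshold) / width)
    return (1.0 - mu_right) * left_value + mu_right * right_value
```

As `width` shrinks toward zero (it must stay positive), `fuzzy_tree` approaches the step function of `crisp_tree`. Larger values blend the two leaves over a wider band around the threshold.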

Evaluations of predicted models fitted for data mining - comparisons of classification accuracy and training time for 4 algorithms (데이터마이닝기법상에서 적합된 예측모형의 평가 -4개분류예측모형의 오분류율 및 훈련시간 비교평가 중심으로)

  • Lee, Sang-Bock
    • Journal of the Korean Data and Information Science Society / v.12 no.2 / pp.113-124 / 2001
  • CHAID, logistic regression, bagging trees, and bagging trees are compared on the SAS artificial data set HMEQ in terms of classification accuracy and training time. In terms of error rate, bagging trees perform best, although their run time is slower than that of the other models. Logistic regression has the fastest run time among the given models, but no model is uniformly efficient on both criteria.
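
The abstract reports that bagging trees had the lowest error rate but a slower run time. The slower run time follows from the method itself: bagging fits one tree per bootstrap sample. The sketch below illustrates bagging in plain Python. To stay short, it assumes a single numeric feature and uses depth-1 trees (stumps) instead of full trees. The names `fit_stump`, `bagging`, and `n_trees` are hypothetical, and this is not the SAS implementation used in the study.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a depth-1 tree on one numeric feature by minimizing misclassifications."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]   # never empty: t is in xs
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

def bagging(xs, ys, n_trees=25, seed=0):
    """Bootstrap aggregating: one stump per bootstrap sample, majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # The ensemble predicts the most common label among the individual trees.
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]
```

Training cost grows linearly with `n_trees`. This matches the trade-off described in the abstract: better accuracy in exchange for longer training time.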

The Development of Models and the Characteristics for Subway Noise Using the Classification and Regression Trees (CART 분석을 이용한 지하철 소음모형 개발 및 특성 연구)

  • Kim, Tae-Ho;Lee, Jae-Myung;Won, Jai-Mu;Song, In-Suk
    • Journal of the Korean Society for Railway / v.10 no.5 / pp.480-486 / 2007
  • The subway is an essential means of public transportation in large cities and is used by many citizens. However, citizens' demands regarding the subway's interior environment have recently been growing, and among these, noise is a pressing problem. In this study, we classified the characteristics of subway noise using classification and regression trees (CART) based on noise level data from Seoul Subway Line 5. We then developed models of the effects of subway noise and used them to analyze its characteristics. The results indicate that geometric design type and operational factors need to be considered when addressing subway noise, because the factors that influence subway noise differ by geometry type and by operational factors.
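
When CART models a numeric response such as a noise level, each split is chosen to reduce the squared error within the child nodes. The sketch below shows that criterion for a single numeric predictor. The function name and the example data in the check are hypothetical and are not taken from the study.

```python
def best_regression_split(xs, ys):
    """Return (threshold, sse) for the split of one numeric feature that
    minimizes the total within-node sum of squared errors (CART-style)."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_t, best_sse = None, float("inf")
    pairs = sorted(zip(xs, ys))
    # Candidate thresholds: midpoints between consecutive distinct x values.
    for (x0, _), (x1, _) in zip(pairs, pairs[1:]):
        if x0 == x1:
            continue
        t = (x0 + x1) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_t, best_sse = t, total
    return best_t, best_sse
```

Applying this greedy search recursively to each child node, together with a stopping rule and pruning, grows a regression tree. CART also compares candidate splits across all predictors at each node, not just one.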

Integrity Assessment Models for Bridge Structures Using Fuzzy Decision-Making (퍼지의사결정을 이용한 교량 구조물의 건전성평가 모델)

  • 안영기;김성칠
    • Journal of the Korea Concrete Institute / v.14 no.6 / pp.1022-1031 / 2002
  • This paper presents efficient models for bridge structures using CART-ANFIS (classification and regression tree-adaptive neuro fuzzy inference system). A fuzzy decision tree partitions the input space of a data set into mutually exclusive regions, and each region is assigned a label, a value, or an action to characterize its data points. Fuzzy decision trees used for classification problems are often called fuzzy classification trees, and each terminal node contains a label that indicates the predicted class of a given feature vector. In the same vein, fuzzy decision trees used for regression problems are often called fuzzy regression trees, and the terminal node labels may be constants or equations that specify the predicted output value of a given input vector. Note that CART can select relevant inputs and partition the input space, while ANFIS refines the regression and makes it continuous and smooth everywhere. CART and ANFIS are therefore complementary, and their combination constitutes a solid approach to fuzzy modeling.

Natural Spread Pattern of Damaged Area by Pine Wilt Disease Using Geostatistical Analysis (공간통계학적 방법에 의한 소나무 재선충 피해의 자연적 확산유형분석)

  • Son, Min-Ho;Lee, Woo-Kyun;Lee, Seung-Ho;Cho, Hyun-Kook;Lee, Jun-Hak
    • Journal of Korean Society of Forest Science / v.95 no.3 / pp.240-249 / 2006
  • Recently, the spread of forest damage caused by pine wilt disease has become a serious social issue. Locally, the damage has spread through the natural expansion of the vectors within damaged areas, while its nationwide spread has been caused by people carrying infected trees out of damaged areas. In this study, damaged trees were detected and located on a digital map using aerial photographs and terrestrial surveys. The spatial distribution pattern of damaged trees, and the relationship between that distribution and several geomorphological factors, were analyzed geostatistically. Finally, we produced a map of the natural spread pattern of pine wilt disease using a geostatistical CART (Classification and Regression Trees) model. This study verified that geostatistical analysis and the CART model are useful tools for understanding the spatial distribution and natural spread pattern of pine wilt disease.
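
The abstract does not say which geostatistical tools were used. A common starting point for describing spatial dependence is the empirical semivariogram. The sketch below assumes that approach, which is a swapped-in illustration and not necessarily the authors' procedure. It also assumes planar coordinates and one measured value per location, for example a hypothetical density of damaged trees. The function and parameter names are illustrative.

```python
import math

def empirical_semivariogram(points, values, lag, n_lags):
    """Classical (Matheron) estimator, shown as an assumed illustration:
    gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) over the N(h) point pairs whose
    separation distance falls in lag bin [k * lag, (k + 1) * lag)."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])  # Euclidean distance (Python 3.8+)
            k = int(d // lag)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    # Bins that contain no pairs are reported as None.
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]
```

A semivariance that rises with distance and then levels off suggests spatially clustered damage. A flat semivariogram suggests little spatial dependence at the scales examined.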

SUPPORT Applications for Classification Trees

  • Lee, Sang-Bock;Park, Sun-Young
    • Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.565-574 / 2004
  • Classification tree algorithms such as CART by Breiman et al. (1984) recursively partition the data space with the aim of making the distribution of the class variable as pure as possible within each partition, and they consist of several steps. In the SUPPORT (smoothed and unsmoothed piecewise-polynomial regression trees) method of Chaudhuri et al. (1994), a weighted averaging technique is used to combine piecewise polynomial fits into a smooth one. We focus on applying SUPPORT to a binary class variable. A logistic model is used in the calculations, and the results show good classification rates compared with other methods such as CART, QUEST, and CHAID.
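
The first sentence of this abstract summarizes how CART-style classification trees grow: each node is split so that the class distribution in the child nodes is as pure as possible. The sketch below shows that step. It assumes a single numeric feature and uses the Gini index as the impurity measure, which is CART's usual choice for classification. SUPPORT's smoothing step is not shown, and the function names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2 (0 for a pure node).
    Assumes a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_gini_split(xs, labels):
    """Threshold on one numeric feature that minimizes the size-weighted Gini
    impurity of the two child nodes. Returns (None, parent impurity) when no
    split improves on the parent node."""
    n = len(xs)
    best_t, best_score = None, gini(labels)
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, labels) if x <= t]
        right = [y for x, y in zip(xs, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

Returning `None` when no threshold reduces impurity gives a tree-growing loop a natural stopping signal for that node.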

Performance Comparison of Mahalanobis-Taguchi System and Logistic Regression : A Case Study (마할라노비스-다구치 시스템과 로지스틱 회귀의 성능비교 : 사례연구)

  • Lee, Seung-Hoon;Lim, Geun
    • Journal of Korean Institute of Industrial Engineers / v.39 no.5 / pp.393-402 / 2013
  • The Mahalanobis-Taguchi System (MTS) is a diagnostic and predictive method for multivariate data. In the MTS, the Mahalanobis space (MS) of the reference group is obtained using the standardized variables of normal data. The Mahalanobis space can be used for multi-class classification. Once the MS is established, a useful set of variables is identified using orthogonal arrays and signal-to-noise ratios to assist in model analysis or diagnosis. Several other techniques, such as linear discriminant analysis, logistic regression, decision trees, and neural networks, have already been used for classification. The goal of this case study is to compare the performance of the Mahalanobis-Taguchi System and logistic regression on a data set.
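
The abstract describes the first MTS step: building the Mahalanobis space from the standardized variables of the normal (reference) group. The NumPy sketch below assumes the commonly used scaled form MD = z^T R^{-1} z / k, where z is the standardized observation, R is the reference correlation matrix, and k is the number of variables. Under this form, reference observations average close to 1. The orthogonal-array and signal-to-noise variable screening is not shown, and the function names are hypothetical.

```python
import numpy as np

def fit_mahalanobis_space(reference: np.ndarray):
    """Build the Mahalanobis space from reference ('normal') observations
    (rows = observations, columns = variables): means, standard deviations,
    and the inverse of the correlation matrix."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0, ddof=1)
    corr = np.corrcoef(reference, rowvar=False)
    return mean, std, np.linalg.inv(corr)

def scaled_md(x: np.ndarray, mean, std, corr_inv) -> np.ndarray:
    """Scaled Mahalanobis distance z^T R^{-1} z / k for each row of 2-D array x."""
    z = (x - mean) / std            # standardize with the reference statistics
    k = z.shape[1]
    # For each row i: sum over j, l of z[i, j] * corr_inv[j, l] * z[i, l].
    return np.einsum("ij,jl,il->i", z, corr_inv, z) / k
```

An observation whose MD is much larger than 1 lies outside the reference group. Choosing the cut-off is a separate, application-specific decision.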

A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis

  • Tang, Tzung-I;Zheng, Gang;Huang, Yalou;Shu, Guangfu;Wang, Pengtao
    • Industrial Engineering and Management Systems / v.4 no.1 / pp.102-108 / 2005
  • This paper studies medical data classification methods, comparing decision trees and system reconstruction analysis as applied to mining heart disease medical data. The data we study were collected from patients with coronary heart disease and contain 1,723 records with 71 attributes each. We use the system-reconstruction method to weight the data. We use decision tree algorithms such as induction of decision trees (ID3), C4.5, classification and regression tree (CART), Chi-square automatic interaction detector (CHAID), and exhaustive CHAID. We use the results to compare the correct classification rate, leaf number, and tree depth of the different decision-tree algorithms. The experiments show that weighting the data can improve the correct classification rate on the coronary heart disease data but has little effect on tree depth and leaf number.
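
The algorithms compared here differ largely in their split criteria. ID3 chooses the attribute with the largest information gain, and C4.5 normalizes that gain by the split information, a measure called the gain ratio. The plain-Python sketch below computes both for a single categorical attribute. CART (Gini index) and CHAID (chi-square tests) use other criteria and are not shown. The function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a non-empty list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attribute, labels):
    """ID3 criterion: parent entropy minus the size-weighted entropy of the
    subsets induced by each value of a categorical attribute."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(attribute, labels):
    """C4.5 criterion: information gain divided by the split information
    (the entropy of the attribute's own value distribution)."""
    split_info = entropy(attribute)
    return information_gain(attribute, labels) / split_info if split_info > 0 else 0.0
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain tends to favor.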