• Title/Summary/Keyword: Regression trees

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods / v.10 no.1 / pp.239-246 / 2003
  • Statisticians and data miners are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures that depend on the model type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency and cannot be used directly to compare different types of models, which is often done in data mining processes. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which perturbs the values of the input variables and monitors the change in the output. The scope of the research is limited to models for continuous output, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
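
The perturb-and-monitor idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' perturbation scheme or scoring rule, so everything here is an assumption: the function name `perturbation_importance`, the shift of `scale` standard deviations, the mean absolute output change as the score, and the toy model and data.

```python
import statistics


def perturbation_importance(model, rows, scale=0.1):
    """Score each input by the mean absolute change in the model output
    when that input alone is shifted by `scale` standard deviations.
    (Assumed scoring rule; not taken from the paper.)"""
    baseline = [model(row) for row in rows]
    scores = []
    for j in range(len(rows[0])):
        # Size the shift relative to the spread of input j.
        step = scale * statistics.pstdev([row[j] for row in rows])
        changes = []
        for row, y0 in zip(rows, baseline):
            bumped = list(row)
            bumped[j] += step          # perturb one input, hold the rest fixed
            changes.append(abs(model(bumped) - y0))
        scores.append(statistics.mean(changes))
    return scores


def toy_model(x):
    # Hypothetical fitted model: the third input is ignored on purpose.
    return 3.0 * x[0] + 0.5 * x[1]


# Made-up data, for illustration only.
rows = [[1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [3.0, 1.0, 2.0], [4.0, 3.0, 4.0]]
scores = perturbation_importance(toy_model, rows)
print(scores)
```

Because the function only calls `model` as a black box, the same scores can be computed for a regression, a neural network, or a tree, which is the kind of cross-model comparison the abstract has in mind.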

A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis

  • Tang, Tzung-I;Zheng, Gang;Huang, Yalou;Shu, Guangfu;Wang, Pengtao
    • Industrial Engineering and Management Systems / v.4 no.1 / pp.102-108 / 2005
  • This paper studies medical data classification methods, comparing decision trees and system reconstruction analysis as applied to heart disease data mining. The data were collected from patients with coronary heart disease and comprise 1,723 records of 71 attributes each. We use the system reconstruction method to weight the data. We use decision tree algorithms such as induction of decision trees (ID3), C4.5, classification and regression tree (CART), chi-square automatic interaction detector (CHAID), and exhaustive CHAID. We use the results to compare the correct classification rate, number of leaves, and tree depth of the different decision tree algorithms. The experiments show that weighting the data can improve the correct classification rate for coronary heart disease data but has little effect on tree depth or the number of leaves.

Investment, Export, and Exchange Rate on Prediction of Employment with Decision Tree, Random Forest, and Gradient Boosting Machine Learning Models (투자와 수출 및 환율의 고용에 대한 의사결정 나무, 랜덤 포레스트와 그래디언트 부스팅 머신러닝 모형 예측)

  • Chae-Deug Yi
    • Korea Trade Review / v.46 no.2 / pp.281-299 / 2021
  • This paper analyzes the feasibility of using machine learning methods to forecast employment. Machine learning methods, namely decision trees, an artificial neural network, and ensemble models such as random forest and gradient boosting regression trees, were used to forecast employment in the Busan regional economy. The main findings of the comparison of their predictive abilities are as follows. First, machine learning methods can forecast employment well. Second, the employment forecasts of the decision tree models differ somewhat according to the depth of the trees. Third, the artificial neural network model, however, does not show high predictive power. Fourth, the ensemble models, such as random forest and gradient boosting regression trees, show higher predictive power. Thus, since machine learning methods can predict employment accurately, they should be used to improve the accuracy of employment forecasts.

Special-Days Load Handling Method using Neural Networks and Regression Models (신경회로망과 회귀모형을 이용한 특수일 부하 처리 기법)

  • 고희석;이세훈;이충식
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.16 no.2 / pp.98-103 / 2002
  • In power demand forecasting, the most important problem is handling the load on special days. Accordingly, this paper presents a method for forecasting the peak load on long special days (the Lunar New Year, the Full Moon Festival) and short special days (the Planting Trees Day, the Memorial Day, etc.) using neural networks and regression models. Long and short special-day peak loads are forecast by neural network models that use a pattern conversion ratio and by fourth-order orthogonal polynomial regression models. Ten years of special-day peak load data (1985 to 1994) are used. The forecasting percentage error is good, at about 1 to 2% for both the neural network models and the fourth-order orthogonal polynomial regression models. In addition, the adjusted coefficient of determination and the F-test confirm the significance of the fourth-order orthogonal polynomial regression models. Comparing the two approaches on the forecasting results, the neural network models that use the pattern conversion ratio are more effective for forecasting the peak load on long special days. For forecasting the peak load on short special days, on the other hand, both are valid.

Satellite-based Hybrid Drought Assessment using Vegetation Drought Response Index in South Korea (VegDRI-SKorea) (식생가뭄반응지수 (VegDRI)를 활용한 위성영상 기반 가뭄 평가)

  • Nam, Won-Ho;Tadesse, Tsegaye;Wardlow, Brian D.;Jang, Min-Won;Hong, Suk-Young
    • Journal of The Korean Society of Agricultural Engineers / v.57 no.4 / pp.1-9 / 2015
  • Developing a drought index that provides drought information at detailed spatial resolution is essential for improving drought planning and preparedness. The objective of this study was to develop a satellite-based hybrid drought index, the Vegetation Drought Response Index in South Korea (VegDRI-SKorea), that could improve spatial resolution for monitoring local and regional drought. VegDRI-SKorea was developed using the Classification And Regression Trees (CART) algorithm based on remote sensing data such as the Normalized Difference Vegetation Index (NDVI) from MODIS satellite images, climate drought indices such as the Self-Calibrating Palmer Drought Severity Index (SC-PDSI) and the Standardized Precipitation Index (SPI), and biophysical data such as land cover, ecoregion, and soil available water capacity. A case study of the 2012 drought was carried out to evaluate the VegDRI-SKorea model for South Korea. VegDRI-SKorea represented the drought areas from the end of May through the severe drought at the end of June. The results show that integrating satellite imagery and various associated data with a data mining technique yields drought information that is improved both spatially and temporally, and a better understanding of drought conditions. In addition, VegDRI-SKorea is expected to contribute to monitoring current drought conditions, evaluating local and regional drought risk, and assisting drought-related decision making.

Development of medical/electrical convergence software for classification between normal and pathological voices (장애 음성 판별을 위한 의료/전자 융복합 소프트웨어 개발)

  • Moon, Ji-Hye;Lee, JiYeoun
    • Journal of Digital Convergence / v.13 no.12 / pp.187-192 / 2015
  • Software that analyzes speech disorders would be highly applicable across a variety of converged fields. This paper implements a user-friendly program based on classification and regression trees (CART) analysis that distinguishes between normal and pathological voices using a combination of acoustical and higher-order statistics (HOS) parameters, which represents a convergence of medical information and signal processing. The acoustical parameters are Jitter(%) and Shimmer(%). The proposed HOS parameters are the means and variances of skewness (MOS and VOS) and kurtosis (MOK and VOK). The database consists of 53 normal and 173 pathological voices distributed by Kay Elemetrics. When the acoustical and proposed parameters are used together to generate the decision tree, the average accuracy is 83.11%. Finally, we developed a program with a more user-friendly interface and frameworks.
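
The four HOS parameters named in this abstract can be sketched as follows. The abstract does not say how the voice signal is segmented, so this sketch assumes non-overlapping fixed-length frames that are never constant, population (biased) moment estimators, and a hypothetical helper name `hos_parameters`. The signal is a made-up toy sequence.

```python
import statistics


def _central_moment(frame, order):
    mu = sum(frame) / len(frame)
    return sum((x - mu) ** order for x in frame) / len(frame)


def skewness(frame):
    # Third standardized moment (population estimator).
    return _central_moment(frame, 3) / _central_moment(frame, 2) ** 1.5


def kurtosis(frame):
    # Fourth standardized moment, without subtracting 3.
    return _central_moment(frame, 4) / _central_moment(frame, 2) ** 2


def hos_parameters(signal, frame_len):
    """Mean and variance of per-frame skewness and kurtosis.
    Framing scheme is an assumption, not taken from the paper."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    sk = [skewness(f) for f in frames]
    ku = [kurtosis(f) for f in frames]
    return {
        "MOS": statistics.mean(sk), "VOS": statistics.pvariance(sk),
        "MOK": statistics.mean(ku), "VOK": statistics.pvariance(ku),
    }


# Toy periodic signal, for illustration only.
signal = [0.0, 1.0, 0.0, -1.0] * 4
params = hos_parameters(signal, frame_len=4)
print(params)
```

In a pipeline like the one described, these four values would sit alongside Jitter(%) and Shimmer(%) as the feature vector passed to the decision tree.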

Development of Predictive Model of Social Activity for the Elderly in Korea using CRT Algorithm (CRT 알고리즘을 이용한 우리나라 노인의 사회활동 영향요인 예측 모형 개발)

  • Byeon, Haewon
    • Journal of the Korea Convergence Society / v.9 no.10 / pp.243-248 / 2018
  • Social activities are important for successful aging among the elderly because they provide opportunities for social interaction that enhance life satisfaction. The purpose of this study is to identify the factors related to the social activities of the elderly and to build a statistical classification model to predict social activities. The subjects were 1,864 elderly people (829 males, 1,035 females) who completed the community health survey in 2015. The outcome variable was defined as experience of social activity during the past month (yes, no). The prediction model was constructed using a decision tree model based on the Classification and Regression Trees (CRT) algorithm. The results showed that subjective health, frequency of meeting with neighbors, frequency of meeting with relatives, and living with a spouse were significant variables of social participation. The most prevalent predictor was the subjective health level. To prepare for successful aging in a super-aged society, the results of this study indicate that social attention and support for the social activities of the elderly are required.

A Study on the Evaluation for the Improvement of Streetscape through Relationship Analysis between Psychological Consciousness and Physical Elements - Focused on the Gwangbok Street, Busan - (심리적 의식과 물리적 요소의 상관성 분석을 통한 가로경관 개선사업 평가에 관한 연구 - 부산시 광복로를 대상으로 -)

  • Yang, Jae Hyuk;Lee, Kang Hee
    • KIEAE Journal / v.9 no.6 / pp.37-44 / 2009
  • This study evaluated the streetscape improvement project on Gwangbok Street by analyzing the correlations between psychological consciousness and physical elements of the street, comparing selected pictures of the streetscape before and after the project. Psychological characteristics were analyzed using the semantic differential method, and the physical elements that influence those characteristics were analyzed using regression analysis. According to the psychological analysis, the senses of interest, orderliness, aesthetics, and preference improved greatly in district A, and the senses of openness, stability, and orderliness improved greatly in district B. The analysis of the physical elements that influenced these improvements in district A showed that 1) improving signboards and building elevations affected all of these psychological responses; 2) planting trees and creating planted rest areas in the street improved the senses of orderliness and aesthetics; 3) changing the finishing materials in the street enhanced the senses of aesthetics and preference; and 4) adjusting the width between the road and the pavement and improving traffic enhanced the senses of interest and preference. In district B, meanwhile, improving signboards and building elevations, planting trees, and creating planted rest areas in the street improved the senses of openness and orderliness. In addition, improving traffic enhanced the senses of openness and stability, and expanding the pavement and changing the finishing materials in the street improved the senses of openness, stability, and orderliness.

Study on Development of Classification Model and Implementation for Diagnosis System of Sasang Constitution (사상체질 분류모형 개발 및 진단시스템의 구현에 관한 연구)

  • Beum, Soo-Gyun;Jeon, Mi-Ran;Oh, Am-Suk
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.08a / pp.155-159 / 2008
  • In this paper, to develop a new classification model of Sasang constitutional medical types that helps improve the accuracy of diagnosing those types, various data mining classification models, namely discriminant analysis, decision tree analysis, neural network analysis, logistic regression analysis, and clustering analysis, were applied to questionnaires for medical type classification. In this manner, a model was developed that scientifically classifies constitutional medical types in Sasang Constitutional Medicine, a branch of traditional Korean medicine. The analysis models were also systematically compared and analyzed. A classification of Sasang constitutional medical types was developed based on the discriminant analysis model and the decision tree analysis model, which are relatively accurate, whose analysis procedures are easy to understand and explain, and which are easy to implement. A diagnosis system for Sasang constitution was also implemented using the two analysis models.

CORRELATION ANALYSIS BETWEEN FOREST VOLUME, ETM+ BANDS, AND HEIGHT ESTIMATED FROM C-BAND SRTM PRODUCT

  • Kim, Jin-Woo;Kim, Jong-Hong;Lee, Jung-Bin;Heo, Joon
    • Proceedings of the KSRS Conference / v.1 / pp.512-515 / 2006
  • Forest stand height and volume are important indicators for management purposes as well as for environmental analysis. The Shuttle Radar Topography Mission (SRTM) radar signal is backscattered by the forest canopy, so a digital surface model (DSM) can be acquired from this scattering characteristic, while the National Elevation Dataset (NED) provides bare-earth elevation data. The difference between SRTM and NED is taken as an estimate of tree height, and it is correlated with forest parameters, including average DBH, trees per acre, net BF per acre, and total net MBF. Among them, net board feet (BF) per acre is the index that represents forest volume well. The project site was a Douglas-fir-dominated plantation area in western Washington and northern Oregon in the U.S. This study shows a high correlation between the forest parameters and the products from SRTM, NED, and ETM+. The research applies multiple regression analysis and a regression tree algorithm, which yield an improved relationship among the parameters.
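
The height estimate described in this abstract, surface elevation minus bare-earth elevation, followed by a correlation against a forest parameter, can be sketched as below. The numbers are invented purely for illustration and are not the study's data. The helper names `canopy_heights` and `pearson_r` are hypothetical, and Pearson correlation is assumed as the correlation measure.

```python
import math


def canopy_heights(srtm, ned):
    """Estimated tree height per stand: SRTM surface elevation minus
    NED bare-earth elevation (same units, same locations assumed)."""
    return [s - n for s, n in zip(srtm, ned)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented values for three stands (illustration only, not measured data).
srtm = [120.0, 135.0, 150.0]      # surface elevation
ned = [100.0, 105.0, 110.0]       # bare-earth elevation
net_bf_per_acre = [10.0, 15.0, 20.0]

heights = canopy_heights(srtm, ned)
r = pearson_r(heights, net_bf_per_acre)
print(heights, r)
```

With real stands the height difference would be one predictor among several (for example ETM+ band values) in the multiple regression or regression tree mentioned in the abstract.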
