• Title/Summary/Keyword: Classification and Regression tree


Classification and Regression Tree Analysis for Molecular Descriptor Selection and Binding Affinities Prediction of Imidazobenzodiazepines in Quantitative Structure-Activity Relationship Studies

  • Atabati, Morteza;Zarei, Kobra;Abdinasab, Esmaeil
    • Bulletin of the Korean Chemical Society / v.30 no.11 / pp.2717-2722 / 2009
  • The use of the classification and regression tree (CART) methodology was studied in a quantitative structure-activity relationship (QSAR) context on a data set consisting of the binding affinities of 39 imidazobenzodiazepines for the α1 benzodiazepine receptor. The 3-D structures of these compounds were optimized using HyperChem software with the semiempirical AM1 method. After optimization, a set of 1481 zero- to three-dimensional descriptors was calculated for each molecule in the data set. The response (dependent variable) in the tree model consisted of the binding affinities of the drugs. Three descriptors (two topological descriptors and one 3D-MoRSE descriptor) were used in the final tree structure to describe the binding affinities. The mean relative error for the data set is 3.20%, compared with 6.63% for a previous model. To evaluate the predictive power of CART, cross-validation was also performed.
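As a rough illustration of two ideas in this abstract, the sketch below finds the best single split of a regression tree on one descriptor (the basic CART step, here choosing the threshold that minimizes the summed squared error around the two child means) and computes a mean relative error percent. The function names and the toy data are hypothetical and are not taken from the paper. The error measure is assumed to be the mean of |predicted - observed| / |observed|, times 100; the abstract does not give the exact definition.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best single split on one descriptor."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal descriptor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for xx, yy in pairs if xx <= threshold]
        right = [yy for xx, yy in pairs if xx > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


def mean_relative_error_percent(observed, predicted):
    """Mean of |pred - obs| / |obs|, expressed as a percentage."""
    errors = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Toy data (hypothetical): one descriptor, a response with two clear groups.
descriptor = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
response = [5.0, 5.0, 5.0, 8.0, 8.0, 8.0]
threshold, total_sse = best_split(descriptor, response)
print(threshold, total_sse)
```

A full CART model repeats this search over every descriptor and recursively within each child node, then prunes the tree; the sketch covers only one split.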

Comparison of machine learning algorithms for regression and classification of ultimate load-carrying capacity of steel frames

  • Kim, Seung-Eock;Vu, Quang-Viet;Papazafeiropoulos, George;Kong, Zhengyi;Truong, Viet-Hung
    • Steel and Composite Structures / v.37 no.2 / pp.193-209 / 2020
  • In this paper, the efficiency of five Machine Learning (ML) methods, namely Deep Learning (DL), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Gradient Tree Boosting (GTB), is compared for regression and classification of the Ultimate Load Factor (ULF) of nonlinear inelastic steel frames. For this purpose, a two-story, a six-story, and a twenty-story space frame are considered. An advanced nonlinear inelastic analysis is carried out for the steel frames to generate datasets for training the considered ML methods. In each dataset, the input variables are the geometric features of W-sections and the output variable is the ULF of the frame. The five ML methods are compared in terms of the mean-squared-error (MSE) for the regression models and the accuracy for the classification models. Moreover, the ULF distribution curve is calculated for each frame and the strength failure probability is estimated. It is found that the GTB method has the best efficiency in both regression and classification of ULF, regardless of the number of training samples and the space frames considered.
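The two comparison metrics named in this abstract can be sketched as follows. This is a minimal illustration with made-up numbers; the function names, the toy ULF values, and the "safe"/"fail" labels are assumptions, since the abstract does not say how the ULF was turned into classes.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(labels_true, labels_pred):
    """Fraction of samples whose predicted label matches the true label."""
    matches = sum(t == p for t, p in zip(labels_true, labels_pred))
    return matches / len(labels_true)


# Hypothetical regression targets (ULF values) and model predictions.
ulf_true = [1.0, 1.5, 2.0]
ulf_pred = [1.1, 1.4, 2.2]
print(mean_squared_error(ulf_true, ulf_pred))

# Hypothetical class labels derived from the ULF.
cls_true = ["safe", "fail", "safe", "safe"]
cls_pred = ["safe", "fail", "fail", "safe"]
print(accuracy(cls_true, cls_pred))
```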

Comparison of Variable Importance Measures in Tree-based Classification (나무구조의 분류분석에서 변수 중요도에 대한 고찰)

  • Kim, Na-Young;Lee, Eun-Kyung
    • The Korean Journal of Applied Statistics / v.27 no.5 / pp.717-729 / 2014
  • The projection pursuit classification tree uses a one-dimensional projection at each node, chosen to give the view that best separates the classes. The projection coefficients contain information that distinguishes two groups of classes from each other and can be used to calculate an importance measure for each variable. This paper reviews variable importance measures, which are attracting increasing interest as data sizes grow. We compared the performance of the projection pursuit classification tree with that of the classification and regression tree (CART) and random forest. The projection pursuit classification tree was found to perform better in most cases, particularly with highly correlated variables. Its importance measure performs slightly better than the importance measure of random forest.

Voice Personality Transformation Using a Multiple Response Classification and Regression Tree (다중 응답 분류회귀트리를 이용한 음성 개성 변환)

  • 이기승
    • The Journal of the Acoustical Society of Korea / v.23 no.3 / pp.253-261 / 2004
  • In this paper, a new voice personality transformation method is proposed, which modifies speaker-dependent feature variables in the speech signals. The proposed method takes the cepstrum vectors and pitch as the transformation parameters, which represent the vocal tract transfer function and the excitation signals, respectively. To transform these parameters, a multiple response classification and regression tree (MR-CART) is employed. MR-CART is a vector extension of the conventional CART, in which the response is a vector. We evaluated the performance of the proposed method by comparing it with a previously proposed codebook mapping method. We also quantitatively analyzed the performance of voice transformation and the complexities according to various observations. In experiments with 4 speakers, the proposed method objectively outperformed the conventional codebook mapping method, and we also observed that the transformed speech sounds closer to the target speech.

A Comparative Study of Predictive Factors for Passing the National Physical Therapy Examination using Logistic Regression Analysis and Decision Tree Analysis

  • Kim, So Hyun;Cho, Sung Hyoun
    • Physical Therapy Rehabilitation Science / v.11 no.3 / pp.285-295 / 2022
  • Objective: The purpose of this study is to use logistic regression and decision tree analysis to identify the factors that affect success or failure in the national physical therapy examination, and to build and compare predictive models. Design: Secondary data analysis study. Methods: We analyzed 76,727 subjects from the physical therapy national examination data provided by the Korea Health Personnel Licensing Examination Institute. The target variable was pass or fail, and the input variables were gender, age, graduation status, and examination area. Frequency analysis, chi-square test, binary logistic regression, and decision tree analysis were performed on the data. Results: In the logistic regression analysis, subjects in their 20s (odds ratio, OR=1, reference), those expected to graduate (OR=13.616, p<0.001), and those from the examination area of Jeju-do (OR=3.135, p<0.001) had a high probability of passing. In the decision tree, the factors with the greatest influence on the pass result were, in order, graduation status (χ²=12366.843, p<0.001) and examination area (χ²=312.446, p<0.001). Logistic regression analysis showed a specificity of 39.6% and a sensitivity of 95.5%, while decision tree analysis showed a specificity of 45.8% and a sensitivity of 94.7%. The classification accuracy of logistic regression and decision tree analysis was 87.6% and 88.0%, respectively. Conclusions: Both logistic regression and decision tree analysis were adequate to explain the predictive model. Additionally, whether actual test takers pass the national physical therapy examination could be predicted by applying the constructed prediction model and prediction rate.
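Sensitivity, specificity, and accuracy as reported in this abstract all follow from the four cells of a confusion matrix. The sketch below uses hypothetical counts (not the study's data) to show how a model can combine high sensitivity and low specificity with fairly high accuracy when the positive class (here, "pass") dominates. The function name is an assumption made for illustration.

```python
def classification_rates(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp: positives predicted positive    fn: positives predicted negative
    tn: negatives predicted negative    fp: negatives predicted positive
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives detected
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical, imbalanced example: 900 passes and 100 failures.
rates = classification_rates(tp=855, fn=45, tn=40, fp=60)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```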

A Comparative Study of Predictive Factors for Hypertension using Logistic Regression Analysis and Decision Tree Analysis

  • SoHyun Kim;SungHyoun Cho
    • Physical Therapy Rehabilitation Science / v.12 no.2 / pp.80-91 / 2023
  • Objective: The purpose of this study is to identify factors that affect the incidence of hypertension using logistic regression and decision tree analysis, and to build and compare predictive models. Design: Secondary data analysis study. Methods: We analyzed 9,859 subjects from the Korean health panel annual 2019 data provided by the Korea Institute for Health and Social Affairs and the National Health Insurance Service. Frequency analysis, chi-square test, binary logistic regression, and decision tree analysis were performed on the data. Results: In logistic regression analysis, those who were 60 years of age or older (odds ratio, OR=68.801, p<0.001), those who were divorced/widowed/separated (OR=1.377, p<0.001), those who graduated from middle school or lower (OR=1, reference), those who did not walk at all (OR=1, reference), those who were obese (OR=5.109, p<0.001), and those who had poor subjective health status (OR=2.163, p<0.001) were more likely to develop hypertension. In the decision tree, those who were over 60 years of age, overweight or obese, and who graduated from middle school or lower had the highest probability of developing hypertension, at 83.3%. Logistic regression analysis showed a specificity of 85.3% and a sensitivity of 47.9%, while decision tree analysis showed a specificity of 81.9% and a sensitivity of 52.9%. The classification accuracy of logistic regression and decision tree analysis was 73.6% and 72.6%, respectively. Conclusions: Both logistic regression and decision tree analysis were adequate to explain the predictive model. Both analysis methods are thought to be useful for constructing a predictive model for hypertension.
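For readers unfamiliar with the odds ratios quoted in this abstract, the sketch below shows two standard ways an odds ratio arises: directly from a 2x2 table, and as the exponential of a logistic regression coefficient (so a reference category, with coefficient 0, has OR=1). The counts are hypothetical and the function names are assumptions. Note that the 2x2 form gives an unadjusted odds ratio, whereas a multivariable logistic regression such as the one in the study yields odds ratios adjusted for the other inputs.

```python
import math


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Unadjusted odds ratio from a 2x2 table: (a * d) / (b * c)."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


def odds_ratio_from_coefficient(beta):
    """Odds ratio implied by a logistic regression coefficient: exp(beta)."""
    return math.exp(beta)


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed are cases.
print(odds_ratio(30, 70, 10, 90))

# A reference category has coefficient 0 and therefore OR = 1.
print(odds_ratio_from_coefficient(0.0))
```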

Wireless Internet Service Classification using Data Mining (데이터 마이닝을 이용한 무선 인터넷 서비스 분류기법)

  • Lee, Seong-Jin;Song, Jong-Woo;Ahn, Soo-Han;Won, You-Jip;Chang, Jae-Sung
    • Journal of KIISE:Information Networking / v.36 no.3 / pp.153-162 / 2009
  • It is challenging for service operators to accurately classify the different services that run on various wireless networks and numerous platforms. This work focuses on the design and implementation of a classifier that accurately classifies applications captured from the WiBro network. The classifier is built on the notion of a session instead of the commonly used flow. Based on the session information of the given traffic, two classification algorithms are presented: Classification and Regression Tree (CART) and Support Vector Machine. Both algorithms classify accurately and effectively, with misclassification rates of 0.85% and 0.94%, respectively. This work shows that the classifier using CART is easy to interpret and implement.

Development of Traffic Accident Models in Seoul Considering Land Use Characteristics (토지이용특성을 고려한 서울시 교통사고 발생 모형 개발)

  • Lim, Samjin;Park, Juntae
    • Journal of the Society of Disaster Information / v.9 no.1 / pp.30-49 / 2013
  • In this research, we developed a new traffic accident forecasting model based on land use. Accident forecasting models by type were developed using the Classification and Regression Tree method, based on market segmentation and the introduction of additional variables that may reflect the characteristics of various regions. The analysis identified activity variables such as registered population and commuters, together with road size and accident-causing facilities where those activities take place, as the variables that explain traffic accidents.

Prediction of Academic Performance of College Students with Bipolar Disorder using different Deep learning and Machine learning algorithms

  • Peerbasha, S.;Surputheen, M. Mohamed
    • International Journal of Computer Science & Network Security / v.21 no.7 / pp.350-358 / 2021
  • In recent years, analysing student performance has been difficult, and it is an important problem for all academic institutions. The main idea of this paper is to analyze and evaluate the academic performance of college students with bipolar disorder by applying data mining classification algorithms using Jupyter Notebook, a Python tool. This tool has been generally used as a decision-making tool in terms of the academic performance of students. The classifiers used are logistic regression, random forest (Gini), random forest (entropy), decision tree, K-Neighbours, AdaBoost, Extra Tree, GaussianNB, and BernoulliNB. The resulting classification models are evaluated with 13 measures such as Accuracy, Precision, Recall, F1 Measure, Sensitivity, Specificity, R Squared, Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, TPR, TNR, FPR and FNR. The conclusion is that the Decision Tree Classifier performs better than the other algorithms.
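The abstract distinguishes random forest classifiers built with the Gini criterion from those built with the entropy criterion. These are two impurity measures a tree can use to score candidate splits. A minimal sketch of both follows; the function names and the toy labels are illustrative assumptions, not the paper's code.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum of p * log2(p) over the classes present."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


# Hypothetical node contents: a balanced node and a pure node.
balanced = ["pass", "pass", "fail", "fail"]
pure = ["pass", "pass", "pass", "pass"]
print(gini(balanced), entropy(balanced))
print(gini(pure), entropy(pure))
```

Both measures are zero for a pure node and largest for an evenly mixed one, so in practice the two criteria often select similar splits.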

Study on Detection Technique for Cochlodinium polykrikoides Red tide using Logistic Regression Model and Decision Tree Model (로지스틱 회귀모형과 의사결정나무 모형을 이용한 Cochlodinium polykrikoides 적조 탐지 기법 연구)

  • Bak, Su-Ho;Kim, Heung-Min;Kim, Bum-Kyu;Hwang, Do-Hyun;Unuzaya, Enkhjargal;Yoon, Hong-Joo
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.4 / pp.777-786 / 2018
  • This study proposes a new method to detect Cochlodinium polykrikoides in satellite images using logistic regression and a decision tree. We used 918 spectral profiles extracted from red tide, clear water, and turbid water as training data. Of the entire data set, 70% was extracted and used for model training, and the classification accuracy of the model was evaluated using the remaining 30%. In the accuracy evaluation, the logistic regression model showed about 97% classification accuracy, and the decision tree model showed about 86% classification accuracy.
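The 70%/30% hold-out evaluation described in this abstract can be sketched as below. The abstract does not say whether the split was random or stratified by water type, so a simple seeded random shuffle is assumed here, and the function name, seed, and placeholder sample indices are hypothetical.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples reproducibly and split them into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder indices standing in for the 918 spectral profiles.
train, test = train_test_split(range(918))
print(len(train), len(test))
```

The model is fitted on the training part only, and accuracy is then computed on the held-out part, which the model has not seen.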