• Title/Summary/Keyword: Regression tree algorithm

A Study on the Comparison of Predictive Models of Cardiovascular Disease Incidence Based on Machine Learning

  • Ji Woo SEOK;Won ro LEE;Min Soo KANG
    • Korean Journal of Artificial Intelligence
    • /
    • v.11 no.1
    • /
    • pp.1-7
    • /
    • 2023
  • This paper compares machine learning models for predicting the occurrence of cardiovascular disease. Cardiovascular disease is the leading cause of death worldwide, accounting for about one third of all deaths, and the second leading cause of death in Korea. Primary prevention is the most important factor in preventing cardiovascular disease before it occurs, and early diagnosis and treatment are also important because they reduce mortality and morbidity. In an experiment using Azure ML, Logistic Regression achieved 88.6% accuracy, Decision Tree 86.4%, and Support Vector Machine (SVM) 83.7%. The AUC values from the ROC curves were 94.5%, 93%, and 92.4%, indicating that the machine learning models perform adequately, with the logistic regression model the most accurate among them. The visual comparison of the algorithms presented in this paper can serve as an objective aid for diagnosis and help guide the diagnostic decisions doctors make in clinical practice.
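The two metrics reported in the abstract above, accuracy and AUC, can be illustrated with a minimal sketch. This is not the authors' Azure ML pipeline: the functions `accuracy` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions introduced here for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = disease, 0 = no disease.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]  # assumed model outputs

# Accuracy needs hard labels, so a threshold (assumed 0.5) is applied.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"AUC      = {roc_auc(y_true, scores):.3f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality over all thresholds, which is why a model comparison may report both.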

A GA-based Classification Model for Predicting Consumer Choice (유전 알고리듬 기반 제품구매예측 모형의 개발)

  • Min, Jae-H.;Jeong, Chul-Woo
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.34 no.3
    • /
    • pp.29-41
    • /
    • 2009
  • The purpose of this paper is to develop a new classification method for predicting consumer choice based on a genetic algorithm, and to validate its prediction power over existing methods. To serve this purpose, we propose a hybrid model and discuss its methodological characteristics in comparison with other existing classification methods. We also conduct a series of experiments employing survey data on consumer choices of MP3 players to assess the prediction power of the model. The results show that the suggested model is statistically superior to existing methods such as the logistic regression model, artificial neural network model, and decision tree model in terms of prediction accuracy. The model is also shown to have the advantage of providing several pieces of strategic information of practical use for consumer choice.

A GA-based Classification Model for Predicting Consumer Choice (유전 알고리듬 기반 제품구매예측 모형의 개발)

  • Min, Jae-Hyeong;Jeong, Cheol-U
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2008.10a
    • /
    • pp.1-7
    • /
    • 2008
  • The purpose of this paper is to develop a new classification method for predicting consumer choice based on a genetic algorithm, and to validate its prediction power over existing methods. To serve this purpose, we propose a hybrid model and discuss its methodological characteristics in comparison with other existing classification methods. Also, to assess the prediction power of the model, we conduct a series of experiments employing survey data on consumer choices of MP3 players. The results show that the suggested model is statistically superior to existing methods such as the logistic regression model, artificial neural network model, and decision tree model in terms of prediction accuracy. The model is also shown to have the advantage of providing several pieces of strategic information of practical use for consumer choice.

Study on the Comparison and Analysis of Data Mining Models for the Efficient Customer Credit Evaluation (효율적인 신용평가를 위한 데이터마이닝 모형의 비교.분석에 관한 연구)

  • 김갑식
    • Journal of Information Technology Applications and Management
    • /
    • v.11 no.1
    • /
    • pp.161-174
    • /
    • 2004
  • This study is intended to suggest an optimized data mining model for efficient customer credit evaluation in the capital finance industry. To accomplish the research objective, various data mining models for customer credit evaluation are compared and analyzed. Existing models such as Multi-Layered Perceptrons, Multivariate Discrimination Analysis, Radial Basis Function, Decision Tree, and Logistic Regression are employed to analyze customer information in the capital finance market and the detailed data of capital financing transactions. Finally, the results from an integrated model utilizing a genetic algorithm are compared with those of each individual model mentioned above. The results reveal that the integrated model is superior to the other existing models.

A Study on the Combined Decision Tree(C4.5) and Neural Network Algorithm for Classification of Mobile Telecommunication Customer (이동통신고객 분류를 위한 의사결정나무(C4.5)와 신경망 결합 알고리즘에 관한 연구)

  • 이극노;이홍철
    • Journal of Intelligence and Information Systems
    • /
    • v.9 no.1
    • /
    • pp.139-155
    • /
    • 2003
  • This paper presents a new methodology for analyzing and classifying customer patterns in the mobile telecommunication market, based on a decision tree and a neural network, to improve the prediction of credit information. Using the variance selection process of the decision tree, a systematic process for defining the input vector's values and generating rules was developed. From a customer management perspective, this research analyzes current customers and extracts their patterns so that the company can maintain good customer relationships and give special attention in advance to customers who have a high potential of terminating their contracts. An implementation of the proposed method shows that its prediction accuracy is higher than that of existing methods such as decision trees (CART, C4.5), regression, neural networks, and a combined model (CART and NN).

A Study on Factors of Education's Outcome using Decision Trees (의사결정트리를 이용한 교육성과 요인에 관한 연구)

  • Kim, Wan-Seop
    • Journal of Engineering Education Research
    • /
    • v.13 no.4
    • /
    • pp.51-59
    • /
    • 2010
  • To manage university lectures efficiently and improve educational outcomes, a process is needed that diagnoses the current educational outcome of each class of a lecture and finds the factors behind it. Most studies seeking the factors of effective lectures use statistical methods such as association analysis and regression analysis, and decision tree analysis has recently been employed as well. Decision tree analysis has the merits that the resulting model is easy to understand and easy to apply to decision making, but it has the weakness of not being robust to characteristics of the input data such as multicollinearity. This paper points out these weaknesses of decision tree analysis and suggests an experimental solution that uses multiple decision tree algorithms to compensate for them. The experimental results show that the suggested method is more effective in finding reliable factors of educational outcomes.

Improvement of MLLR Speaker Adaptation Algorithm to Reduce Over-adaptation Using ICA and PCA (과적응 감소를 위한 주성분 분석 및 독립성분 분석을 이용한 MLLR 화자적응 알고리즘 개선)

  • 김지운;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.7
    • /
    • pp.539-544
    • /
    • 2003
  • This paper describes how to reduce the effect of the occupation threshold by controlling the transforms of the mixture components of HMM parameters in a hierarchical tree structure, in order to prevent over-adaptation. To reduce correlations between data elements and to remove elements with low variance, we employ PCA (principal component analysis) and ICA (independent component analysis), which give as good a representation as possible and lessen the effect of over-adaptation. When the occupation threshold is lowered and the number of transformation functions is increased, the ordinary MLLR adaptation algorithm yields a lower recognition rate than SI models, whereas the proposed MLLR adaptation algorithm improves the word recognition rate by over 2% compared with SI models.

Prediction of Multi-Physical Analysis Using Machine Learning (기계학습을 이용한 다중물리해석 결과 예측)

  • Lee, Keun-Myoung;Kim, Kee-Young;Oh, Ung;Yoo, Sung-kyu;Song, Byeong-Suk
    • Journal of IKEEE
    • /
    • v.20 no.1
    • /
    • pp.94-102
    • /
    • 2016
  • This paper proposes a new prediction method to reduce the time and labor of repetitive multi-physics simulation. Obtaining exact results from the whole simulation process requires complex modeling and a huge amount of time. Current multi-physics analysis focuses on the simulation method itself and the simulation environment to reduce time and labor. This paper instead proposes an alternative way to reduce simulation time and labor by exploiting a machine learning algorithm trained on a data set of simulation results. A comparison of machine learning algorithms showed that Gaussian Process Regression performed best with fewer than 100 training samples, and that similar results can be achieved through machine learning without a complex simulation process. Given a trained machine learning model, the result after changing some features of the simulation model can be predicted in just a few seconds. This new method will help reduce simulation time and labor effectively because it can predict the results before further simulation is run.

Mapping for Biodiversity Using National Forest Inventory Data and GIS (국가 생태정보를 활용한 생물다양성 지도 구축)

  • Jung, Da-Jung;Kang, Kyung-Ho;Heo, Joon;Kim, Chang-Jae;Kim, Sung-Ho;Lee, Jung-Bin
    • Journal of Environmental Impact Assessment
    • /
    • v.19 no.6
    • /
    • pp.573-581
    • /
    • 2010
  • Natural ecosystems are an essential consideration when linking biodiversity conservation plans to climate change response strategies. To make this link, Europe, America, Japan, and China are working to discuss the need for protection through national biodiversity valuation, but such studies are lacking in Korea. In this study, we built biodiversity maps representing the distribution of biodiversity, using species richness from National Forest Inventory (NFI) and Forest Description data. Using a regression tree algorithm, we divided the data into classes by decision rules and constructed biodiversity maps with an accuracy of over 70%. The biodiversity maps produced in this study can therefore serve as base information for decision makers and for planning biodiversity conservation and continuous management. Furthermore, this study suggests a strategy for using forest information more efficiently at the national level.
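The core step of a regression tree, choosing a decision rule that divides the data into more homogeneous groups, can be sketched as follows. This is a generic single-feature illustration, not the procedure used in the study above: the function names `sse` and `best_split`, the sum-of-squared-errors criterion, and the toy values (a hypothetical site variable against species richness) are assumptions made here.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Find the threshold t for the rule `x <= t` that minimizes the
    combined squared error of the two resulting groups.

    Returns (threshold, total_sse); threshold is None when no split
    improves on leaving the data unsplit.
    """
    pairs = sorted(zip(xs, ys))
    best_threshold, best_error = None, sse(ys)
    for i in range(1, len(pairs)):
        # Identical feature values cannot be separated by a threshold.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sse(left) + sse(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


# Hypothetical toy data: one predictor per site and its species richness.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 5.0, 20.0, 21.0, 19.0]

threshold, error = best_split(xs, ys)
print(f"decision rule: x <= {threshold}, remaining SSE = {error:.3f}")
```

A full regression tree would apply `best_split` recursively to each resulting group, over every candidate predictor, until a stopping condition such as a minimum group size is reached.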

Factor Analysis on Injured People Using Data Mining Technique (데이터 마이닝 기법을 활용한 산업재해자들에 대한 요인분석)

  • Leem Young-Moon;Hwang Young-Seob;Choi Yo-Han
    • Journal of the Korea Safety Management & Science
    • /
    • v.7 no.4
    • /
    • pp.61-71
    • /
    • 2005
  • Much research has focused on analyzing industrial disasters in order to reduce them. As a similar endeavor, this paper provides a propensity analysis of injured people from various industries using classification and regression tree (CART), a data mining algorithm. The sample for this work was chosen from 25,157 records related to various industries over one year (February 2003 to January 2004) in Kangwon-Do, Korea. For the purpose of this paper, eight independent variables (injured date, injured time, injured month, type of injured person, continuous service period, sex, company size, age) are taken from the injured person group. According to the analysis, five of the eight factors predicted to be significant have salient effects. Factors of season, time/hour, day of the week, or month in which the disasters happened do not show any significant effect. This paper provides common features of injured people. The analysis results will be helpful as a starting point for root cause analysis and reduction of industrial disasters, and for the development of safety management guidelines.