• Title/Summary/Keyword: Predictive decision tree


A Study on a car Insurance purchase Prediction Using Two-Class Logistic Regression and Two-Class Boosted Decision Tree

  • AN, Su Hyun;YEO, Seong Hee;KANG, Minsoo
    • Korean Journal of Artificial Intelligence / v.9 no.1 / pp.9-14 / 2021
  • This paper builds a model that predicts whether existing health insurance customers will buy car insurance. Automobiles are now widely used for land transportation and daily life, and their scope of use and equipment are expanding. This rapid increase in automobiles has made automobile insurance an essential business target for insurance companies. If car insurance sales can be predicted and targeted using information on existing health insurance customers, the insurer can generate continuing profits. This paper therefore analyzes existing customer characteristics and implements a predictive model to direct advertisements to customers interested in auto insurance. The goal is to maximize insurers' profits by devising communication strategies that optimize the business model. The study was conducted in Microsoft Azure, and the automobile insurance purchase prediction model was implemented using the Health Insurance Cross-sell Prediction data. Two-Class Logistic Regression and Two-Class Boosted Decision Tree were both applied so that the results of the two models could be compared. At a threshold of 0.3, the AUC is 0.837 and the accuracy is 0.833, which the paper describes as high accuracy. The authors conclude that customers with health insurance can be induced to respond positively to auto insurance purchases.
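
The abstract above reports accuracy at a threshold of 0.3 together with AUC. The following is a minimal sketch of how these two metrics can be computed from predicted probabilities. The function names, labels and scores are made up for illustration; they are not the paper's data, and the paper's Azure pipeline is not reproduced here.

```python
def accuracy_at_threshold(y_true, scores, threshold=0.3):
    """Fraction of cases where (score >= threshold) matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative labels (1 = bought car insurance) and predicted probabilities.
# These values are hypothetical and chosen only to exercise the functions.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.25, 0.35, 0.40, 0.55, 0.28, 0.05, 0.90]

acc = accuracy_at_threshold(y_true, scores, threshold=0.3)
auc = roc_auc(y_true, scores)
print(f"accuracy@0.3 = {acc:.3f}, AUC = {auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two figures are reported side by side.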

Risk factors of alcohol use disorder in Korean adults based on the decision tree analysis (의사결정나무분석을 이용한 성인의 알코올사용장애 위험요인)

  • Mi Young Kwon;Ji In Kim
    • The Journal of Korean Society for School & Community Health Education / v.24 no.1 / pp.47-59 / 2023
  • Objectives: The aim of this study was to identify risk factors for alcohol use disorder among Korean adults. Methods: This cross-sectional exploratory study used data from the 6th Korea National Health and Nutrition Examination Survey in 2015. Of the 3,248 participants, 2,558 were normal drinkers and 690 had alcohol use disorder. Decision tree analysis was used to examine socio-demographic and health-related factors that predict alcohol use disorder. Results: The decision tree model for factors related to alcohol use disorder in Korean adults yielded 8 pathways. The significant predictors of alcohol use disorder were age, gender, smoking, marital status, and household income. Male smokers whose household income was 'high' or 'low' were most vulnerable to alcohol use disorder. Conclusions: Health behavior and household income need to be considered when implementing prevention policies and health education for alcohol use disorder.

Developing the high risk group predictive model for student direct loan default using data mining (데이터마이닝을 이용한 학자금 대출 부실 고위험군 예측모형 개발)

  • Choi, Jae-Seok;Han, Jun-Tae;Kim, Myeon-Jung;Jeong, Jina
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1417-1426 / 2015
  • We develop a predictive model for identifying groups at high risk of loan default, using direct loan data from 2012 to 2014 from the Korea Student Aid Foundation. We perform decision tree analysis, a data mining method, in SAS Enterprise Miner 13.2. The resulting model classifies subjects into 25 types. The major factors influencing loan default are household income, national grant, age, overdue record, level of schooling, field of study, and monthly repayment. The model can serve as the basis for a segmented management service for preventing loan default.

Evaluation of Predictive Models for Early Identification of Dropout Students

  • Lee, JongHyuk;Kim, Mihye;Kim, Daehak;Gil, Joon-Min
    • Journal of Information Processing Systems / v.17 no.3 / pp.630-644 / 2021
  • Educational data analysis is attracting increasing attention with the rise of the big data industry. The amounts and types of learning data available are increasing steadily, and the information technology required to analyze these data continues to develop. The early identification of potential dropout students is very important; education is important in terms of social movement and social achievement. Here, we analyze educational data and generate predictive models for student dropout using logistic regression, a decision tree, a naïve Bayes method, and a multilayer perceptron. The multilayer perceptron model using independent variables selected via the variance analysis showed better performance than the other models. In addition, we experimentally found that not only grades but also extracurricular activities were important in terms of preventing student dropout.

Using Data Mining Techniques for Analysis of the Impacts of COVID-19 Pandemic on the Domestic Stock Prices: Focusing on Healthcare Industry (데이터 마이닝 기법을 통한 COVID-19 팬데믹의 국내 주가 영향 분석: 헬스케어산업을 중심으로)

  • Kim, Deok Hyun;Yoo, Dong Hee;Jeong, Dae Yul
    • The Journal of Information Systems / v.30 no.3 / pp.21-45 / 2021
  • Purpose: This paper analyzes the impact of a global pandemic such as COVID-19 on the domestic stock market. We investigated how the overall pattern of the stock market changed due to the COVID-19 pandemic. In particular, we analyzed stock price patterns in depth and sought the factors that affected the stock market index (KOSPI) in the healthcare industry during the pandemic. Design/methodology/approach: We built a data warehouse from databases in various industrial and economic fields to analyze the changes in the KOSPI due to COVID-19, particularly the changes in the healthcare industry centered on bio-medicine. We collected daily stock price data for the KOSPI, centered on the KOSPI-200, from about two years before to one year after the outbreak of COVID-19. We also collected COVID-19-related stock market news by applying text mining techniques. We designed four experimental data sets to develop decision tree-based prediction models. Findings: The prediction models from all four data sets showed significant predictive power with explainable decision tree models. In addition, we derived 10 to 14 significant decision rules for each prediction model. The experimental results showed that the decision rules were sufficient to explain domestic healthcare stock market patterns before and after COVID-19.

Prediction of Safety Grade of Bridges Using the Classification Models of Decision Tree and Random Forest (의사결정나무 및 랜덤포레스트 분류 모델을 이용한 교량 안전등급 예측)

  • Hong, Jisu;Jeon, Se-Jin
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.3 / pp.397-411 / 2023
  • The number of deteriorated bridges with a service period of more than 30 years has been increasing rapidly in Korea. Accordingly, advanced maintenance technologies that predict the age-induced deterioration, condition, and performance of bridges are increasingly important. This study proposes a method for predicting the safety grade of bridges using Decision Tree and Random Forest classification models based on machine learning. The models were analyzed for 8,850 bridges on national roads with evaluation indexes such as the confusion matrix, balanced accuracy, recall, ROC curve, and AUC; the Random Forest generally showed better predictive performance than the Decision Tree. In particular, the Random Forest with random under-sampling outperformed the other sampling techniques for C and D grade bridges, which need more maintenance attention because of their significant deterioration, reaching a recall of 83.4%. The proposed model can be used to rapidly identify the safety grade of bridges that have not recently been inspected and to establish an efficient and economical maintenance plan for them.
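
The abstract above pairs random under-sampling with recall on the rarer C and D grades. The following is a minimal stdlib-only sketch of those two ideas, assuming a simple two-class grouping ("AB" versus "CD") and placeholder rows. The function names, the grouping and the class counts are hypothetical, and the Random Forest itself is not reproduced.

```python
import random
from collections import Counter, defaultdict


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_min = min(len(members) for members in by_class.values())
    rng = random.Random(seed)
    sampled_rows, sampled_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels


def recall(y_true, y_pred, positive):
    """Share of truly positive cases that were predicted positive."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total


# Hypothetical imbalanced training set: many A/B bridges, few C/D bridges.
labels = ["AB"] * 90 + ["CD"] * 10
rows = list(range(len(labels)))  # placeholder feature rows (just ids here)

balanced_rows, balanced_labels = random_under_sample(rows, labels)
print(Counter(balanced_labels))
```

Under-sampling is applied to the training data only; recall for the minority grades is then measured on untouched test data, so that the reported figure reflects the real class balance.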

Using Predictive Analytics to Profile Potential Adopters of Autonomous Vehicles

  • Lee, Eun-Ju;Zafarzon, Nordirov;Zhang, Jing
    • Asia Marketing Journal / v.20 no.2 / pp.65-83 / 2018
  • Technological advances are bringing autonomous vehicles to the ever-evolving transportation system. Anticipating users' adoption of these technologies is essential for vehicle manufacturers to devise more precise production and marketing strategies. This research relates regulatory focus and consumer innovativeness to consumers' adoption of autonomous vehicles (AVs) and to their subsequent willingness to pay for AVs. An online questionnaire was fielded to confirm predictions, and regression analysis was conducted to verify the model's validity. The results show that a promotion focus does not have a significantly positive effect on the automation level at which consumers will adopt AVs, but a prevention focus has a significantly positive effect on conditional AV adoption. Regarding consumer innovativeness, consumers' novelty-seeking has a significantly positive relationship with high and full AV adoption, and consumers' independent decision-making has a significantly positive effect on full AV adoption. The higher the level of automation at which a consumer adopts AVs, the higher the willingness to pay for them. Finally, using neural network and decision tree analyses, we show methods for describing three categories of potential AV adopters.

Feature Selection Effect of Classification Tree Using Feature Importance : Case of Credit Card Customer Churn Prediction (특성중요도를 활용한 분류나무의 입력특성 선택효과 : 신용카드 고객이탈 사례)

  • Yoon Hanseong
    • Journal of Korea Society of Digital Industry and Information Management / v.20 no.2 / pp.1-10 / 2024
  • To predict credit card customer churn accurately through data analysis, a model can be constructed with various machine learning algorithms, including decision trees. Feature importance has been used in several application areas to select input features that improve the performance of data analysis models. This paper investigates a method of using feature importance calculated with the MDI method, and its effects, in the credit card customer churn prediction problem with classification trees. Compared with several random selections of features from the case data, a set of input features selected for their higher feature importance shows higher predictive power. This can be an efficient way to classify and choose the input features needed to improve prediction performance. The method described in this paper can serve as an option for selecting input features with feature importance when building and using classification trees, in credit card customer churn prediction and in other problems.
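
MDI (mean decrease in impurity) scores a feature by the weighted impurity reduction of the tree splits that use it. The following is a simplified sketch that computes the Gini decrease of the single best threshold split per feature, that is, a depth-1 stand-in for full-tree MDI, and ranks features by it. The feature names and records are invented for illustration and do not come from the paper.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_impurity_decrease(values, labels):
    """Largest Gini decrease achievable by one threshold split on this feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Try every distinct value except the largest as a "<= threshold" split point.
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


# Made-up customer records (hypothetical feature names): name -> column of values.
features = {
    "inactive_months": [1, 1, 2, 2, 3, 3, 4, 4],
    "card_tier": [1, 2, 1, 2, 1, 2, 1, 2],
}
churned = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = customer churned

importance = {name: best_impurity_decrease(col, churned) for name, col in features.items()}
ranked = sorted(importance, key=importance.get, reverse=True)
print(ranked, importance)
```

Selecting the top-ranked features from such a list, and comparing the resulting tree against trees built on randomly chosen features, mirrors the comparison the abstract describes.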

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. We approach model building from two perspectives. The first is the analysis period: we divide it into before and after the IMF financial crisis and examine whether the two periods differ. The second is the prediction time: to predict when firms will increase capital by issuing new stocks, the prediction time is categorized as one year, two years and three years later. A total of six prediction models are therefore developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method; it builds trees to label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. There are well-known decision tree induction algorithms such as CHAID, CART, QUEST, C5.0, etc. Among them, we use the C5.0 algorithm, which is the most recently developed and yields better performance than the other algorithms. We obtained data for the rights issue and financial analysis from TS2000 of the Korea Listed Companies Association. A record of financial analysis data consists of 89 variables, which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices and 8 productivity indices.
For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables from the financial analysis data were selected as the input variables of each model, and the rights issue status (issued or not issued) was defined as the output variable. To develop prediction models using the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The experimental results show that the prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those for data before the crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions regarding rights issues have become more apparent. The experimental results also show that stability-related indices have a major impact on conducting a rights issue in the case of short-term prediction. Long-term prediction of a rights issue, on the other hand, is affected by financial analysis indices on profitability, stability, activity and productivity. All the prediction models include the industry code as one of the significant variables. This means that companies in different industries show different patterns of rights issues. We conclude that stakeholders should take into account stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. The current study has several limitations. First, the differences in accuracy need to be compared using other data mining techniques such as neural networks, logistic regression and SVM.
Second, new prediction models need to be developed and evaluated that include variables which research on capital structure theory has identified as relevant to rights issues.
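
The abstract above uses 60% of the records for model building and 40% for testing. The following is a minimal sketch of such a hold-out split. It assumes a seeded random shuffle, which the abstract does not state; the function name and the placeholder record ids are hypothetical, and the C5.0 model in PASW Modeler is not reproduced.

```python
import random


def holdout_split(records, build_fraction=0.6, seed=0):
    """Shuffle records and split them into model-building and test partitions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    cut = int(round(len(shuffled) * build_fraction))
    return shuffled[:cut], shuffled[cut:]


# Placeholder ids, one per financial analysis record (10,925 in the abstract).
records = list(range(10925))
build, test = holdout_split(records)
print(len(build), len(test))
```

Fixing the seed keeps the partition reproducible, so the six prediction models could be compared on identical building and test sets.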

Signal Processing for Perpendicular Recording Systems

  • Lee, Jun;Woo, Choong-Chae
    • Journal of IKEEE / v.15 no.1 / pp.70-75 / 2011
  • Longitudinal recording has been the cornerstone of both generations of magnetic recording systems, FDD and HDD. Recently, perpendicular recording has received much attention as a promising technology for future high-density recording systems. Research into signal processing techniques is paramount for such storage systems and is as indispensable as it is for longitudinal recording systems. This paper focuses on evaluating the performance of various detectors under a perpendicular recording system. Parameters for improving their performance are examined for some detectors. The detectors considered in this work are the partial response maximum likelihood (PRML), noise-predictive maximum likelihood (NPML), fixed delay tree search with decision feedback (FDTS/DF), dual decision feedback equalizer (DDFE) and multilevel decision feedback equalizer (MDFE). Their performance is analyzed in terms of mean squared error (MSE), noise power spectra, and the similarity between the recording channel and the partial response (PR) channel.