• Title/Summary/Keyword: Logistic models

A Study on a car Insurance purchase Prediction Using Two-Class Logistic Regression and Two-Class Boosted Decision Tree

  • AN, Su Hyun;YEO, Seong Hee;KANG, Minsoo
    • Korean Journal of Artificial Intelligence / v.9 no.1 / pp.9-14 / 2021
  • This paper builds a model that predicts whether existing health insurance customers will purchase car insurance. Automobiles are now used for land transportation and daily life, and their scope of use and equipment keeps expanding. This rapid increase in automobiles has made automobile insurance an essential business target for insurance companies. If car insurance sales can be predicted and targeted using information on existing health insurance customers, insurers can generate continuous profits. This paper therefore analyzes existing customer characteristics and implements a predictive model to target advertisements at customers interested in auto insurance. The goal is to maximize insurers' profits by devising communication strategies that optimize the business model and revenue. The study was conducted in Microsoft Azure, and the purchase prediction model was implemented using the Health Insurance Cross-sell Prediction data. Two-Class Logistic Regression and Two-Class Boosted Decision Tree were both trained and their results compared. At a threshold of 0.3, the AUC was 0.837 and the accuracy was 0.833. The authors conclude that customers with health insurance can be induced to respond positively to auto insurance offers.
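The two metrics quoted in the abstract above depend on the decision threshold in different ways: accuracy changes with the cutoff, while AUC is computed from the ranking of scores alone. The following is a minimal sketch of both computations in plain Python. The function names, the scores, and the labels are hypothetical and are not taken from the paper or from Microsoft Azure.

```python
def accuracy_at_threshold(scores, labels, threshold=0.3):
    """Fraction of cases where (score >= threshold) matches the 0/1 label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted purchase probabilities and true outcomes
# (illustrative only; not the paper's data).
scores = [0.05, 0.20, 0.35, 0.40, 0.10, 0.80, 0.25, 0.60]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

print(accuracy_at_threshold(scores, labels, threshold=0.3))
print(auc(scores, labels))
```

Lowering the threshold from the conventional 0.5 labels more customers as likely buyers, which is usually done when positives are rare or when missing a buyer costs more than a wasted contact.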

Development for City Bus Driver's Accident Occurrence Prediction Model Based on Digital Tachometer Records (디지털 운행기록에 근거한 시내버스 운전자의 사고발생 예측모형 개발)

  • Kim, Jung-yeul;Kum, Ki-jung
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.15 no.1 / pp.1-15 / 2016
  • This study aims to develop a model that identifies city bus drivers who are likely to cause an accident, based on their actual driving records. To this end, variables significantly related to traffic accidents are drawn from the driving records of drivers who have and have not caused an accident, and classification models built with discriminant analysis and logistic regression analysis are compared for accuracy. The developed models are then applied to other drivers' driving records to verify their accuracy. The optimal variable for classifying drivers who had caused an accident was the simultaneous occurrence of deceleration ($X_{deceleration}$) and acceleration to the right ($Y_{right}$). The discriminant analysis model classified drivers who had caused an accident at a rate of up to 62.8%, and the logistic regression model at a rate of up to 76.7%. Verification of the models' predictive power showed an accuracy rate of 84.1%.
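The abstract above reports two different kinds of rate: the share of accident-causing drivers that a model identifies (often called sensitivity) and an overall accuracy. The sketch below shows how the two differ on the same predictions. The function names and the ten labels are hypothetical and are not the study's data.

```python
def sensitivity(predicted, actual):
    """Share of actual positives (1 = caused an accident) predicted as 1."""
    hits = sum(1 for p, a in zip(predicted, actual) if a == 1 and p == 1)
    positives = sum(1 for a in actual if a == 1)
    return hits / positives


def accuracy(predicted, actual):
    """Share of all drivers whose predicted class matches the actual class."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical labels for ten drivers (1 = caused an accident).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

print(sensitivity(predicted, actual))
print(accuracy(predicted, actual))
```

Because accuracy also counts correctly classified non-accident drivers, it can be higher than sensitivity on the same model, so the two figures should not be compared directly.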

Analyzing Customer Management Data by Data Mining: Case Study on Churn Prediction Models for Insurance Company in Korea

  • Cho, Mee-Hye;Park, Eun-Sik
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1007-1018 / 2008
  • The purpose of this case study is to demonstrate database-marketing management. First, we explore the original variables in insurance customer data, modify them where necessary, and carry out variable selection before analysis. Then, we develop churn prediction models using logistic regression, neural network, and SVM analysis. We also compare these three data mining models in terms of misclassification rate.

Estimation of Asymmetric Bell Shaped Probability Curve using Logistic Regression (로지스틱 회귀모형을 이용한 비대칭 종형 확률곡선의 추정)

  • 박성현;김기호;이소형
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.71-80 / 2001
  • The logistic regression model is one of the most popular linear models for a binary response variable and is used to estimate probability functions. In many practical situations, the probability function can be expressed as a bell-shaped curve, and such a function can be estimated by a second-order logistic regression model. However, when the probability curve is asymmetric, estimates from a second-order logistic regression model may not be precise because that model is a symmetric function. In addition, even if a second-order model is used, the effect of the second-order term may not be easy to interpret. To alleviate these problems, this paper proposes an estimation method for asymmetric probability curves based on a first-order logistic regression model and an iterative bisection method, and compares its performance with that of a second-order logistic regression model in a simulation study.

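The symmetry claim in the abstract above can be checked with a short derivation. The symbols $\beta_0$, $\beta_1$, $\beta_2$, $x_0$, and $c$ are notation assumed here for illustration and do not come from the paper.

```latex
\begin{align*}
\log\frac{p(x)}{1-p(x)} &= \beta_0 + \beta_1 x + \beta_2 x^2
  = \beta_2\,(x - x_0)^2 + c, \\
x_0 &= -\frac{\beta_1}{2\beta_2}, \qquad
c = \beta_0 - \frac{\beta_1^2}{4\beta_2}.
\end{align*}
```

Because the logit depends on $x$ only through $(x - x_0)^2$, it follows that $p(x_0 + d) = p(x_0 - d)$ for every $d$. With $\beta_2 < 0$ the curve is bell shaped and peaks at $x_0$, so a second-order model cannot represent a curve that rises and falls at different rates.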

Designing Neural Network Using Genetic Algorithm (유전자 알고리즘을 이용한 신경망 설계)

  • Park, Jeong-Sun
    • The Transactions of the Korea Information Processing Society / v.4 no.9 / pp.2309-2314 / 1997
  • The study introduces a neural network to predict the bankruptcy of insurance companies. To optimize the network, a genetic algorithm suggests the optimal structure and network parameters. The neural network designed by the genetic algorithm is compared with discriminant analysis, logistic regression, ID3, and CART. The robust neural network model shows the best performance among the models compared.

A study of methodology for identification models of cardiovascular diseases based on data mining (데이터마이닝을 이용한 심혈관질환 판별 모델 방법론 연구)

  • Lee, Bum Ju
    • The Journal of the Convergence on Culture Technology / v.8 no.4 / pp.339-345 / 2022
  • Cardiovascular disease is one of the leading causes of death in the world. The objectives of this study were to build models for identifying hypertension and dyslipidemia from sociodemographic variables, using three variable selection methods and seven machine learning algorithms, and to evaluate the predictive power of the models. In experiments based on the full variable set and on correlation-based feature subset selection, models using naive Bayes performed better than models using the other machine learning algorithms for both diseases. With wrapper-based feature subset selection, models using logistic regression performed better than models using the other algorithms. Our findings may provide basic data for the public health and machine learning fields.

Selecting the Best Soil Particle-Size Distribution Model for Korean Soils

  • Hwang, Sang-Il
    • Journal of Environmental Policy / v.2 no.1 / pp.77-86 / 2003
  • Particle-size distributions (PSDs) are widely used to estimate soil hydraulic properties. The objective of this study was to select the best of nine PSD models with different underlying assumptions, using a variety of Korean soils. The Fredlund model with four parameters, the logistic growth curve, and the Weibull distribution model performed best for the majority of the soils studied. Notably, the logistic growth function, which has no fitting parameters, fit the data well.

Influential Points in GLMs via Backwards Stepping

  • Jeong, Kwang-Mo;Oh, Hae-Young
    • Communications for Statistical Applications and Methods / v.9 no.1 / pp.197-212 / 2002
  • When assessing the goodness-of-fit of a model, a small subset of deviating observations can give rise to a significant lack of fit. It is therefore important to identify such observations and to assess their effects on various aspects of the analysis. Cook's distance is usually used to detect influential observations, but it is sometimes not fully effective in identifying a truly influential set of observations because of masking or swamping effects. In this paper we confine our attention to influential subsets in GLMs such as logistic regression models and loglinear models. We modify a backwards stepping algorithm, originally suggested for detecting outlying cells in contingency tables, to detect influential observations in GLMs. The algorithm consists of two steps: an identification step and a testing step. In the identification step we identify influential observations based on influence measures such as Cook's distance. In the testing step we test whether the identified subset of observations is significant. Finally, we illustrate the proposed method with two datasets, one for a logistic regression model and one for a loglinear model.
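For reference, a commonly used one-step approximation to Cook's distance in a GLM is sketched below. This is the textbook form and not necessarily the exact measure used in the paper. The notation is assumed here: $r_{P,i}$ is the Pearson residual, $h_{ii}$ the leverage from the weighted hat matrix, $p$ the number of parameters, $V(\cdot)$ the variance function, and $\phi$ the dispersion (fixed at 1 for binomial and Poisson models).

```latex
\[
D_i \approx \frac{r_{P,i}^{2}}{p\,\phi} \cdot \frac{h_{ii}}{(1 - h_{ii})^{2}},
\qquad
r_{P,i} = \frac{y_i - \hat{\mu}_i}{\sqrt{V(\hat{\mu}_i)}}
\]
```

Each $D_i$ measures the effect of deleting one observation at a time, so a cluster of influential points can hide one another. That is the masking problem the abstract cites as the motivation for a subset-based procedure.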

Selection of Important Variables in the Classification Model for Successful Flight Training (조종사 비행훈련 성패예측모형 구축을 위한 중요변수 선정)

  • Lee, Sang-Heon;Lee, Sun-Doo
    • IE interfaces / v.20 no.1 / pp.41-48 / 2007
  • The main purpose of this paper is to reduce unnecessary pilot training expenses and to prevent human accidents arising from the pilot selection process. We use classification models such as logistic regression, decision tree, and neural network, based on the aptitude test results of 505 ROK Air Force applicants from 2001 to 2004. First, we assess the reliability and appropriateness of the improved aptitude test system. On this basis, the conference flight simulator test item was compared with the new aptitude test item, and the pass/fail decisions of the different models were compared in terms of classification accuracy, ROC, and response threshold. The decision tree was selected as the most efficient model for each sequential flight training result, and it predicted the final flight training results well. We therefore propose adopting the decision tree as the standard for pilot selection, together with the conference flight simulator test, which is a new item in the aptitude test.

Conceptual Data Modeling of Integrated Information System for Research & Development Configuration Management (연구개발 형상관리 자동화체계에 대한 개념적 데이터모델링)

  • 김인주
    • Journal of the military operations research society of Korea / v.25 no.1 / pp.87-106 / 1999
  • Developing and operating weapon systems involves much technical data related to design, test and evaluation, and logistic support, which must be exchanged between geographically isolated units and across heterogeneous hardware and software. This paper proposes a conceptual database schema for configuration management information systems in which these data can be automatically interchanged, tracked, audited, and status-accounted without errors in various environments. It investigates how to identify and classify the data according to document identification, task analysis, system development, logistic support, system test and evaluation, and data management. The investigation also covers deriving the subject areas and modeling the conceptual database schema to explain the relationships among these data. The paper thus yields a conceptual framework and data models for configuration management information systems, although additional customization is required to apply the models to R&D for a specific weapon system.
