• Title/Summary/Keyword: logistic model

Logistic Regression for Investigating Credit Card Default

  • Yang, Jeong-Won;Ha, Sung-Ho;Min, Ji-Hong
    • Proceedings of the Korea Society for Industrial Systems Conference / 2008.10b / pp.164-169 / 2008
  • The rising late-payment rate among credit card customers, caused by the recent economic downturn, is not only reducing department store profits but also causing significant losses. Under this pressure, the objective of credit forecasting has expanded from classifying customers as good or bad to contributing to revenue growth. As a method of managing department store credit card defaults, this study groups credit delinquents into clusters, analyzes the repayment patterns of customers in each cluster, and develops a credit forecasting system for managing delinquents, using data on the delinquents of Korean department store D. The presented model uses a Kohonen network, a type of artificial neural network used in data mining, to cluster credit delinquents into groups. A logistic regression model is then used to predict the repayment rate of the customers in each cluster per period. The accuracy of the presented system across all clusters is 92.3%.

Evaluations of predicted models fitted for data mining - comparisons of classification accuracy and training time for 4 algorithms (데이터마이닝기법상에서 적합된 예측모형의 평가 -4개분류예측모형의 오분류율 및 훈련시간 비교평가 중심으로)

  • Lee, Sang-Bock
    • Journal of the Korean Data and Information Science Society / v.12 no.2 / pp.113-124 / 2001
  • CHAID, logistic regression, bagging trees, and bagging trees are compared on HMEQ, an artificial SAS data set, in terms of classification accuracy and training time. Bagging trees rank best on error rate, although they run more slowly than the other models. Logistic regression has the shortest run time among the given models, but no model is uniformly best on both criteria.

Using Classification function to integrate Discriminant Analysis, Logistic Regression and Backpropagation Neural Networks for Interest Rates Forecasting

  • Oh, Kyong-Joo;Ingoo Han
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.11a / pp.417-426 / 2000
  • This study suggests integrated neural network models for interest rate forecasting using change-point detection, classifiers, and classification functions based on structural change. The proposed model is composed of three phases with two-staged learning. The first phase detects successive and appropriate structural changes in the interest rate dataset. The second phase forecasts the change-point group with classifiers (discriminant analysis, logistic regression, and backpropagation neural networks) and their combined classification functions. The final phase forecasts the interest rate with backpropagation neural networks. We propose several classification functions to overcome the problem that two-staged learning cannot measure the performance of the first learning stage. Subsequently, we compare the structured models with a neural network model alone and, in addition, determine which classifiers and classification functions perform better. This article then examines the predictability of the proposed classification functions for interest rate forecasting using structural change.

The probabilistic estimation of inundation region using a multiple logistic regression analysis (다중 Logistic 회귀분석을 통한 침수지역의 확률적 도출)

  • Jung, Minkyu;Kim, Jin-Guk;Uranchimeg, Sumiya;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association / v.53 no.2 / pp.121-129 / 2020
  • The increase of impervious surface and development along the river due to urbanization not only causes an increase in the number of associated flood risk factors but also exacerbates flood damage, leading to difficulties in flood management. Flood control measures should be prioritized based on various geographical information in urban areas. In this study, a probabilistic flood hazard assessment was applied to flood-prone areas near an urban river. Flood hazard maps were alternatively considered and used to describe the expected inundation areas for a given set of predictors such as elevation, slope, runoff curve number, and distance to river. This study proposes a Bayesian logistic regression-based flood risk model that aims to provide a probabilistic risk metric such as population-at-risk (PAR). Finally, the logistic regression model demonstrates the probabilistic flood hazard maps for the entire area.

Building a Nonlinear Relationship between Air and Water Temperature for Climate-Induced Future Water Temperature Prediction (기후변화에 따른 미래 하천 수온 예측을 위한 비선형 기온-수온 상관관계 구축)

  • Lee, Khil-Ha
    • Journal of Environmental Policy / v.13 no.2 / pp.21-38 / 2014
  • In response to global warming, the effect of air temperature on water temperature has drawn attention. Changes in river water temperature alter water quality and the ecosystem, especially the dissolved oxygen (DO) level, and cause shifts in aquatic biota. Future water temperature needs to be predicted in order to understand the timing of projected river temperature changes. To do this, data collected by the Ministry of Environment and the Korea Meteorological Administration were used to build a nonlinear relationship between air and water temperature. A logistic function with four parameters was selected as the working model, and the parameters were optimized using the SCE algorithm. Weekly average values were used to remove the time-scaling effect, because the time scale affects the maximum and minimum temperatures and, in turn, the river environment. In general, the nonlinear logistic model performs better in terms of NSC and RMSE, and the nonlinear logistic function is recommended for building a relationship between air and water temperature in Korea. The results will help decision-making organizations determine future policy on water quality and ecosystems.
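
The abstract above does not give the functional form, so the following is only a minimal sketch of what a four-parameter logistic air-to-water temperature curve could look like. The parameterization (upper bound `alpha`, lower bound `mu`, inflection temperature `beta`, steepness `gamma`), the example values, and the reading of NSC as a Nash-Sutcliffe-style coefficient are all assumptions made for illustration; the SCE optimization step is not shown.

```python
import math


def logistic_water_temp(air_temp, alpha, mu, beta, gamma):
    """Four-parameter logistic curve (assumed form, not taken from the paper).

    alpha: upper asymptote of water temperature
    mu:    lower asymptote of water temperature
    beta:  air temperature at the inflection point
    gamma: steepness of the curve around the inflection point
    """
    return mu + (alpha - mu) / (1.0 + math.exp(gamma * (beta - air_temp)))


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe-style efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


# Hypothetical weekly mean air temperatures and parameter values.
weekly_air = [-2.0, 5.0, 12.0, 20.0, 27.0]
params = dict(alpha=30.0, mu=0.0, beta=15.0, gamma=0.2)
weekly_water = [logistic_water_temp(t, **params) for t in weekly_air]
```

An optimizer would adjust the four parameters to minimize `rmse` (or maximize `nash_sutcliffe`) against observed weekly water temperatures.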

Two-Stage Logistic Regression for Cancer Classification and Prediction from Copy-Number Changes in cDNA Microarray-Based Comparative Genomic Hybridization

  • Kim, Mi-Jung
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.847-859 / 2011
  • cDNA microarray-based comparative genomic hybridization (CGH) data include low-intensity spots, and thus a statistical strategy is needed to detect subtle differences between cancer classes. In this study, genes displaying a high frequency of alteration in one of the classes were selected from among pre-selected genes that show relatively large between-gene variation compared to the total variation. Utilizing copy-number changes of the selected genes, this study suggests a statistical approach that predicts patients' classes with increased performance by pre-classifying patients with similar genetic alteration scores. A two-stage logistic regression model (TLRM) was suggested to pre-classify homogeneous patients and predict patients' classes for cancer prediction; a decision tree (DT) was combined with logistic regression on the set of informative genes. The TLRM was constructed on cDNA microarray-based CGH data from the Cancer Metastasis Research Center (CMRC) at Yonsei University. It predicted the patients' clinical diagnoses with perfect matches (except for one patient) among the patients classified as high-risk or low-risk, where prediction performance is critical because of the high sensitivity and specificity required for clinical treatment. Accuracy validated by leave-one-out cross-validation (LOOCV) was 83.3%, while the CART and DT classification methods used for comparison performed worse than the TLRM.

Empirical Analysis on the Relationship between R&D Inputs and Performance Using Successive Binary Logistic Regression Models (연속적 이항 로지스틱 회귀모형을 이용한 R&D 투입 및 성과 관계에 대한 실증분석)

  • Park, Sungmin
    • Journal of Korean Institute of Industrial Engineers / v.40 no.3 / pp.342-357 / 2014
  • The present study analyzes the relationship between research and development (R&D) inputs and the performance of a national technology innovation R&D program using successive binary logistic regression models based on a typical R&D logic model. In particular, this study focuses on answering the following three main questions: (1) "To what extent do the R&D inputs have an effect on the performance creation?"; (2) "Is an obvious relationship verified between the immediate predecessor and its successor performance?"; and (3) "Is there a difference in the performance creation between R&D government subsidy recipient types and between R&D collaboration types?" Methodologically, binary logistic regression models are established successively, considering the "Success-Failure" binary nature of the performance creation data. An empirical analysis is presented on a sample of n = 2,178 completed R&D projects. This study's major findings are as follows. First, the R&D inputs have a statistically significant relationship only with the short-term technical output, "Patent Registration." Second, strong dependencies are identified between the immediate predecessor and its successor performance. Third, the success probability of the performance creation differs statistically significantly between the aforementioned R&D types. Specifically, compared with "Large Company", "Small and Medium-Sized Enterprise (SMS)" shows a greater success probability of "Sales" and "New Employment." Meanwhile, "R&D Collaboration" achieves a larger success probability of "Patent Registration" and "Sales."

A Study of Freshman Dropout Prediction Model Using Logistic Regression with Shift-Sigmoid Classification Function (시프트 시그모이드 분류함수를 가진 로지스틱 회귀를 이용한 신입생 중도탈락 예측모델 연구)

  • Kim Donghyung
    • Journal of Korea Society of Digital Industry and Information Management / v.19 no.4 / pp.137-146 / 2023
  • The dropout of university freshmen is a very important issue for university finances. Moreover, the dropout rate is one of the important indicators among the external evaluation items of universities. Therefore, universities need to predict which students will drop out in advance and apply various dropout prevention programs targeting them. This paper proposes a method for making such predictions. The method selects likely dropouts by applying logistic regression with a shift sigmoid classification function, using only quantitative data from the first semester of the first year, which most universities have. Because it is based on logistic regression and uses the shift sigmoid function as a classification function, the number of prediction subjects and the prediction accuracy can be selected. In the experiment, when the proposed algorithm was applied, the number of predicted dropout subjects varied from 100% to 20% of the actual number of dropouts, and the prediction accuracy ranged from 75% to 98%.
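
The abstract does not define the shift sigmoid function, so the sketch below only illustrates one plausible reading: a standard sigmoid moved horizontally by a `shift` value, so that a larger shift flags fewer students. The function names, the example scores, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
import math


def shift_sigmoid(z, shift=0.0):
    """Sigmoid translated along the z-axis by `shift` (assumed definition)."""
    return 1.0 / (1.0 + math.exp(-(z - shift)))


def predict_dropouts(scores, shift=0.0, threshold=0.5):
    """Flag a student when the shifted sigmoid of the linear score reaches the threshold."""
    return [shift_sigmoid(z, shift) >= threshold for z in scores]


# Hypothetical linear scores (w . x + b) from an already fitted logistic regression.
scores = [-2.0, -0.5, 0.3, 1.2, 2.5]

flagged_default = sum(predict_dropouts(scores, shift=0.0))
flagged_shifted = sum(predict_dropouts(scores, shift=1.0))
```

Under this reading, raising `shift` trades the number of flagged students against the confidence of each flag, which matches the trade-off between the number of prediction subjects and accuracy that the abstract describes.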

Predicting Land Use Change Affected by Population Growth by Integrating Logistic Regression, Markov Chain and Cellular Automata Models

  • Nguyen, Van Trung;Le, Thi Thu Ha;La, Phu Hien
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.4 / pp.221-230 / 2017
  • Demographic change is considered the major driver of land use change, especially in developing countries, although several interacting factors are involved. This paper presents an approach to predicting future land use change using a hybrid model. The hybrid model, consisting of a logistic regression model, a Markov chain (MC), and cellular automata (CA), was designed to improve on the performance of the standard logistic regression model. The experiment was conducted in Giao Thuy district, Nam Dinh Province, Vietnam. Demographic and socio-economic variables related to urban sprawl were used to create a probability surface of the spatio-temporal states of built-up land use for the years 2009, 2019, and 2029. The predicted land use maps for 2019 and 2029 show substantial urban development in the area, much of which is located in areas sensitive to source protection. They also show that aquacultural land changes substantially in areas in the vicinity of the estuary or near the sea dike. There was considerable variation between the communes; notably, communes with a higher household density and a higher proportion of working-age people have larger increases in aquacultural area. The results of the analysis can provide valuable information for local planners and policy makers, assisting their efforts in constructing alternative sustainable urban development schemes and environmental management strategies.
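
As a rough illustration of the Markov chain component only, the sketch below projects land use shares forward one step at a time with a transition matrix. The three land use classes, the 2009 shares, and the transition probabilities are invented for the example; the paper's actual classes, matrix, and its coupling with logistic regression and cellular automata are not described in the abstract.

```python
def project_shares(shares, transition, steps=1):
    """Advance land use shares by `steps` Markov transitions.

    shares[i]        : fraction of the area currently in class i
    transition[i][j] : probability that land in class i moves to class j per step
    """
    for _ in range(steps):
        shares = [
            sum(shares[i] * transition[i][j] for i in range(len(shares)))
            for j in range(len(transition[0]))
        ]
    return shares


# Hypothetical classes: built-up, aquaculture, other.
shares_2009 = [0.10, 0.20, 0.70]
transition = [
    [1.00, 0.00, 0.00],  # built-up land is assumed to stay built-up
    [0.05, 0.90, 0.05],
    [0.10, 0.10, 0.80],
]

shares_2019 = project_shares(shares_2009, transition, steps=1)
shares_2029 = project_shares(shares_2009, transition, steps=2)
```

In a hybrid setup of this kind, the Markov step typically supplies how much land changes class per period, while a probability surface (here, from logistic regression) and neighborhood rules decide where the change is placed.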

Machine Learning Approach to Blood Stasis Pattern Identification Based on Self-reported Symptoms (기계학습을 적용한 자기보고 증상 기반의 어혈 변증 모델 구축)

  • Kim, Hyunho;Yang, Seung-Bum;Kang, Yeonseok;Park, Young-Bae;Kim, Jae-Hyo
    • Korean Journal of Acupuncture / v.33 no.3 / pp.102-113 / 2016
  • Objectives: This study aims to develop and discuss a prediction model for the blood stasis pattern of traditional Korean medicine (TKM) using two machine learning algorithms: multiple logistic regression and a decision tree model. Methods: First, we reviewed the Korean, Chinese, and Japanese versions of the blood stasis (BS) questionnaire to create an integrated BS questionnaire of patient-reported outcomes. Patient-reported BS symptom data were acquired through human subject research. Next, the expert decisions of five Korean medicine doctors were acquired, and supervised learning models were developed using multiple logistic regression and a decision tree. Results: An integrated BS questionnaire with 24 items was developed. Multiple logistic regression models with accuracies of 0.92 (male) and 0.95 (female), validated by 10-fold cross-validation, were constructed. Decision tree modeling produced a male model with 8 decision nodes and a female model with 6 decision nodes. In both models, the symptoms 'recent physical trauma', 'chest pain', 'numbness', and 'menstrual disorder (female only)' were considered important factors. Conclusions: Because machine learning, especially supervised learning, can reveal and suggest the important or essential factors among the many symptoms that make up a pattern identification, it can be a very useful tool in researching TKM diagnostics. With proper patient-reported outcomes or a well-structured database, it can also be applied to pre-screening solutions for healthcare systems at the Mibyoung stage.
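
To make the validation step concrete, here is a minimal sketch of estimating logistic regression accuracy with k-fold cross-validation, written in plain Python. The binary "symptom" features, labels, learning rate, and epoch count are synthetic assumptions for illustration only; they are unrelated to the study's questionnaire data or its fitted models.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias term
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = yi - p
            for j, xj in enumerate(xi):
                w[j] += lr * err * xj
            w[-1] += lr * err
    return w


def predict(w, xi):
    """Return class 1 when the predicted probability is at least 0.5."""
    score = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
    return 1 if sigmoid(score) >= 0.5 else 0


def k_fold_accuracy(X, y, k=10, seed=0):
    """Average held-out accuracy over k folds of a shuffled index list."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in fold)
    return correct / len(X)


# Synthetic data: two yes/no answers per subject; the label follows the first answer.
X = [[i % 2, (i // 2) % 2] for i in range(40)]
y = [row[0] for row in X]
accuracy = k_fold_accuracy(X, y, k=10)
```

Each subject is held out exactly once, so the reported accuracy reflects predictions on data the model did not see during fitting.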