• Title/Summary/Keyword: default probability

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.105-129 / 2020
  • This study uses corporate data from 2012 to 2018, the period in which K-IFRS was fully applied, to predict default risk. The analysis data comprise 10,545 rows and 160 columns: 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratios. Unlike most prior studies, which used the default event itself as the learning target, this study calculated default risk from each company's market capitalization and stock price volatility based on the Merton model. This resolves the data imbalance caused by the scarcity of default events, a limitation often noted in existing methodologies, and also reflects the differences in default risk that exist among ordinary, non-defaulted companies. Because training used only corporate information that is also available for unlisted companies, the default risk of unlisted companies without stock price information can be derived appropriately. The model can therefore provide stable default risk assessment for firms whose default risk is difficult to determine with traditional credit rating models, such as small and medium-sized enterprises and startups. Although corporate default risk prediction with machine learning has been studied actively in recent years, most studies rely on a single model, which raises model bias issues. A stable and reliable valuation methodology is required because default risk information is used very widely in the market and sensitivity to differences in default risk is high; strict standards are likewise required for the calculation method. The credit rating method stipulated by the Financial Services Commission in the Financial Investment Business Regulations calls for evaluation methods to be prepared, and their adequacy verified, in consideration of past statistical data and experience on credit ratings as well as changes in future market conditions. This study reduces the bias of individual models by using a stacking ensemble that combines various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and corporate information while retaining the advantage of machine learning-based default risk models, namely short computation time. To produce the sub-model forecasts used as input to the stacking ensemble, the training data were divided into seven pieces and the sub-models were trained on these splits to produce their forecasts. For comparison, Random Forest, MLP, and CNN models were trained on the full training data, and the predictive power of each model was verified on the test set. The stacking ensemble exceeded the predictive power of the Random Forest model, the best-performing single model. Next, to check for statistically significant differences between the stacking ensemble and each individual model, a pair was constructed between the stacking ensemble forecasts and each individual model's forecasts. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank-sum test was used to check whether the two sets of forecasts in each pair differed significantly. The forecasts of the stacking ensemble model differed significantly from those of the MLP and CNN models. In addition, this study offers a methodology that allows existing credit rating agencies to adopt machine learning-based default risk prediction, since traditional credit rating models can also be included as sub-models when calculating the final default probability. The proposed stacking ensemble can also help designs meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope this research serves as a resource for overcoming the limitations of existing machine learning-based models and increasing their practical use.
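
The methodology described above combines two concrete steps: a Merton-type default risk measure derived from market information serves as the learning target, and out-of-fold forecasts from sub-models trained on seven splits of the training data feed a stacking meta-model. The sketch below illustrates both steps under stated assumptions (a simplified Merton formula, scikit-learn sub-models, synthetic toy data); it is not the authors' implementation.

```python
# Hypothetical sketch, not the authors' code: (1) a simplified Merton-style default
# probability used as the learning target, (2) 7-fold out-of-fold sub-model forecasts
# fed into a stacking meta-model. Column counts, model settings and data are invented.
import numpy as np
from scipy.stats import norm
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression

def merton_default_probability(asset_value, debt, asset_vol, r=0.02, horizon=1.0):
    """P(firm value < debt at the horizon) under a simplified Merton model; in practice
    asset value and volatility are backed out from market cap and equity volatility."""
    d2 = (np.log(asset_value / debt) + (r - 0.5 * asset_vol ** 2) * horizon) \
         / (asset_vol * np.sqrt(horizon))
    return norm.cdf(-d2)

def stacking_oof_forecasts(X, y, base_models, n_splits=7, seed=0):
    """Out-of-fold forecasts of each sub-model, used as meta-model inputs."""
    oof = np.zeros((len(X), len(base_models)))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for j, model in enumerate(base_models):
        for train_idx, valid_idx in folds.split(X):
            model.fit(X[train_idx], y[train_idx])
            oof[valid_idx, j] = model.predict(X[valid_idx])
    return oof

# Toy data standing in for the 160 financial-statement and ratio columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = merton_default_probability(asset_value=np.exp(X[:, 0] + 5), debt=100.0,
                               asset_vol=0.2 + 0.1 * norm.cdf(X[:, 1]))

base_models = [RandomForestRegressor(n_estimators=100, random_state=0),
               MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)]
meta_inputs = stacking_oof_forecasts(X, y, base_models)
meta_model = LinearRegression().fit(meta_inputs, y)   # stacking meta-learner
```

Because the meta-model only ever sees forecasts made on data the sub-models were not trained on, the stacking layer combines the sub-models without simply reusing their in-sample fit.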

Analysis of the 2015 reform plan of government employees pension system (GEPS) through monte carlo simulations (모의실험을 통한 2015년 공무원 연금제도 개정안의 효과분석)

  • Lee, Jieun;Song, Seongjoo
    • Journal of the Korean Data and Information Science Society / v.27 no.1 / pp.19-32 / 2016
  • Due to the increasing fiscal burden and the structurally unbalanced premium/benefit structure, a new reform of the government employees pension system (GEPS) was considered even after the recent reform in 2009. This article examines the various effects of the 2015 amendment to GEPS using a simple probabilistic model. We consider effects on both sides, the pensioners and the government. First, the expected net value of pension payments for an individual employee was calculated from an assumed survival distribution, and fairness across individual pension holders was compared using the benefit-cost ratio. Second, from the standpoint of the pension system, the default probability and the required government subsidy were examined by Monte Carlo simulation. The simulation experiment shows that the 2015 reform plan does reduce the default probability and the government's fiscal burden by raising the premium and lowering the benefit. However, the effect is not yet pronounced, because for some time the number of new employees fully subject to the reform will be much smaller than the number of existing employees; the effect of the reform is therefore expected to appear slowly.
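
As an illustration of the kind of Monte Carlo exercise referred to above, the sketch below simulates a stylized pension fund balance with random investment returns, a fixed premium inflow, and a fixed benefit outflow, and estimates the probability that the fund is exhausted within the horizon. All parameters (fund size, premium, benefit, return distribution, horizon) are invented for illustration and are not the GEPS assumptions used in the paper.

```python
# Stylized Monte Carlo estimate of a pension fund's default (depletion) probability.
# Fund size, premium, benefit, return distribution and horizon are illustrative
# assumptions, not the GEPS parameters used in the paper.
import numpy as np

def default_probability(n_paths=20_000, years=40, fund0=100.0,
                        premium=6.0, benefit=8.0,
                        mean_return=0.03, vol=0.08, seed=1):
    rng = np.random.default_rng(seed)
    defaults = 0
    for _ in range(n_paths):
        fund = fund0
        for _ in range(years):
            fund = fund * (1 + rng.normal(mean_return, vol)) + premium - benefit
            if fund <= 0:            # fund exhausted before the horizon
                defaults += 1
                break
    return defaults / n_paths

# Raising the premium and cutting the benefit (the direction of the 2015 reform)
# lowers the estimated default probability.
print("pre-reform :", default_probability(premium=6.0, benefit=8.0))
print("post-reform:", default_probability(premium=7.0, benefit=7.0))
```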

A Predictive Two-Group Multinormal Classification Rule Accounting for Model Uncertainty

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / v.26 no.4 / pp.477-491 / 1997
  • A new predictive classification rule for assigning future cases to one of two multivariate normal populations (with an unknown normal mixture model) is considered. The development involves calculating the posterior probability of each possible normal-mixture model via a default Bayesian test criterion, the intrinsic Bayes factor, and proposes a predictive distribution for the future cases to be classified that accounts for model uncertainty by weighting each model by its posterior probability. Our interest is focused on constructing a classification rule that handles uncertainty about the type of covariance matrices (homogeneity versus heterogeneity) involved in the model. For the constructed rule, a Monte Carlo simulation study demonstrates routine application and notes benefits over the traditional predictive classification rule of Geisser (1982).
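
The key idea above, weighting a homogeneous-covariance rule and a heterogeneous-covariance rule by posterior model probabilities, can be sketched as follows. The intrinsic Bayes factor computation is not reproduced; the model weights are simply passed in as assumed numbers, and the data and priors are illustrative.

```python
# Illustrative model-averaged two-group classification: class posteriors are computed
# under a common-covariance model and a separate-covariance model and then mixed by
# model weights. In the paper the weights come from intrinsic Bayes factors; here
# they are assumed numbers used only for illustration.
import numpy as np
from scipy.stats import multivariate_normal

def class_posteriors(x, mean1, mean2, cov1, cov2, prior1=0.5):
    f1 = multivariate_normal.pdf(x, mean1, cov1)
    f2 = multivariate_normal.pdf(x, mean2, cov2)
    p1 = prior1 * f1 / (prior1 * f1 + (1 - prior1) * f2)
    return np.array([p1, 1 - p1])

def model_averaged_rule(x, data1, data2, w_homog=0.5):
    """Average the homogeneous- and heterogeneous-covariance rules with weights
    (w_homog, 1 - w_homog), standing in for posterior model probabilities."""
    m1, m2 = data1.mean(axis=0), data2.mean(axis=0)
    s1, s2 = np.cov(data1, rowvar=False), np.cov(data2, rowvar=False)
    n1, n2 = len(data1), len(data2)
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    post_homog = class_posteriors(x, m1, m2, pooled, pooled)
    post_heter = class_posteriors(x, m1, m2, s1, s2)
    post = w_homog * post_homog + (1 - w_homog) * post_heter
    return int(post.argmax() + 1)   # classify the future case to group 1 or 2

rng = np.random.default_rng(0)
g1 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=50)
g2 = rng.multivariate_normal([2, 2], [[2, -0.4], [-0.4, 2]], size=50)
print(model_averaged_rule(np.array([1.8, 1.9]), g1, g2, w_homog=0.3))
```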

Student Academic Performance, Dropout Decisions and Loan Defaults: Evidence from the Government College Loan Program

  • HAN, SUNG MIN
    • KDI Journal of Economic Policy / v.38 no.1 / pp.71-91 / 2016
  • This paper examines the effect of the government college loan program in Korea on student academic performance, dropout decisions, and loan defaults. While the program, launched in 2009, has guaranteed fairness in educational opportunities to some degree, there has been considerable controversy over its effectiveness. Empirical findings suggest that recipients of the general student loan (GSL) showed lower academic performance than recipients of the income contingent loan (ICL). Moreover, for students attending private universities, a higher number of loans received increased the probability of dropping out, and students from middle-income households had a higher probability of being overdue than students from low-income households. These findings indicate that expanding the ICL program within the limits of the government budget is necessary. Furthermore, providing opportunities for students to find various jobs and introducing a rating system for defaulters are also necessary.

Empirical Analysis on the Stress Test Using Credit Migration Matrix (신용등급 전이행렬을 활용한 위기상황분석에 관한 실증분석)

  • Kim, Woo-Hwan
    • The Korean Journal of Applied Statistics / v.24 no.2 / pp.253-268 / 2011
  • In this paper, we estimate systematic risk from credit migration (or transition) matrices under the Asymptotic Single Risk Factor (ASRF) model. We analyzed transition matrices issued by KR (Korea Ratings) and concluded that the systematic risk implied by credit migration broadly coincides with the real economic cycle; in particular, we found that the systematic risk implied by credit migration is a better indicator than that implied by the default rate. We also show how to conduct a stress test using the systematic risk extracted from the transition matrices, and argue that the proposed method is better than the usual approach, which considers only the conditional probability of default (PD). The expected loss increases sharply when changes in the credit quality of a given portfolio are considered explicitly, compared with the method that considers only PD.
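
Under the ASRF model, an unconditional migration matrix can be converted into a downturn-conditional one by mapping each row's cumulative migration probabilities to normal thresholds and shifting them by a systematic factor; the stress then moves the whole downgrade mass, not only the default column, which is why the expected loss reacts more strongly than under a PD-only stress. The sketch below illustrates the mechanics with an invented three-state matrix and an assumed asset correlation; it is not the KR (Korea Ratings) calibration used in the paper.

```python
# Illustrative ASRF-style stress of a rating migration matrix: unconditional
# migration probabilities are mapped to normal thresholds, then shifted by a
# systematic factor Z to obtain a stressed (conditional) matrix. The matrix,
# asset correlation and factor value are invented for illustration.
import numpy as np
from scipy.stats import norm

def conditional_migration_matrix(P, rho, z):
    """Conditional transition matrix given systematic factor realization z
    (z < 0 is a downturn). Rows of P must sum to 1; the last state is absorbing default."""
    P = np.asarray(P, dtype=float)
    cond = np.zeros_like(P)
    for i in range(P.shape[0]):
        # Thresholds from the right tail: cumulate from the worst state (default) upward.
        cum = np.cumsum(P[i, ::-1])[::-1]
        upper = norm.ppf(np.clip(cum, 1e-12, 1.0))
        shifted = norm.cdf((upper - np.sqrt(rho) * z) / np.sqrt(1.0 - rho))
        cond[i] = shifted - np.append(shifted[1:], 0.0)
    return cond

# Toy 3-state matrix: investment grade, speculative grade, default (absorbing).
P = np.array([[0.95, 0.04, 0.01],
              [0.10, 0.85, 0.05],
              [0.00, 0.00, 1.00]])
stressed = conditional_migration_matrix(P, rho=0.20, z=-2.33)   # roughly a 99th-percentile downturn
print("stressed default column:", stressed[:, -1].round(4))
```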

Parameter estimation for the imbalanced credit scoring data using AUC maximization (AUC 최적화를 이용한 낮은 부도율 자료의 모수추정)

  • Hong, C.S.;Won, C.H.
    • The Korean Journal of Applied Statistics / v.29 no.2 / pp.309-319 / 2016
  • For binary classification models, we consider a risk score that is a function of a linear score and estimate the coefficients of that linear score. There are two estimation methods: obtaining MLEs from a logistic model, or estimating the coefficients by maximizing the AUC. AUC-based estimates outperform logistic MLEs in general situations where the logistic assumptions do not hold. This paper considers imbalanced data, in which the default class contains far fewer observations than the non-default class in credit assessment models, and applies the AUC approach to such data. Various logit-type link functions are used to generate the imbalanced data. We find that coefficients estimated by the AUC approach are equivalent to, or better than, those from logistic models for low-default-probability, imbalanced data.
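
A minimal sketch of the AUC-maximization idea, assuming a sigmoid-smoothed AUC objective so that a generic optimizer can be used; the smoothing, the data-generating process, and the degree of imbalance are illustrative assumptions rather than the paper's procedure.

```python
# Illustrative comparison of logistic-regression MLEs with coefficients obtained by
# maximizing a smoothed AUC over pairwise score differences, on imbalanced data.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, -0.5, 0.8])
# Heavy imbalance: shift the intercept so defaults are rare (a few percent).
prob = 1 / (1 + np.exp(-(X @ true_beta - 3.5)))
y = rng.binomial(1, prob)

def neg_smoothed_auc(beta, X, y, scale=0.1):
    """Negative sigmoid-smoothed AUC: mean of sigma((s_default - s_nondefault)/scale)
    over all default/non-default pairs of linear scores s = X @ beta."""
    s = X @ beta
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return -np.mean(1 / (1 + np.exp(-diff / scale)))

beta_auc = minimize(neg_smoothed_auc, x0=np.zeros(p), args=(X, y)).x
beta_mle = LogisticRegression(C=1e6, max_iter=1000).fit(X, y).coef_.ravel()

print("AUC (AUC-maximizing score):", roc_auc_score(y, X @ beta_auc).round(4))
print("AUC (logistic MLE score)  :", roc_auc_score(y, X @ beta_mle).round(4))
```

Because the AUC depends on the score only through its ordering, the coefficient vector is identified only up to a positive scale factor, so the comparison is made on the resulting scores rather than on the coefficients themselves.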

Standard Criterion of VUS for ROC Surface (ROC 곡면에서 VUS의 판단기준)

  • Hong, C.S.;Jung, E.S.;Jung, D.G.
    • The Korean Journal of Applied Statistics / v.26 no.6 / pp.977-985 / 2013
  • Many real-world situations involve classification into more than two categories. In this work, we consider the ROC surface and the VUS (volume under the ROC surface), graphical evaluation methods for classification models with three categories. The standard AUC criteria for the probability of default under Basel II are extended to the VUS of the ROC surface, and standardized VUS criteria for such classification models are proposed. The ranges of the AUC, K-S, and mean-difference statistics corresponding to the VUS values in each class of the standard criteria are obtained, and the standard VUS criteria for the ROC surface are established by exploring the relationships among these statistics.
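
For three ordered categories, the VUS is the probability that scores drawn from the three classes fall in the correct order, so it can be estimated empirically as the fraction of correctly ordered cross-class triples (1/6 corresponds to no discrimination, 1 to perfect separation). A minimal sketch with simulated scores, which are assumptions for illustration only:

```python
# Empirical estimate of the VUS for a three-category score: the proportion of triples,
# one score from each class, that the score orders correctly.
import numpy as np

def empirical_vus(s_low, s_mid, s_high):
    """P(S_low < S_mid < S_high) estimated over all cross-class triples."""
    s_low, s_mid, s_high = map(np.asarray, (s_low, s_mid, s_high))
    lo = s_low[:, None, None]
    mid = s_mid[None, :, None]
    hi = s_high[None, None, :]
    return np.mean((lo < mid) & (mid < hi))

rng = np.random.default_rng(0)
vus = empirical_vus(rng.normal(0.0, 1, 200),   # e.g. normal grade
                    rng.normal(1.0, 1, 200),   # intermediate grade
                    rng.normal(2.0, 1, 200))   # default grade
print("VUS:", round(float(vus), 4))            # 1/6 would indicate no discrimination
```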

Choice versus Given: Influence of Choice on Effectiveness of Retailers' Sweepstakes Promotion

  • Meeja IM
    • Journal of Distribution Science / v.21 no.6 / pp.39-49 / 2023
  • Purpose: This paper aims to investigate the influence of different methods of distributing sweepstakes entries (i.e., whether consumers choose to enter the sweepstakes themselves or are given a sweepstakes ticket by default) on the effectiveness of the sweepstakes promotion (i.e., interest in the sweepstakes and intention to participate). Research design, data and methodology: The paper verifies this effect through three experimental studies: an online experiment using a sweepstakes promotion scenario at a department store, an online SNS sweepstakes promotion event, and a face-to-face card lottery game. Results: Participants in the group that chose their sweepstakes tickets themselves showed higher interest and intention to participate than those who were given a ticket by default. Furthermore, the group that chose the sweepstakes card believed it had a higher probability of winning than the group that was given the card. Conclusions: This paper shows a way to enhance the promotional effect of sweepstakes in retail stores, without incurring additional costs, by approaching sweepstakes design from the psychological perspective of the consumer. The study also sheds new light on the effect of manipulating sense of control through choice behavior in a promotional context.

A default-rate comparison of the construction and other industries using survival analysis method (생존분석기법을 이용한 건설업과 타 업종간의 부도율 비교 분석)

  • Park, Jin-Kyung;Oh, Kwang-Ho;Kim, Min-Soo
    • Journal of the Korean Data and Information Science Society / v.21 no.4 / pp.747-756 / 2010
  • With the recent recession, studies on the economy are being conducted actively across industries. Based on small-business data registered with the Credit Guarantee Fund, we estimated survival probabilities in a survival analysis framework and analyzed survival times for the construction industry and other industries, with small businesses distinguished by type of business and by asset size. Survival probabilities were estimated using the life-table method, and differences in survival probability between types of business were assessed with the log-rank test and the Wilcoxon test. We found that small businesses with assets of over one billion won had the highest survival probability, while those with assets below 1,000 million won showed similar survival probabilities to one another. By type of business, the wholesale and retail trade industry and services had relatively higher survival probabilities than light industry, heavy industry, and construction; the construction industry showed the lowest survival probability. Most small businesses tend to show an increasing hazard rate over time.
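
A minimal sketch of the kind of comparison described above, using the lifelines package: Kaplan-Meier estimates stand in for the paper's life-table estimates, and a log-rank test compares two industry groups. The simulated durations and censoring indicators are assumptions, not the Credit Guarantee Fund data.

```python
# Illustrative survival comparison of two industry groups: Kaplan-Meier curves and a
# log-rank test for the difference. All inputs are simulated for illustration.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(0)
# Months until default (event=1) or censoring (event=0) for two industry groups.
construction_t = rng.exponential(scale=40, size=300)
services_t = rng.exponential(scale=70, size=300)
construction_e = rng.binomial(1, 0.7, size=300)
services_e = rng.binomial(1, 0.6, size=300)

km = KaplanMeierFitter()
km.fit(construction_t, event_observed=construction_e, label="construction")
print(km.survival_function_.iloc[-1])          # estimated survival at the longest observed time

result = logrank_test(construction_t, services_t,
                      event_observed_A=construction_e, event_observed_B=services_e)
print("log-rank p-value:", result.p_value)
```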

Generalized Partially Linear Additive Models for Credit Scoring

  • Shim, Ju-Hyun;Lee, Young-K.
    • The Korean Journal of Applied Statistics / v.24 no.4 / pp.587-595 / 2011
  • Credit scoring is an objective and automatic system for assessing the credit risk of each customer. The logistic regression model is one of the popular credit scoring methods for predicting the default probability; however, despite its interpretability and low computation cost, it may fail to detect nonlinear effects of the predictors. In this paper, we propose a generalized partially linear model as an alternative to logistic regression. We also introduce modern ensemble techniques such as bagging, boosting, and random forests. We compare these methods via a simulation study and illustrate them on a German credit dataset.
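
A rough sketch of a partially linear additive logistic model assembled from scikit-learn parts: spline-expanded terms for a predictor assumed to act nonlinearly, plain linear terms for the rest, with a random forest as one of the ensemble benchmarks mentioned above. The choice of spline column, the settings, and the synthetic data are assumptions; the paper's estimator and the German credit data are not reproduced here.

```python
# Partially linear additive logistic model via spline expansion of one predictor,
# compared with plain logistic regression and a random forest on synthetic data.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import SplineTransformer
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))
# Default probability with a nonlinear effect of X0 and linear effects of the rest.
logit = np.sin(2 * X[:, 0]) + 0.8 * X[:, 1] - 0.5 * X[:, 2] - 1.5
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

partially_linear = make_pipeline(
    ColumnTransformer([("smooth", SplineTransformer(n_knots=8), [0])],
                      remainder="passthrough"),   # remaining columns enter linearly
    LogisticRegression(max_iter=1000))
plain_logit = LogisticRegression(max_iter=1000)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

for name, model in [("partially linear", partially_linear),
                    ("plain logistic", plain_logit),
                    ("random forest", forest)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:17s} AUC: {auc:.3f}")
```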