• Title/Summary/Keyword: logistic linear models

Parameter estimation for the imbalanced credit scoring data using AUC maximization (AUC 최적화를 이용한 낮은 부도율 자료의 모수추정)

  • Hong, C.S.;Won, C.H.
    • The Korean Journal of Applied Statistics / v.29 no.2 / pp.309-319 / 2016
  • For binary classification models, we consider a risk score that is a function of linear scores and estimate the coefficients of the linear scores. There are two estimation methods: one obtains MLEs using logistic models, and the other estimates the coefficients by maximizing the AUC. Estimates from the AUC approach are better than logistic-model MLEs in the general situation where the logistic assumptions do not hold. This paper considers imbalanced data for credit assessment models, in which the default class contains fewer observations than the non-default class, and applies the AUC approach to such data. Various logit-type link functions are used to generate the imbalanced data. It is found that the coefficients estimated by the AUC approach are equivalent to (or better than) those from logistic models for imbalanced data with low default probability.
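The AUC-maximization idea in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the two-predictor score `x1 + b*x2`, the grid search, and the simulated sample are all assumptions made for illustration, and fixing the first coefficient at 1 assumes that coefficient is positive (the AUC is unchanged when the score is multiplied by a positive constant).

```python
import math
import random

def empirical_auc(scores, labels):
    """Mann-Whitney estimate of P(default score > non-default score); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_by_auc(x1, x2, labels, grid):
    """Grid search for b in the linear score x1 + b*x2 that maximizes the empirical AUC."""
    best_b, best_auc = None, -1.0
    for b in grid:
        auc = empirical_auc([u + b * v for u, v in zip(x1, x2)], labels)
        if auc > best_auc:
            best_b, best_auc = b, auc
    return best_b, best_auc

# Hypothetical imbalanced sample: defaults (label 1) are the rare class.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(400)]
x2 = [rng.gauss(0, 1) for _ in range(400)]
labels = [int(rng.random() < 1 / (1 + math.exp(-(-3.0 + u + 2.0 * v))))
          for u, v in zip(x1, x2)]

grid = [k / 10 for k in range(0, 51)]  # candidate values 0.0, 0.1, ..., 5.0
b_hat, auc_hat = fit_by_auc(x1, x2, labels, grid)
```

Because the empirical AUC is a step function of the coefficients, gradient-based optimizers do not apply directly; a grid search is the simplest workaround for one free coefficient, and smoothed AUC surrogates are a common alternative in higher dimensions.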

Generalized Partially Linear Additive Models for Credit Scoring

  • Shim, Ju-Hyun;Lee, Young-K.
    • The Korean Journal of Applied Statistics / v.24 no.4 / pp.587-595 / 2011
  • Credit scoring is an objective and automatic system to assess the credit risk of each customer. The logistic regression model is a popular credit scoring method for predicting the default probability; however, despite its interpretability and low computational cost, it may fail to detect nonlinear features of the predictors. In this paper, we propose using a generalized partially linear model as an alternative to logistic regression. We also introduce modern ensemble techniques such as bagging, boosting, and random forests. We compare these methods via a simulation study and illustrate them on a German credit dataset.

Introduction to variational Bayes for high-dimensional linear and logistic regression models (고차원 선형 및 로지스틱 회귀모형에 대한 변분 베이즈 방법 소개)

  • Jang, Insong;Lee, Kyoungjae
    • The Korean Journal of Applied Statistics / v.35 no.3 / pp.445-455 / 2022
  • In this paper, we introduce existing Bayesian methods for high-dimensional sparse regression models and compare their performance in various simulation scenarios. In particular, we focus on the variational Bayes approach proposed by Ray and Szabó (2021), which enables scalable and accurate Bayesian inference. Based on simulated data sets from sparse high-dimensional linear regression models, we compare the variational Bayes approach with other Bayesian and frequentist methods. To check the practical performance of variational Bayes in logistic regression models, a real data analysis is conducted using a leukemia data set.

Parameter estimation of linear function using VUS and HUM maximization (VUS와 HUM 최적화를 이용한 선형함수의 모수추정)

  • Hong, Chong Sun;Won, Chi Hwan;Jeong, Dong Gil
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1305-1315 / 2015
  • Consider a risk score that is a function of a linear score in classification models. The AUC optimization method can be applied to estimate the coefficients of the linear score. The estimates obtained by the AUC approach are shown to be better than the maximum likelihood estimators from logistic models in the general situation where the logistic assumptions do not hold. In this work, the VUS (volume under the ROC surface) and HUM (hypervolume under the ROC manifold) approaches are suggested as extensions of the AUC approach to more realistic discrimination and prediction settings. Simulation results are obtained under various threshold distributions and three link functions: logit, complementary log-log, and modified logit. It is found that the coefficient estimates from the VUS and HUM approaches for multi-category classification are equivalent to or better than those from logistic models with these link functions.
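For three ordered classes, the VUS mentioned in this abstract generalizes the AUC as the probability that one score drawn from each class falls in the correct order. The following is a minimal sketch of the empirical version; the function name is hypothetical, and ties are simply counted as misordered, which is an assumption rather than the paper's convention.

```python
def empirical_vus(s1, s2, s3):
    """Fraction of triples (one score per class) ordered as s1 < s2 < s3."""
    ordered = sum(1 for a in s1 for b in s2 for c in s3 if a < b < c)
    return ordered / (len(s1) * len(s2) * len(s3))

# Perfectly separated classes give VUS = 1; uninformative scores give about 1/6.
perfect = empirical_vus([1, 2], [3, 4], [5, 6])
partial = empirical_vus([1, 3], [2], [4])
```

Coefficients of a linear score could then be chosen to maximize `empirical_vus` in the same way the AUC is maximized in the two-class case.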

Model assessment with residual plot in logistic regression (로지스틱회귀에서 잔차산점도를 이용한 모형평가)

  • Kahng, Myung Wook
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.141-150 / 2015
  • Graphical paradigms for assessing the adequacy of models in logistic regression are discussed. The residual plot has been widely used as a graphical tool for evaluating the adequacy of the model. However, this approach works well only for linear models with constant variance, and the alternative approach, the marginal model plot, has its defects as well. We suggest a Chi-residual plot that overcomes the potential shortcomings of the marginal model plot.
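As a rough illustration of the quantity such a plot would display, the sketch below computes residuals for binary responses under the assumption that the "Chi-residual" is the Pearson (chi) residual, (y - p) / sqrt(p(1 - p)). The abstract does not define it, so this reading and the function name are assumptions.

```python
import math

def pearson_residual(y, p):
    """Pearson (chi) residual of a binary response y given fitted probability p."""
    return (y - p) / math.sqrt(p * (1 - p))

# Hypothetical fitted probabilities and observed outcomes.
fitted = [0.2, 0.5, 0.8]
observed = [0, 1, 1]
residuals = [pearson_residual(y, p) for y, p in zip(observed, fitted)]
```

Plotting these residuals against the fitted linear predictor gives a residual plot whose points have unit variance under a correctly specified model.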

The health effects of low blood lead level in oxidative stress as a marker, serum gamma-glutamyl transpeptidase level, in male steelworkers

  • Su-Yeon Lee;Yong-Jin Lee;Young-Sun Min;Eun-Chul Jang;Soon-Chan Kwon;Inho Lee
    • Annals of Occupational and Environmental Medicine / v.34 / pp.34.1-34.13 / 2022
  • Background: This study aimed to investigate the association between lead exposure and serum gamma-glutamyl transpeptidase (γGT) levels as an oxidative stress marker in male steelworkers. Methods: Data were collected during the annual health examination of workers in 2020. A total of 1,654 steelworkers were selected, and the variables for adjustment included the workers' general characteristics, lifestyle, and occupational characteristics. The association between the blood lead level (BLL) and serum γGT level was investigated by multiple linear and logistic regression analyses. Natural-log-transformed BLL and serum γGT values were used in the multiple linear regression analysis, and BLL tertiles were used in the logistic regression analysis. Results: The geometric means of the participants' BLLs and serum γGT levels were 1.36 ㎍/dL and 27.72 IU/L, respectively. Their BLLs differed depending on age, body mass index (BMI), smoking status, drinking status, shift work, and working period, while their serum γGT levels differed depending on age, BMI, smoking status, drinking status, physical activity, and working period. In the multiple linear regression analysis, the difference in models 1, 2, and 3 was significant, at 0.326, 0.176, and 0.172, respectively (all p < 0.001). In the multiple linear regression analysis stratified according to drinking status, BMI, and age, BLLs were positively associated with serum γGT levels. In the logistic regression analysis, the odds ratios of the third BLL tertile for having an elevated serum γGT level, with the first tertile as the reference, were 2.74, 1.83, and 1.81 in models 1, 2, and 3, respectively. Conclusions: BLL was positively associated with serum γGT levels in male steelworkers even at low lead concentrations (< 5 ㎍/dL).
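Two of the summary statistics reported here, the geometric mean and the odds ratio relative to a reference tertile, can be sketched as follows. The counts and measurements are hypothetical, and the odds ratio is the crude (unadjusted) one, whereas the study's models adjust for covariates.

```python
import math

def geometric_mean(values):
    """Exponential of the mean natural logarithm; suits right-skewed measurements."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def odds_ratio(cases_exposed, noncases_exposed, cases_ref, noncases_ref):
    """Crude odds ratio of the exposed group against the reference group."""
    return (cases_exposed / noncases_exposed) / (cases_ref / noncases_ref)

# Hypothetical blood lead levels and a hypothetical 2x2 table
# (third tertile against first tertile; "case" means elevated marker level).
gm = geometric_mean([1.0, 4.0])
crude_or = odds_ratio(30, 70, 15, 85)
```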

Maximum likelihood estimation of Logistic random effects model (로지스틱 임의선형 혼합모형의 최대우도 추정법)

  • Kim, Minah;Kyung, Minjung
    • The Korean Journal of Applied Statistics / v.30 no.6 / pp.957-981 / 2017
  • A generalized linear mixed model is an extension of a generalized linear model that allows random effects and provides flexibility in developing a suitable model when observations are correlated or when other underlying phenomena contribute to the resulting variability. We describe maximum likelihood estimation methods for logistic regression models that include random effects: the Laplace approximation, Gauss-Hermite quadrature, adaptive Gauss-Hermite quadrature, and pseudo-likelihood. We illustrate the methods on a social science problem by analyzing the effect of mental health and life satisfaction on volunteer activities using Korean welfare panel data; in addition, we observe that including random effects in the model leads to improved analyses with more reasonable inferences.
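Of the methods listed, (non-adaptive) Gauss-Hermite quadrature is the easiest to show in a few lines. The sketch below approximates the marginal likelihood of one cluster in a random-intercept logistic model, integrating the intercept b ~ N(0, sigma^2) out with a fixed three-point rule. The function names and the choice of three nodes are assumptions for illustration; practical fits use more nodes or the adaptive variant.

```python
import math

# Three-point Gauss-Hermite rule for the weight function exp(-x^2).
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6,
              2 * math.sqrt(math.pi) / 3,
              math.sqrt(math.pi) / 6)

def normal_expectation(f, sigma):
    """Approximate E[f(b)], b ~ N(0, sigma^2), via the substitution b = sqrt(2)*sigma*x."""
    total = sum(w * f(math.sqrt(2) * sigma * x)
                for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)

def cluster_marginal_likelihood(ys, etas, sigma):
    """Marginal likelihood of one cluster's binary responses ys, given fixed-effect
    linear predictors etas and a random intercept with standard deviation sigma."""
    def conditional(b):
        lik = 1.0
        for y, eta in zip(ys, etas):
            p = 1 / (1 + math.exp(-(eta + b)))
            lik *= p if y == 1 else 1 - p
        return lik
    return normal_expectation(conditional, sigma)

# Hypothetical cluster with two observations.
lik = cluster_marginal_likelihood([1, 0], [0.3, -0.2], sigma=1.0)
```

The full log-likelihood is the sum of the logarithms of these cluster contributions, which a numerical optimizer can then maximize over the fixed effects and sigma.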

Nonlinear Regression Analysis to Determine Infection Models of Colletotrichum acutatum Causing Anthracnose of Chili Pepper Using Logistic Equation

  • Kang, Wee-Soo;Yun, Sung-Chul;Park, Eun-Woo
    • The Plant Pathology Journal / v.26 no.1 / pp.17-24 / 2010
  • A logistic model describing the combined effects of temperature and wetness period on appressorium formation was developed using laboratory data on percent appressorium formation of Colletotrichum acutatum. In addition, the possible use of the logistic model for forecasting infection risks was evaluated in comparison with a first-order linear model. A simplified equilibrium model for enzymatic reactions was applied to obtain a temperature function for the asymptote parameter (A) of the logistic model. For the position (B) and rate (k) parameters, a reciprocal model was used to calculate the respective temperature functions. The nonlinear logistic model successfully described the response of appressorium formation to the combined effects of temperature and wetness period. In particular, the temperature function for the asymptote parameter A reflected the response of the upper limit of appressorium formation to temperature, which showed the typical temperature response of enzymatic reactions in cells. By having both temperature and wetness period as independent variables, the nonlinear logistic model can be used to determine the length of wetness period required for a given level of appressorium formation under different temperature conditions. The infection model derived from the nonlinear logistic model can be used to calculate infection risks using hourly temperature and wetness period data monitored by automated weather stations in the field. Compared with the nonlinear infection model, the linear infection model always predicted a shorter wetness period for appressorium formation and resulted in significant under- and overestimation of the response at low and high temperatures, respectively.
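The step of determining the wetness period required for a given level of appressorium formation amounts to inverting the logistic curve. The sketch below assumes the common three-parameter form y(t) = A / (1 + exp(-k(t - B))), which the abstract does not spell out, and treats A, B, and k as fixed numbers at a single temperature; in the paper they are functions of temperature. All parameter values are hypothetical.

```python
import math

def logistic_response(t, A, B, k):
    """Percent appressorium formation after wetness period t (assumed logistic form)."""
    return A / (1 + math.exp(-k * (t - B)))

def required_wetness(y, A, B, k):
    """Wetness period at which the response reaches level y, for 0 < y < A."""
    if not 0 < y < A:
        raise ValueError("target level must lie strictly between 0 and A")
    return B - math.log(A / y - 1) / k

# Hypothetical parameters at one temperature: asymptote 80 %, position 6 h, rate 0.5 per hour.
A, B, k = 80.0, 6.0, 0.5
hours_for_60_percent = required_wetness(60.0, A, B, k)
```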

Binary regression model using skewed generalized t distributions (기운 일반화 t 분포를 이용한 이진 데이터 회귀 분석)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.775-791 / 2017
  • We frequently encounter binary data in real life. Logistic, probit, cauchit, and complementary log-log models are often used for binary data analysis. To analyze binary data, Liu (2004) proposed the robit model, in which the inverse of the cdf of the Student's t distribution is used as a link function. Kim et al. (2008) also proposed a generalized t-link model to make the binary regression model more flexible. More flexible skewed distributions allow more flexible link functions in generalized linear models. In this sense, we propose a binary regression model using the skewed generalized t distributions introduced in Theodossiou (1998). We implement the proposed models in R using the glm function in base R and the sgt package. We also analyze the Pima Indian data using the proposed model in R.
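The four standard links named at the start of this abstract differ only in the cdf that maps the linear predictor to a probability. A minimal sketch of their inverse link functions follows; the dictionary name is hypothetical, and the robit and skewed generalized t links are omitted because they need a t-type cdf that the Python standard library does not provide.

```python
import math

# Inverse link functions: linear predictor eta -> success probability.
INVERSE_LINKS = {
    "logit": lambda eta: 1 / (1 + math.exp(-eta)),
    "probit": lambda eta: 0.5 * (1 + math.erf(eta / math.sqrt(2))),
    "cauchit": lambda eta: 0.5 + math.atan(eta) / math.pi,
    "cloglog": lambda eta: 1 - math.exp(-math.exp(eta)),
}

# Probability each link assigns at the same linear predictor value.
probabilities_at_2 = {name: g(2.0) for name, g in INVERSE_LINKS.items()}
```

The logit, probit, and cauchit links are symmetric about eta = 0 and differ mainly in tail weight, while the complementary log-log link is asymmetric; skewed link families aim to let the data choose both properties.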

An Application of Support Vector Machines to Personal Credit Scoring: Focusing on Financial Institutions in China (Support Vector Machines을 이용한 개인신용평가 : 중국 금융기관을 중심으로)

  • Ding, Xuan-Ze;Lee, Young-Chan
    • Journal of Industrial Convergence / v.16 no.4 / pp.33-46 / 2018
  • Personal credit scoring is an effective tool that helps banks make profitable loan-granting decisions. Recently, many classification algorithms and models have been used in personal credit scoring. Personal credit scoring techniques are usually divided into statistical and non-statistical methods. Statistical methods include linear regression, discriminant analysis, logistic regression, and decision trees. Non-statistical methods include linear programming, neural networks, genetic algorithms, and support vector machines. However, no consistent conclusion has been reached as to which method is best for developing a credit scoring model. In this paper, we compare the performance of the most common scoring techniques (logistic regression, neural networks, and support vector machines) using personal credit data from a financial institution in China. Specifically, we build the three models, classify the customers, and compare the results. According to the results, the support vector machine performs better than logistic regression and neural networks.