Variable Selection for Logistic Regression Model Using Adjusted Coefficients of Determination

수정 결정계수를 사용한 로지스틱 회귀모형에서의 변수선택법

  • Hong C. S. (Department of Statistics, Sungkyunkwan University) ;
  • Ham J. H. (Department of Statistics, Sungkyunkwan University) ;
  • Kim H. I. (Department of Statistics, Sungkyunkwan University)
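  • 홍종선 (Statistics Major, School of Economics, Sungkyunkwan University) ;
  • 함주형 (Department of Statistics, Sungkyunkwan University) ;
  • 김호일 (Department of Statistics, Sungkyunkwan University)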
  • Published : 2005.07.01

Abstract

In logistic regression analysis, the coefficient of determination has been defined through several different statistics, and their values tend to be smaller than those obtained for linear regression models. For this reason, these coefficients are not generally used to evaluate or diagnose a logistic regression model. Liao and McGee (2003) proposed two adjusted coefficients of determination that are robust to the addition of inappropriate predictors and to changes in sample size. In this work, these adjusted coefficients of determination are used as criteria for variable selection in logistic regression models, and the results are compared with those of other methods such as forward selection, backward elimination, stepwise selection, and the AIC statistic.

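In logistic regression models, the coefficient of determination is defined in more varied ways than in linear regression models, and its values are very small, so it cannot be regarded as a statistic used as a criterion for evaluating a logistic regression model. Liao and McGee (2003) proposed two kinds of adjusted coefficients of determination that are insensitive to the addition of inappropriate explanatory variables or to changes in sample size. In this study, four kinds of coefficients of determination, including the adjusted ones, are used as variable selection criteria in logistic regression models fitted to real data; the results are compared with those of existing variable selection methods such as forward selection, backward elimination, stepwise selection, and the AIC statistic, and their appropriateness and efficiency are discussed.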

References

  1. 성웅현 (2001). Applied Logistic Regression Analysis (in Korean), 탐진, Seoul
  2. Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle, In Proceedings of the 2nd International Symposium on Information Theory, edited by B. N. Petrov and F. Csaki, Akademiai Kiado, Budapest
  3. Brown, B. W. (1980). Prediction analyses for binary data, Biostatistics Casebook, John Wiley and Sons, New York
  4. Efron, B. (1978). Regression and ANOVA With Zero-One Data: Measures of Residual Variation, Journal of the American Statistical Association, 73, 113-121 https://doi.org/10.2307/2286531
  5. Kvalseth, T. O. (1985). Cautionary Note About $R^2$, The American Statistician, 39, 279-285 https://doi.org/10.2307/2683704
  6. Liao, J. G. and McGee, D. (2003). Adjusted Coefficients of Determination for Logistic Regression, The American Statistician, 57, 161-165 https://doi.org/10.1198/0003130031964
  7. Menard, S. (2000). Coefficients of Determination for Multiple Logistic Regression Analysis, The American Statistician, 54, 17-24 https://doi.org/10.2307/2685605
  8. Mittlbock, M. and Schemper, M. (1996). Explained Variation for Logistic Regression, Statistics in Medicine, 15, 1987-1997 https://doi.org/10.1002/(SICI)1097-0258(19961015)15:19<1987::AID-SIM318>3.0.CO;2-9
  9. Zheng, B. and Agresti, A. (2000). Summarizing the Predictive Power of a Generalized Linear Model, Statistics in Medicine, 19, 1771-1781 https://doi.org/10.1002/1097-0258(20000715)19:13<1771::AID-SIM485>3.0.CO;2-P

Cited by

  1. Coefficient of determination for multiple measurement error models vol.126, 2014, https://doi.org/10.1016/j.jmva.2014.01.006
  2. Prediction of Seasonal Nitrate Concentration in Springs on the Southern Slope of Jeju Island using Multiple Linear Regression of Geographic Spatial Data vol.44, pp.2, 2011, https://doi.org/10.9719/EEG.2011.44.2.135
  3. Goodness of fit in restricted measurement error models vol.145, 2016, https://doi.org/10.1016/j.jmva.2015.12.005