http://dx.doi.org/10.5351/KJAS.2009.22.3.531

Principal Components Logistic Regression based on Robust Estimation  

Kim, Bu-Yong (Department of Statistics, Sookmyung Women's University)
Kahng, Myung-Wook (Department of Statistics, Sookmyung Women's University)
Jang, Hea-Won (Enterprise Risk Team, FIST Global, Inc)
Publication Information
The Korean Journal of Applied Statistics / v.22, no.3, 2009, pp. 531-539
Abstract
Logistic regression is widely used as a data mining technique for customer relationship management. However, the maximum likelihood estimator has a highly inflated variance when multicollinearity exists among the regressors, and it is not robust against outliers. We therefore propose a robust principal components logistic regression that deals with both the multicollinearity and the outlier problems. A procedure based on the condition index is suggested for selecting the principal components: when a condition index exceeds the cutoff value obtained from a model constructed on the basis of conjoint analysis, the corresponding principal component is removed from the logistic model. In addition, we employ a robust estimation algorithm that dampens the effect of outliers by applying appropriate weights and factors to the leverage points and vertical outliers identified by the V-mask type criterion. Monte Carlo simulation results indicate that the proposed procedure yields a higher rate of correct classification than the existing method.
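As a rough illustration of the component-selection step described in the abstract, the following Python sketch drops principal components whose condition index exceeds a cutoff and fits a logistic model on the remaining component scores. It is a sketch under stated assumptions, not the authors' procedure: the condition index is computed here from the correlation matrix of the regressors, the cutoff of 30 is a common rule of thumb standing in for the conjoint-analysis-based cutoff (which the abstract does not give), the fit is ordinary maximum likelihood without the robust weighting or the V-mask type criterion, and the function names `fit_logistic_irls` and `pc_logistic` are hypothetical.

```python
import numpy as np

def fit_logistic_irls(Z, y, n_iter=50, tol=1e-8):
    """Ordinary maximum likelihood logistic fit by Newton/IRLS (no robust weights)."""
    A = np.column_stack([np.ones(len(Z)), Z])   # prepend an intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))   # fitted probabilities
        w = p * (1.0 - p)                       # IRLS weights
        grad = A.T @ (y - p)
        hess = A.T @ (A * w[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def pc_logistic(X, y, cutoff=30.0):
    """Principal components logistic regression with condition-index selection.

    cutoff=30 is an assumed rule-of-thumb value, not the paper's cutoff.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize the regressors
    R = np.corrcoef(Xs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals = np.clip(eigvals[order], 1e-12, None)
    eigvecs = eigvecs[:, order]
    cond_index = np.sqrt(eigvals[0] / eigvals)  # condition index per component
    keep = cond_index <= cutoff                 # remove components above the cutoff
    scores = Xs @ eigvecs[:, keep]              # retained principal component scores
    gamma = fit_logistic_irls(scores, y.astype(float))
    beta_std = eigvecs[:, keep] @ gamma[1:]     # back to standardized regressors
    return gamma[0], beta_std, cond_index, keep

# Synthetic example: x2 is nearly a copy of x1, so one component is near-degenerate.
rng = np.random.default_rng(0)
n = 400
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
prob = 1.0 / (1.0 + np.exp(-(x1 + x2 + 0.5 * x3)))
y = (rng.random(n) < prob).astype(float)

intercept, beta_std, cond_index, keep = pc_logistic(X, y)
print("condition indices:", np.round(cond_index, 2))
print("components kept:", keep)
print("coefficients on standardized regressors:", np.round(beta_std, 3))
```

Because the removed component is essentially the contrast between the two collinear regressors, the back-transformed coefficients for those two regressors come out nearly equal, which is the variance-stabilizing effect that principal components estimation is meant to provide.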
Keywords
Data mining; multicollinearity; outlier; principal components logistic regression; robust estimation