Variable Selection in PLS Regression with Penalty Function

Variable Selection in Partial Least Squares Regression Models Using a Penalty Function

  • Published : 2008.07.16

Abstract

A variable selection algorithm for partial least squares (PLS) regression using a penalty function is proposed. The usual PLS regression problem can be expressed as a maximization problem with appropriate constraints, and a penalty function is added to this maximization problem. A simulated annealing algorithm is then used to search for optimal solutions of the penalized maximization problem. The HARD penalty function is suggested as the best in several respects. Illustrations with real and simulated examples are provided.

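This paper considers the problem of selecting the explanatory variables needed in a partial least squares regression model by applying a penalty function. Partial least squares regression is widely used when there are one or more response variables and the number of explanatory variables is large relative to the number of observations. The required explanatory variables are selected by adding a penalty function to the optimization problem for each latent variable and then applying simulated annealing. When applied to real data, the method effectively removed unnecessary variables without substantially reducing the explanatory or predictive power of the model, so it may be applied to selecting an optimal subset of explanatory variables in partial least squares regression.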
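The approach described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm: a single response, the standard first-component PLS criterion (maximize the squared sample covariance between the score Xw and y subject to a unit-norm weight vector w), and the hard-thresholding penalty in its usual form, λ² − (|θ| − λ)² for |θ| < λ and λ² otherwise. The function names, the proposal moves, and the geometric cooling schedule are hypothetical choices for illustration, not the authors' algorithm.

```python
import math
import random


def hard_penalty(theta, lam):
    """Hard-thresholding penalty (assumed form): zero at theta = 0, lam^2 once |theta| >= lam."""
    a = abs(theta)
    return lam * lam - (a - lam) ** 2 if a < lam else lam * lam


def normalize(w):
    """Rescale w to unit Euclidean norm (the PLS constraint assumed here)."""
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def objective(w, X, y, lam):
    """Squared sample covariance between the score Xw and y, minus the summed penalty."""
    n = len(X)
    t = [sum(x * wj for x, wj in zip(row, w)) for row in X]
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    cov = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / (n - 1)
    return cov * cov - sum(hard_penalty(wj, lam) for wj in w)


def anneal(X, y, lam=0.1, n_iter=5000, t0=1.0, cooling=0.999, step=0.1, seed=0):
    """Simulated annealing over unit-norm weight vectors (illustrative settings)."""
    rng = random.Random(seed)
    p = len(X[0])
    w = normalize([1.0] * p)
    current = objective(w, X, y, lam)
    best_w, best = w, current
    temp = t0
    for _ in range(n_iter):
        cand = list(w)
        j = rng.randrange(p)
        # Hypothetical proposal: drop one variable, or perturb its weight.
        if rng.random() < 0.3:
            cand[j] = 0.0
        else:
            cand[j] += rng.gauss(0.0, step)
        if any(v != 0.0 for v in cand):
            cand = normalize(cand)
            value = objective(cand, X, y, lam)
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if value >= current or rng.random() < math.exp((value - current) / temp):
                w, current = cand, value
                if current > best:
                    best_w, best = w, current
        temp *= cooling  # geometric cooling
    return best_w, best
```

Calling `anneal(X, y)` with `X` as a list of rows and `y` as a list of responses returns a unit-norm weight vector and its penalized objective value. Weights that end up exactly zero correspond to variables dropped from that latent component. Later components would require deflating `X` and repeating the search, which this sketch omits.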
