Input Variable Importance in Supervised Learning Models

  • Published : 2003.04.01

Abstract

Statisticians and data miners are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures that depend on the model type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so they cannot be used directly to compare different types of models, a comparison that is often made in data mining. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which perturbs the values of the input variables and monitors the resulting change in the output. The scope of this research is limited to models for continuous outputs, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
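The perturb-and-monitor idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the perturbation scheme or the summary statistic, so the choices here are assumptions rather than the authors' method: each input is shifted by a multiple of its standard deviation, and importance is the mean absolute change in the prediction. The names `perturbation_importance`, `scale`, and `toy_model` are hypothetical.

```python
import numpy as np

def perturbation_importance(predict, X, scale=1.0):
    """Model-agnostic sensitivity score for each input column.

    Assumed scheme (not taken from the paper): shift column j by
    scale * std(column j), then record the mean absolute change
    in the model output relative to the unperturbed predictions.
    """
    X = np.asarray(X, dtype=float)
    baseline = predict(X)              # predictions on the original inputs
    stds = X.std(axis=0)               # per-column perturbation size
    importances = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_pert = X.copy()
        X_pert[:, j] += scale * stds[j]    # perturb one input at a time
        importances[j] = np.mean(np.abs(predict(X_pert) - baseline))
    return importances

# Hypothetical stand-in for any fitted regressor with continuous output
# (a linear model, a neural network, or a tree would be called the same way).
def toy_model(X):
    return 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.0 * X[:, 2]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
scores = perturbation_importance(toy_model, X)
print(scores)
```

Because the function only needs a `predict` callable, the same score can be computed for different model types and compared on a common scale, which is the kind of consistency the abstract argues for.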

Cited by

  1. Bias Reduction in Split Variable Selection in C4.5 vol.10, pp.3, 2003, https://doi.org/10.5351/CKSS.2003.10.3.627