Visualizing Multi-Variable Prediction Functions by Segmented k-CPG's

  • Published : 2009.01.31

Abstract

Machine learning methods such as support vector machines and random forests yield nonparametric prediction functions of the form $y = f(x_1, \ldots, x_p)$. As a sequel to the previous article on visualizing nonparametric functions (Huh and Lee, 2008), I propose more sensible graphs for visualizing $y = f(x_1, \ldots, x_p)$, which have two clear advantages over the previous simple graphs. The new graphs show a small number of prototype curves of $f(x_1, \ldots, x_{j-1}, x_j, x_{j+1}, \ldots, x_p)$ as functions of $x_j$, revealing only the statistically plausible portion of the interval of $x_j$, which changes with $(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_p)$. To complement the visual display, matching importance measures are produced for each of the $p$ predictor variables. The proposed graphs and importance measures are validated in simulated settings and demonstrated on an environmental study.
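The idea of drawing a few prototype curves of $f$ in $x_j$, each restricted to a plausible interval of $x_j$, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not say how prototypes or intervals are chosen, so the sketch assumes that the remaining predictors are grouped by a plain k-means clustering, that each cluster centroid serves as a prototype, and that the plausible interval is the observed range of $x_j$ within that cluster. The names `kmeans` and `segmented_curves` are hypothetical.

```python
import numpy as np

def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed way of choosing prototypes)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every row to every center.
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    # Final assignment against the final centers.
    d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)

def segmented_curves(f, X, j, k=3, n_grid=25):
    """Return up to k pairs (grid, f-values) for predictor j.

    Assumption: each curve fixes the other predictors at a cluster
    centroid and varies x_j only over the range of x_j observed
    inside that cluster.
    """
    others = np.delete(np.arange(X.shape[1]), j)
    centers, labels = kmeans(X[:, others], k)
    curves = []
    for c in range(k):
        xj = X[labels == c, j]
        if xj.size == 0:
            continue  # skip empty clusters
        grid = np.linspace(xj.min(), xj.max(), n_grid)
        pts = np.empty((n_grid, X.shape[1]))
        pts[:, others] = centers[c]  # other predictors held at the prototype
        pts[:, j] = grid             # x_j varies over its plausible segment
        curves.append((grid, f(pts)))
    return curves
```

Here `f` is any fitted prediction function that maps an array of rows to predictions, for example the `predict` method of a trained model. Plotting each returned pair as a line segment gives a display in the spirit of the abstract, in which each curve covers only the part of the $x_j$ axis supported by the data near its prototype.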

References

  1. Breiman, L. (2001). Random forests, Machine Learning, 45, 5-32 https://doi.org/10.1023/A:1010933404324
  2. Breiman, L. and Friedman, J. (1985). Estimating optimal transformations for multiple regression and correlation, Journal of the American Statistical Association, 80, 580-598 https://doi.org/10.2307/2288473
  3. Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning, Springer, New York
  4. Huh, M. H. and Lee, Y. (2008). Simple graphs for complex prediction functions, Communications of the Korean Statistical Society, 15, 343-351 https://doi.org/10.5351/CKSS.2008.15.3.343
  5. Strobl, C., Boulesteix, A., Kneib, T., Augustin, T. and Zeileis, A. (2008). Conditional variable importance for random forests, BMC Bioinformatics, 9, 307 https://doi.org/10.1186/1471-2105-9-307