http://dx.doi.org/10.5351/CKSS.2009.16.1.185

Visualizing Multi-Variable Prediction Functions by Segmented k-CPG's  

Huh, Myung-Hoe (Dept. of Statistics, Korea Univ.)
Publication Information
Communications for Statistical Applications and Methods, v.16, no.1, 2009, pp. 185-193
Abstract
Machine learning methods such as support vector machines and random forests yield nonparametric prediction functions of the form y = $f(x_1,{\ldots},x_p)$. As a sequel to the previous article (Huh and Lee, 2008) on visualizing nonparametric functions, I propose more sensible graphs for visualizing y = $f(x_1,{\ldots},x_p)$, which have two clear advantages over the previous simple graphs. The new graphs show a small number of prototype curves of $f(x_1,{\ldots},x_{j-1},x_j,x_{j+1},{\ldots},x_p)$, revealing the statistically plausible portion of the interval of $x_j$, which changes with ($x_1,{\ldots},x_{j-1},x_{j+1},{\ldots},x_p$). To complement the visual display, matching importance measures are produced for each of the p predictor variables. The proposed graphs and importance measures are validated in simulated settings and demonstrated on an environmental study.
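The idea of summarizing many conditional curves by a few prototypes can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes that each curve is obtained by varying $x_j$ over a grid while the other predictors are held at an observed row, that prototypes are plain k-means centroids of those curves, and that the "plausible portion" of $x_j$ for a prototype can be crudely approximated by the observed range of $x_j$ among the cluster's members. The function names (`conditional_curves`, `kmeans_curves`, `observed_range`) and the toy function `f` are hypothetical.

```python
import random

def conditional_curves(f, X, j, grid):
    """One curve per row of X: vary predictor j over grid, hold the rest fixed."""
    curves = []
    for row in X:
        curve = []
        for t in grid:
            x = list(row)
            x[j] = t
            curve.append(f(x))
        curves.append(curve)
    return curves

def kmeans_curves(curves, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on curves treated as vectors (an assumption)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(curves, k)]
    labels = None
    for _ in range(n_iter):
        # Assign each curve to the nearest center in squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(curve, centers[c])))
            for curve in curves
        ]
        # Move each center to the pointwise mean of its member curves.
        for c in range(k):
            members = [cv for cv, lab in zip(curves, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels

def observed_range(X, j, labels, cluster):
    """Crude stand-in for the plausible portion: observed min/max of x_j."""
    values = [row[j] for row, lab in zip(X, labels) if lab == cluster]
    return min(values), max(values)

# Toy prediction function (hypothetical): the slope in x_0 depends on x_1.
def f(x):
    return x[0] * x[1]

X = [[0.2, 0.8], [0.5, 1.0], [0.9, 1.2],
     [-0.9, -0.8], [-0.5, -1.0], [-0.1, -1.2]]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]

curves = conditional_curves(f, X, j=0, grid=grid)
prototypes, labels = kmeans_curves(curves, k=2)
segments = [observed_range(X, 0, labels, c) for c in range(2)]
```

In this toy setting the two prototypes are a rising and a falling line in $x_0$, and each would be drawn only over the range of $x_0$ observed in its own cluster. A real application would replace `f` with a fitted model's prediction function and would need a more careful definition of the plausible segment.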
Keywords
Visualization of prediction functions; k-Means clustering; variable importance; support vector machine; random forests; environmental data
Citations & Related Records
Times Cited By KSCI: 1
1. Breiman, L. (2001). Random forests, Machine Learning, 45, 5-32.
2. Breiman, L. and Friedman, J. (1985). Estimating optimal transformations for multiple regression and correlation, Journal of the American Statistical Association, 80, 580-598.
3. Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning, Springer, New York.
4. Huh, M. H. and Lee, Y. (2008). Simple graphs for complex prediction functions, Communications of the Korean Statistical Society, 15, 343-351.
5. Strobl, C., Boulesteix, A., Kneib, T., Augustin, T. and Zeileis, A. (2008). Conditional variable importance for random forests, BMC Bioinformatics, 9, 307.