http://dx.doi.org/10.5351/KJAS.2021.34.2.177

Correlated variable importance for random forests  

Shin, Seung Beom (Department of Statistics, Korea University)
Cho, Hyung Jun (Department of Statistics, Korea University)
Publication Information
The Korean Journal of Applied Statistics / v.34, no.2, 2021, pp. 177-190
Abstract
Random forests is a popular ensemble method that improves the stability and accuracy of decision trees. The gain in accuracy comes at the cost of interpretability, so variable importance measures are provided to compensate. Variable importance indicates which variables play the larger roles in constructing the random forest. However, when a predictor is correlated with other predictors, the importance computed by existing algorithms may be distorted: correlated predictors are biased downward, which can understate the importance of truly important predictors. We propose a new algorithm that remedies this downward bias. Its performance is demonstrated on simulated data and illustrated with real data.
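The downward bias described above can be illustrated with a small sketch of permutation-based importance. This is not the algorithm proposed in the article, whose details the abstract does not give, and it does not fit an actual random forest. It assumes two hand-written stand-in predictors: `model_single`, which relies on one predictor only, and `model_shared`, which splits its reliance evenly between that predictor and a near copy of it, roughly as a forest might when both are available. All names and the simulated data are hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of `model` (a function of one row) on (X, y)."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, col, rng, n_repeats=20):
    """Average increase in MSE after shuffling column `col` of X."""
    base = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)  # break the link between this column and y
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(model, X_perm, y) - base)
    return sum(increases) / n_repeats


rng = random.Random(0)
n = 2000

# Assumed toy data: x1 drives y, and x2 is a noisy near copy of x1.
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 0.1) for a in x1]
y = [a + rng.gauss(0, 0.1) for a in x1]
X = [[a, b] for a, b in zip(x1, x2)]


def model_single(row):
    # Stand-in for a model fitted without the correlated copy.
    return row[0]


def model_shared(row):
    # Stand-in for a model that spreads its reliance over both copies.
    return 0.5 * row[0] + 0.5 * row[1]


imp_single = permutation_importance(model_single, X, y, 0, rng)
imp_shared = permutation_importance(model_shared, X, y, 0, rng)

print("importance of x1, used alone:        ", round(imp_single, 3))
print("importance of x1, shared with a copy:", round(imp_shared, 3))
```

In this setup the importance of `x1` should come out at roughly a quarter of its stand-alone value once the model shares its weight with `x2`, because halving the weight on `x1` quarters the squared-error increase caused by shuffling it. The predictor is no less relevant to `y`; only its measured importance has dropped.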
Keywords
random forests; variable importance; correlation