http://dx.doi.org/10.5351/KJAS.2017.30.5.793

A comparative study of feature screening methods for ultrahigh dimensional multiclass classification  

Lee, Kyungeun (Department of Statistics, Korea University)
Kim, Kyoung Hee (Department of Statistics, Sungshin Women's University)
Shin, Seung Jun (Department of Statistics, Korea University)
Publication Information
The Korean Journal of Applied Statistics / v.30, no.5, 2017, pp. 793-808
Abstract
We compare various variable screening methods for multiclass classification problems with ultrahigh-dimensional data. Two approaches are considered: (1) pairwise extensions of binary screening methods via one-versus-one or one-versus-rest comparisons, and (2) screening applied directly to the multiclass response. We conduct extensive simulation studies under different conditions: heavy-tailed explanatory variables, correlated signal and noise variables, correlated joint distributions with uncorrelated marginals, and unbalanced classes in the response. We then analyze real data to examine the performance of the methods. The results show that model-free methods perform better for multiclass classification problems, as they do for binary ones.
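As a concrete illustration of the first approach, the sketch below extends a binary, model-free screening statistic (here, the two-sample Kolmogorov-Smirnov distance) to a multiclass response via one-versus-rest comparisons and keeps the highest-ranked features. This is a minimal sketch, not the authors' code; the function name, toy data, and cutoff are hypothetical.

```python
# Illustrative sketch (hypothetical, not from the paper): one-versus-rest
# extension of a binary, model-free screening statistic. Each feature is
# scored by the largest Kolmogorov-Smirnov distance between its conditional
# distribution within one class and within the remaining classes.
import numpy as np
from scipy.stats import ks_2samp

def ovr_ks_scores(X, y):
    """Score each column of X by its maximum one-versus-rest KS statistic."""
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        best = 0.0
        for k in classes:
            # distance between X_j | Y = k and X_j | Y != k
            d = ks_2samp(X[y == k, j], X[y != k, j]).statistic
            best = max(best, d)
        scores[j] = best
    return scores

# Toy data: p = 1000 features, only the first 5 carry class information.
rng = np.random.default_rng(0)
n, p, n_classes = 300, 1000, 3
y = rng.integers(0, n_classes, size=n)
X = rng.standard_normal((n, p))
X[:, :5] += y[:, None]                            # class-dependent mean shift
top = np.argsort(ovr_ks_scores(X, y))[::-1][:20]  # keep the 20 highest scores
print(np.sort(top))
```

Taking the maximum over classes mirrors the one-versus-rest decomposition described in the abstract; a direct multiclass screening method would instead compute a single statistic from the full conditional distributions without pairwise comparisons.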
Keywords
multi-categorical classification; simulation; ultrahigh-dimensional classification