http://dx.doi.org/10.5351/KJAS.2017.30.6.867

Optimal number of dimensions in linear discriminant analysis for sparse data  

Shin, Ga In (Department of Statistics, Sungkyunkwan University)
Kim, Jaejik (Department of Statistics, Sungkyunkwan University)
Publication Information
The Korean Journal of Applied Statistics / v.30, no.6, 2017, pp. 867-876
Abstract
Datasets with small n and large p are common in many fields, and their analysis remains a challenge in statistics. Discriminant analysis models for such datasets have recently been developed for classification problems. One class of these models seeks dimensions that distinguish well between groups, where the number of detected dimensions is typically much smaller than p. In such models, the number of dimensions matters for both the prediction and the visualization of the data, and it is usually determined by K-fold cross-validation (CV). In sparse data scenarios, however, CV is not reliable for determining the optimal number of dimensions because each fold may contain only a few observations. We therefore propose a method that determines the number of dimensions using a measure based on the standardized distance between the group mean values in the reduced dimensions. The proposed method is verified through simulations.
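The abstract does not give the exact form of the standardized-distance measure, so the following Python sketch only illustrates the idea. It assumes (i) scikit-learn's shrinkage LDA as a stand-in for the paper's sparse discriminant model (it is not the authors' model), and (ii) a per-dimension root-mean-square of standardized mean differences as the measure, so that adding an uninformative dimension can lower it; the function names standardized_distance and select_dimensions are hypothetical.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def standardized_distance(Z, y):
    """Average standardized distance between group means in the
    reduced-dimension scores Z (n x q). Assumed form, not the paper's."""
    y = np.asarray(y)
    groups = np.unique(y)
    means = np.array([Z[y == g].mean(axis=0) for g in groups])
    # Pooled within-group standard deviation for each reduced dimension
    within = np.sqrt(
        sum(((Z[y == g] - Z[y == g].mean(axis=0)) ** 2).sum(axis=0)
            for g in groups) / (len(y) - len(groups))
    )
    dists = []
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            d = (means[i] - means[j]) / within  # standardize each coordinate
            # Per-dimension RMS, so an uninformative extra dimension
            # dilutes (rather than inflates) the measure
            dists.append(np.sqrt((d ** 2).mean()))
    return float(np.mean(dists))

def select_dimensions(X, y, max_q=None):
    """Pick the number of reduced dimensions maximizing the measure.
    Shrinkage LDA is a placeholder for any sparse discriminant projection."""
    lda = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto")
    scores = lda.fit(X, y).transform(X)  # at most K-1 directions for K groups
    q_max = scores.shape[1] if max_q is None else min(max_q, scores.shape[1])
    measures = {q: standardized_distance(scores[:, :q], y)
                for q in range(1, q_max + 1)}
    return max(measures, key=measures.get), measures

For labels y and an n x p matrix X with n much smaller than p, select_dimensions(X, y) returns the candidate q (at most K-1 for K groups) with the largest average standardized separation, together with the measure for each q, so no cross-validation folds are needed.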
Keywords
discriminant analysis; sparse data; standardized distance; dimensions