A concise overview of principal support vector machines and its generalization

  • Received : 2024.01.19
  • Accepted : 2024.02.20
  • Published : 2024.03.31

Abstract

In high-dimensional data analysis, sufficient dimension reduction (SDR) has been considered an attractive tool for reducing the dimensionality of predictors while preserving regression information. The principal support vector machine (PSVM) (Li et al., 2011) offers a unified approach for both linear and nonlinear SDR. This article comprehensively explores a variety of SDR methods based on the PSVM, which we call principal machines (PM) for SDR. The PM achieves SDR by solving a sequence of convex optimization problems akin to those of popular supervised learning methods, such as the support vector machine, logistic regression, and quantile regression, to name a few. This makes the PM straightforward to handle and extend in both theoretical and computational aspects, as we will see throughout this article.
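For concreteness, the linear PSVM of Li et al. (2011) can be sketched at the population level as follows; the notation below is our own summary rather than the article's exact formulation. For a dividing point $a$ of the response, let $\tilde{Y} = 1$ if $Y > a$ and $\tilde{Y} = -1$ otherwise, and solve the convex problem

$$
(\psi_a, t_a) = \operatorname*{arg\,min}_{\psi,\, t}\; \psi^{\top} \Sigma \psi + \lambda\, E\bigl[\,1 - \tilde{Y}\{\psi^{\top}(X - E X) - t\}\,\bigr]_{+},
$$

where $\Sigma = \mathrm{cov}(X)$, $\lambda > 0$ is a tuning parameter, and $[u]_{+} = \max(u, 0)$ is the hinge loss. Repeating this over a grid of dividing points and taking the leading eigenvectors of $\sum_a \psi_a \psi_a^{\top}$ recovers a basis of the central subspace, and replacing the hinge loss with the logistic, check (quantile), or squared loss yields the other principal machines mentioned above.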

Keywords

Acknowledgements

This work was funded by National Research Foundation of Korea (NRF) grants (2023R1A2C1006587, 2022M3J6A1063595) and a Korea University grant (K2302021).

References

  1. Akaho S (2001). A kernel method for canonical correlation analysis, Available from: arXiv preprint cs/0609071
  2. Artemiou A and Dong Y (2016). Sufficient dimension reduction via principal Lq support vector machine, Electronic Journal of Statistics, 10, 783-805. https://doi.org/10.1214/16-EJS1122
  3. Artemiou A, Dong Y, and Shin SJ (2021). Real-time sufficient dimension reduction through principal least squares support vector machines, Pattern Recognition, 112, 107768.
  4. Banijamali E, Karimi A-H, and Ghodsi A (2018). Deep variational sufficient dimensionality reduction, Available from: arXiv preprint arXiv:1812.07641
  5. Bickel PJ and Levina E (2008). Regularized estimation of large covariance matrices, The Annals of Statistics, 36, 199-227. https://doi.org/10.1214/009053607000000758
  6. Bondell HD and Li L (2009). Shrinkage inverse regression estimation for model-free variable selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71, 287-299. https://doi.org/10.1111/j.1467-9868.2008.00686.x
  7. Boyd SP and Vandenberghe L (2004). Convex Optimization, Cambridge University Press, Cambridge.
  8. Bura E, Forzani L, Arancibia RG, Llop P, and Tomassi D (2022). Sufficient reductions in regression with mixed predictors, The Journal of Machine Learning Research, 23, 4377-4423.
  9. Chun H and Keles S (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection, Journal of the Royal Statistical Society Series B: Statistical Methodology, 72, 3-25. https://doi.org/10.1111/j.1467-9868.2009.00723.x
  10. Cook RD (2004). Testing predictor contributions in sufficient dimension reduction, The Annals of Statistics, 32, 1062-1092. https://doi.org/10.1214/009053604000000292
  11. Cook RD (2007). Fisher lecture: Dimension reduction in regression, Statistical Science, 22, 1-26. https://doi.org/10.1214/088342306000000682
  12. Cook RD and Weisberg S (1991). Discussion of "sliced inverse regression for dimension reduction", Journal of the American Statistical Association, 86, 28-33. https://doi.org/10.2307/2290564
  13. Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96, 1348-1360. https://doi.org/10.1198/016214501753382273
  14. Forero PA, Cano A, and Giannakis GB (2010). Consensus-based distributed linear support vector machines, In Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks, Stockholm, 35-46.
  15. Fukumizu K, Bach FR, and Gretton A (2007). Statistical consistency of kernel canonical correlation analysis, Journal of Machine Learning Research, 8, 361-383.
  16. Hastie T, Tibshirani R, and Friedman J (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York.
  17. Hristache M, Juditsky A, Polzehl J, and Spokoiny V (2001). Structure adaptive approach for dimension reduction, Annals of Statistics, 29, 1537-1566. https://doi.org/10.1214/aos/1015345954
  18. Jang HJ, Shin SJ, and Artemiou A (2023). Principal weighted least square support vector machine: An online dimension-reduction tool for binary classification, Computational Statistics & Data Analysis, 187, 107818.
  19. Jiang B, Zhang X, and Cai T (2008). Estimating the confidence interval for prediction errors of support vector machine classifiers, Journal of Machine Learning Research, 9, 521-540.
  20. Jin J, Ying C, and Yu Z (2019). Distributed estimation of principal support vector machines for sufficient dimension reduction, Available from: arXiv preprint arXiv:1911.12732
  21. Kang J and Shin SJ (2022). A forward approach for sufficient dimension reduction in binary classification, The Journal of Machine Learning Research, 23, 9025-9055.
  22. Kapla D, Fertl L, and Bura E (2022). Fusing sufficient dimension reduction with neural networks, Computational Statistics & Data Analysis, 168, 107390.
  23. Kim B and Shin SJ (2019). Principal weighted logistic regression for sufficient dimension reduction in binary classification, Journal of the Korean Statistical Society, 48, 194-206. https://doi.org/10.1016/j.jkss.2018.11.001
  24. Kim H, Howland P, Park H, and Cristianini N (2005). Dimension reduction in text classification with support vector machines, Journal of Machine Learning Research, 6, 37-53.
  25. Kim K, Li B, Zhou Y, and Li L (2020). On post dimension reduction statistical inference, The Annals of Statistics, 48, 1567-1592. https://doi.org/10.1214/19-AOS1859
  26. Koenker R and Bassett G (1978). Regression quantiles, Econometrica, 46, 33-50. https://doi.org/10.2307/1913643
  27. Kong E and Xia Y (2014). An adaptive composite quantile approach to dimension reduction, The Annals of Statistics, 42, 1657-1688. https://doi.org/10.1214/14-AOS1242
  28. Lee KY, Li B, and Chiaromonte F (2013). A general theory for nonlinear sufficient dimension reduction: Formulation and estimation, The Annals of Statistics, 41, 221-249. https://doi.org/10.1214/12-AOS1071
  29. Li B (2018). Sufficient Dimension Reduction: Methods and Applications with R, CRC Press, Boca Raton, FL.
  30. Li B, Artemiou A, and Li L (2011). Principal support vector machines for linear and nonlinear sufficient dimension reduction, The Annals of Statistics, 39, 3182-3210. https://doi.org/10.1214/11-AOS932
  31. Li B and Wang S (2007). On directional regression for dimension reduction, Journal of the American Statistical Association, 102, 997-1008. https://doi.org/10.1198/016214507000000536
  32. Li B, Zha H, and Chiaromonte F (2005). Contour regression: A general approach to dimension reduction, The Annals of Statistics, 33, 1580-1616. https://doi.org/10.1214/009053605000000192
  33. Li K-C (1991). Sliced inverse regression for dimension reduction (with discussion), Journal of the American Statistical Association, 86, 316-342. https://doi.org/10.2307/2290563
  34. Li K-C (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma, Journal of the American Statistical Association, 87, 1025-1039. https://doi.org/10.1080/01621459.1992.10476258
  35. Li L (2007). Sparse sufficient dimension reduction, Biometrika, 94, 603-613. https://doi.org/10.1093/biomet/asm044
  36. Li L (2010). Dimension reduction for high-dimensional data, Statistical Methods in Molecular Biology, 620, 417-434. https://doi.org/10.1007/978-1-60761-580-4_14
  37. Pearson K (1901). On lines and planes of closest fit to systems of points in space, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2, 559-572. https://doi.org/10.1080/14786440109462720
  38. Power MD and Dong Y (2021). Bayesian model averaging sliced inverse regression, Statistics & Probability Letters, 174, 109103.
  39. Quach H and Li B (2023). On forward sufficient dimension reduction for categorical and ordinal responses, Electronic Journal of Statistics, 17, 980-1006. https://doi.org/10.1214/23-EJS2122
  40. Reich BJ, Bondell HD, and Li L (2011). Sufficient dimension reduction via Bayesian mixture modeling, Biometrics, 67, 886-895. https://doi.org/10.1111/j.1541-0420.2010.01501.x
  41. Shin SJ and Artemiou A (2017). Penalized principal logistic regression for sparse sufficient dimension reduction, Computational Statistics & Data Analysis, 111, 48-58.
  42. Shin SJ, Wu Y, Zhang HH, and Liu Y (2017). Principal weighted support vector machines for sufficient dimension reduction in binary classification, Biometrika, 104, 67-81. https://doi.org/10.1093/biomet/asw057
  43. Soale A-N and Dong Y (2022). On sufficient dimension reduction via principal asymmetric least squares, Journal of Nonparametric Statistics, 34, 77-94. https://doi.org/10.1080/10485252.2021.2025237
  44. Tibshirani R (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological), 58, 267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
  45. Van der Vaart AW (2000). Asymptotic Statistics, Cambridge University Press, Cambridge.
  46. Vapnik V (1999). The Nature of Statistical Learning Theory, Springer Science & Business Media, New York.
  47. Wahba G (1999). Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV, Advances in Kernel Methods-Support Vector Learning, 6, 69-87. https://doi.org/10.7551/mitpress/1130.003.0009
  48. Wang C, Shin SJ, and Wu Y (2018). Principal quantile regression for sufficient dimension reduction with heteroscedasticity, Electronic Journal of Statistics, 12, 2114-2140. https://doi.org/10.1214/18-EJS1432
  49. Weng J and Young DS (2017). Some dimension reduction strategies for the analysis of survey data, Journal of Big Data, 4, 1-19. https://doi.org/10.1186/s40537-017-0103-6
  50. Wu H-M (2008). Kernel sliced inverse regression with applications to classification, Journal of Computational and Graphical Statistics, 17, 590-610. https://doi.org/10.1198/106186008X345161
  51. Wu Y and Li L (2011). Asymptotic properties of sufficient dimension reduction with a diverging number of predictors, Statistica Sinica, 21, 707-730.
  52. Xia Y (2007). A constructive approach to the estimation of dimension reduction directions, The Annals of Statistics, 35, 2654-2690. https://doi.org/10.1214/009053607000000352
  53. Xia Y, Tong H, Li WK, and Zhu L (2002). An adaptive estimation of dimension reduction space, Journal of the Royal Statistical Society Series B: Statistical Methodology, 64, 363-410. https://doi.org/10.1111/1467-9868.03411
  54. Yin X and Hilafu H (2015). Sequential sufficient dimension reduction for large p, small n problems, Journal of the Royal Statistical Society Series B: Statistical Methodology, 77, 879-892. https://doi.org/10.1111/rssb.12093
  55. Yin X, Li B, and Cook RD (2008). Successive direction extraction for estimating the central subspace in a multiple-index regression, Journal of Multivariate Analysis, 99, 1733-1757. https://doi.org/10.1016/j.jmva.2008.01.006
  56. Yuan M and Lin Y (2006). Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society Series B: Statistical Methodology, 68, 49-67. https://doi.org/10.1111/j.1467-9868.2005.00532.x
  57. Zhang C-H (2010). Nearly unbiased variable selection under minimax concave penalty, The Annals of Statistics, 38, 894-942. https://doi.org/10.1214/09-AOS729
  58. Zhu L-P, Zhu L-X, and Feng Z-H (2010). Dimension reduction in regressions through cumulative slicing estimation, Journal of the American Statistical Association, 105, 1455-1466. https://doi.org/10.1198/jasa.2010.tm09666
  59. Zou H (2006). The adaptive lasso and its oracle properties, Journal of the American Statistical Association, 101, 1418-1429. https://doi.org/10.1198/016214506000000735
  60. Zou H and Hastie T (2005). Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B: Statistical Methodology, 67, 301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x