L1-penalized AUC-optimization with a surrogate loss

  • Hyungwoo Kim (Department of Statistics and Data Science, Pukyong National University)
  • Seung Jun Shin (Department of Statistics, Korea University)
  • Received : 2024.01.04
  • Accepted : 2024.02.23
  • Published : 2024.03.31

Abstract

The area under the ROC curve (AUC) is one of the most common criteria for measuring the overall performance of binary classifiers across a wide range of machine learning problems. In this article, we propose an L1-penalized AUC-optimization classifier that directly maximizes the AUC for high-dimensional data. To this end, we employ an AUC-consistent surrogate loss function and combine it with the L1-norm penalty, which enables us to estimate coefficients and select informative variables simultaneously. In addition, we develop an efficient optimization algorithm for the proposed method by adopting k-means clustering and proximal gradient descent, which offers computational advantages. Numerical simulation studies demonstrate that the proposed method shows promising performance in terms of prediction accuracy, variable selection, and computational cost.
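To make the combination of a pairwise surrogate loss, an L1 penalty, and proximal gradient descent concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear score, labels coded 0/1, and a pairwise logistic surrogate; the abstract does not name the surrogate loss, so that choice is an assumption. The k-means step is omitted because the abstract does not say how it enters the algorithm. The names `soft_threshold`, `l1_auc_fit`, and `empirical_auc` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def l1_auc_fit(X, y, lam=0.1, n_iter=300):
    """Proximal gradient descent on a pairwise logistic loss plus an L1 penalty.

    Assumptions (not from the source): labels are coded 0/1, the score is
    linear (x @ beta), and the surrogate is log(1 + exp(-margin)).
    """
    X_pos, X_neg = X[y == 1], X[y == 0]
    # Differences x_i - x_j over every (positive, negative) pair.
    D = (X_pos[:, None, :] - X_neg[None, :, :]).reshape(-1, X.shape[1])
    n_pairs = D.shape[0]
    # The smooth part has gradient Lipschitz constant <= lambda_max(D'D/N) / 4.
    lipschitz = np.linalg.eigvalsh(D.T @ D / n_pairs)[-1] / 4.0
    step = 1.0 / lipschitz
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        margins = D @ beta
        # d/dm log(1 + exp(-m)) = -sigmoid(-m), written with tanh for stability.
        weights = -0.5 * (1.0 - np.tanh(margins / 2.0))
        grad = D.T @ weights / n_pairs
        # Gradient step on the smooth loss, then the L1 proximal step.
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


def empirical_auc(scores, y):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))


# Synthetic illustration: 20 features, only the first two carry signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.standard_normal((200, 20))
X[y == 1, :2] += 1.5
beta = l1_auc_fit(X, y, lam=0.1)
print("nonzero coefficients:", np.flatnonzero(beta))
print("training AUC:", empirical_auc(X @ beta, y))
```

The sketch forms every positive-negative pair explicitly, so its cost grows with the product of the two class sizes. That quadratic growth is the kind of burden a pair-reduction step would target, though how the paper's k-means step does so is not stated in the abstract.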

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-00242528).
