http://dx.doi.org/10.5351/CKSS.2009.16.6.1005

The Unified Framework for AUC Maximizer  

Jun, Jong-Jun (Department of Statistics, Seoul National University)
Kim, Yong-Dai (Department of Statistics, Seoul National University)
Han, Sang-Tae (Department of Informational Statistics, Hoseo University)
Kang, Hyun-Cheol (Department of Informational Statistics, Hoseo University)
Choi, Ho-Sik (Department of Informational Statistics, Hoseo University)
Publication Information
Communications for Statistical Applications and Methods / v.16, no.6, 2009, pp. 1005-1012
Abstract
The area under the curve (AUC) is commonly used as a summary measure of the receiver operating characteristic (ROC) curve, which displays the performance of a set of binary classifiers for all feasible ratios of the costs associated with the true positive rate (TPR) and the false positive rate (FPR). In the bipartite ranking problem, where one has to compare two different observations and decide which one is "better", the AUC measures the probability that the ranking score of a randomly chosen sample from one class is larger than that of a randomly chosen sample from the other class. Hence, the function that maximizes the AUC in the bipartite ranking problem differs from the function that maximizes accuracy (equivalently, minimizes the misclassification error rate) in the binary classification problem. In this paper, we develop a unified framework for AUC maximizers that includes support vector machines, which are based on large-margin maximization, and logistic regression, which is based on estimating posterior probabilities. Moreover, we develop an efficient algorithm for the proposed unified framework. Numerical results show that the proposed unified framework can treat various methodologies successfully.
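To make the pairwise reading of the AUC concrete, the following minimal Python sketch computes the empirical AUC as the fraction of (positive, negative) pairs that are ranked correctly, and evaluates two convex pairwise surrogates of the kind the abstract alludes to (a hinge loss in the spirit of support vector machines and a logistic loss). This is an illustration under stated assumptions, not the algorithm proposed in the paper: the function names `empirical_auc` and `pairwise_surrogate`, the unit margin, and the choice to count ties as one half are all assumptions made here.

```python
import numpy as np


def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive sample
    receives the larger score. Ties count as 1/2 (an assumed convention)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]  # shape (n_pos, 1)
    neg = np.asarray(scores_neg, dtype=float)[None, :]  # shape (1, n_neg)
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def pairwise_surrogate(scores_pos, scores_neg, loss="hinge"):
    """Average convex surrogate loss over all (positive, negative) pairs.

    Hypothetical helper for illustration only; the paper's actual
    formulation and algorithm are not reproduced here.
    """
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    diff = pos - neg  # score difference for every pair
    if loss == "hinge":
        # SVM-style loss with an assumed margin of 1
        pair_loss = np.maximum(0.0, 1.0 - diff)
    elif loss == "logistic":
        # log(1 + exp(-diff)), computed in a numerically stable way
        pair_loss = np.logaddexp(0.0, -diff)
    else:
        raise ValueError(f"unknown loss: {loss!r}")
    return float(np.mean(pair_loss))


if __name__ == "__main__":
    pos_scores = [0.9, 0.8, 0.3]
    neg_scores = [0.7, 0.2]
    print("empirical AUC:", empirical_auc(pos_scores, neg_scores))
    print("pairwise hinge:", pairwise_surrogate(pos_scores, neg_scores, "hinge"))
    print("pairwise logistic:", pairwise_surrogate(pos_scores, neg_scores, "logistic"))
```

With the toy scores above, five of the six positive-negative pairs are ordered correctly, so the empirical AUC is 5/6. Because both losses depend only on score differences between pairs, a scoring function can rank well under this criterion without being calibrated for a classification threshold, which is one way to see why an AUC maximizer and an accuracy maximizer can differ.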
Keywords
ROC curve; AUC; bipartite ranking problem