[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5351/CKSS.2010.17.6.755

Cost Ratios for Cost and ROC Curves

Hong, Chong-Sun (Department of Statistics, Sungkyunkwan University)
Yoo, Hyun-Sang (Department of Statistics, Sungkyunkwan University)

Publication Information

Communications for Statistical Applications and Methods, v.17, no.6, 2010, pp. 755-765

Abstract

For classification problems on a mixture distribution, a threshold derived from cost functions is optimal in the sense that it minimizes the expected cost. Assuming that no cost information is available, we propose cost ratios in the expected cost that correspond to the thresholds at which the total accuracy and the true rate are maximized, and we explain how these cost ratios relate to the cost ratio that minimizes the expected cost. Other cost ratios are also proposed by comparing the normalized expected costs when classification accuracy is maximized. The values of these cost ratios lie between the two cost ratios for the expected costs based on classification accuracies, and they converge to that of the minimum expected cost. This work suggests two kinds of cost ratios: one that minimizes the expected cost and the normalized expected cost, and another obtained from the expected cost and normalized expected cost functions at the points where classification accuracies are maximized. We discuss their compatibility based on the relation between these cost ratios.
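
The link between a threshold and the cost ratio it implies can be illustrated numerically. The following is a minimal sketch, not the authors' derivation. It assumes a two-component normal mixture, a rule that classifies a score above the threshold t as positive, zero cost for correct decisions, and the common expected-cost form EC(t) = c_fn * p * FNR(t) + c_fp * (1 - p) * FPR(t). It also reads "true rate" as the sum of the true positive and true negative rates. All function names, the distribution parameters, and the prior p_pos = 0.2 are hypothetical. Setting the derivative of EC(t) to zero gives c_fn / c_fp = (1 - p) f0(t) / (p f1(t)), the cost ratio for which a given threshold is cost-optimal.

```python
from statistics import NormalDist

# Hypothetical mixture: negatives ~ N(0, 1), positives ~ N(2, 1), prior P(positive) = 0.2.
neg = NormalDist(0.0, 1.0)
pos = NormalDist(2.0, 1.0)
p_pos = 0.2

def expected_cost(t, c_fn, c_fp):
    """Expected misclassification cost when scores above t are called positive."""
    fnr = pos.cdf(t)        # positives at or below t are missed
    fpr = 1.0 - neg.cdf(t)  # negatives above t are false alarms
    return c_fn * p_pos * fnr + c_fp * (1.0 - p_pos) * fpr

def implied_cost_ratio(t):
    """Ratio c_fn / c_fp for which t is a stationary point of the expected cost."""
    return ((1.0 - p_pos) * neg.pdf(t)) / (p_pos * pos.pdf(t))

def accuracy(t):
    """Total accuracy: prior-weighted true positive and true negative rates."""
    return p_pos * (1.0 - pos.cdf(t)) + (1.0 - p_pos) * neg.cdf(t)

def true_rate_sum(t):
    """Sum of the true positive and true negative rates (assumed reading of 'true rate')."""
    return (1.0 - pos.cdf(t)) + neg.cdf(t)

# Simple grid search over candidate thresholds from -2 to 4 in steps of 0.001.
grid = [i / 1000 for i in range(-2000, 4001)]
t_acc = max(grid, key=accuracy)
t_tr = max(grid, key=true_rate_sum)

ratio_acc = implied_cost_ratio(t_acc)
ratio_tr = implied_cost_ratio(t_tr)

# Check: with the implied ratio as costs, the expected cost is minimized at the same threshold.
t_min = min(grid, key=lambda t: expected_cost(t, c_fn=ratio_tr, c_fp=1.0))

print(f"accuracy-maximizing threshold {t_acc:.3f}, implied cost ratio {ratio_acc:.3f}")
print(f"true-rate-maximizing threshold {t_tr:.3f}, implied cost ratio {ratio_tr:.3f}")
print(f"threshold minimizing expected cost at that ratio: {t_min:.3f}")
```

Under these assumptions, the accuracy-maximizing threshold implies a cost ratio of about 1, and the threshold maximizing the sum of true rates implies a ratio of about (1 - p) / p, which is 4 for the prior used here.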

Keywords

Classification accuracy; credit evaluation; default; expected cost; discriminant power; threshold

Citations & Related Records

Times Cited By KSCI: 3

Cited By KSCI

1	Hong, Chong Sun and Wu, Zhi Qiang (2014). Alternative accuracy for multiple ROC analysis, Journal of the Korean Data and Information Science Society, 25(6), 1521.
2	Hong, Chong Sun, Kim, Hyomin Alex and Kim, Dong Kyu (2014). Alternative Optimal Threshold Criteria: MFR, The Korean Journal of Applied Statistics, 27(5), 773.
