http://dx.doi.org/10.5351/CKSS.2010.17.6.755

Cost Ratios for Cost and ROC Curves  

Hong, Chong-Sun (Department of Statistics, Sungkyunkwan University)
Yoo, Hyun-Sang (Department of Statistics, Sungkyunkwan University)
Publication Information
Communications for Statistical Applications and Methods / v.17, no.6, 2010, pp. 755-765
Abstract
For classification problems on a mixture distribution, a threshold derived from cost functions is optimal in the sense that it minimizes the expected cost. Assuming that no cost information is available, we propose the cost ratios in the expected cost that correspond to the thresholds at which the total accuracy and the true rate are maximized, and we explain how these cost ratios relate to minimizing the expected cost. We also propose further cost ratios obtained by comparing the normalized expected costs at the thresholds where classification accuracy is maximized. The values of these cost ratios lie between the two cost ratios for the expected cost based on classification accuracies, and they converge to the cost ratio of the minimum expected cost. This work thus suggests two kinds of cost ratios: one that minimizes the expected cost and the normalized expected cost, and another obtained from the expected cost and normalized expected cost functions at the thresholds where classification accuracies are maximized. We discuss their compatibility on the basis of the relations among these cost ratios.
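The link between an accuracy-maximizing threshold and a cost ratio can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two score distributions are normal with equal variance, scores at or below the threshold are classified as default, the "true rate" is read as the sum of the two class-wise correct classification rates, and the names `expected_cost`, `implied_cost_ratio` and `argmax_on_grid` are hypothetical. Under these assumptions, the cost ratio for which a given threshold is a stationary point of the expected cost follows from setting the derivative of the expected cost to zero.

```python
from statistics import NormalDist


def expected_cost(t, pi, bad, good, c_fn, c_fp):
    """Expected cost when scores <= t are classified as default (assumed rule)."""
    miss = pi * (1.0 - bad.cdf(t))            # defaults scored above t
    false_alarm = (1.0 - pi) * good.cdf(t)    # non-defaults scored at or below t
    return c_fn * miss + c_fp * false_alarm


def implied_cost_ratio(t, pi, bad, good):
    """Ratio c_fn / c_fp at which the derivative of expected_cost vanishes at t."""
    return (1.0 - pi) * good.pdf(t) / (pi * bad.pdf(t))


def argmax_on_grid(f, lo, hi, steps=20000):
    """Crude grid search; adequate for a smooth one-dimensional illustration."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=f)


# Illustrative parameters (assumptions, not values from the paper).
pi = 0.2                      # prior probability of default
bad = NormalDist(0.0, 1.0)    # score distribution of defaults
good = NormalDist(2.0, 1.0)   # score distribution of non-defaults


def total_accuracy(t):
    """Overall proportion classified correctly at threshold t."""
    return pi * bad.cdf(t) + (1.0 - pi) * (1.0 - good.cdf(t))


def true_rate(t):
    """Sum of the two class-wise correct classification rates (assumed reading)."""
    return bad.cdf(t) + (1.0 - good.cdf(t))


t_acc = argmax_on_grid(total_accuracy, -4.0, 6.0)
t_true = argmax_on_grid(true_rate, -4.0, 6.0)

ratio_acc = implied_cost_ratio(t_acc, pi, bad, good)
ratio_true = implied_cost_ratio(t_true, pi, bad, good)

print("threshold maximizing total accuracy:", t_acc, "implied ratio:", ratio_acc)
print("threshold maximizing true rate:", t_true, "implied ratio:", ratio_true)
```

In this setup, maximizing total accuracy is the same as minimizing the expected cost with equal costs, so its implied ratio is close to 1, while maximizing the true rate gives a ratio close to (1 - pi) / pi. Whether these coincide with the paper's own definitions is not confirmed by the abstract.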
Keywords
Classification accuracy; credit evaluation; default; expected cost; discriminant power; threshold