http://dx.doi.org/10.5351/KJAS.2014.27.5.773

Alternative Optimal Threshold Criteria: MFR  

Hong, Chong Sun (Department of Statistics, Sungkyunkwan University)
Kim, Hyomin Alex (Department of Statistics, Sungkyunkwan University)
Kim, Dong Kyu (Department of Statistics, Sungkyunkwan University)
Publication Information
The Korean Journal of Applied Statistics / v.27, no.5, 2014, pp. 773-786
Abstract
We propose the multiplication of false rates (MFR), a classification accuracy criterion that corresponds to the area of a rectangle derived from the ROC curve. The optimal threshold obtained using MFR is compared with those obtained from other criteria in terms of classification performance. Optimal thresholds for various distribution functions are also found, and some properties and advantages of MFR are discussed by comparing the false negative rate (FNR) and false positive rate (FPR) corresponding to these optimal thresholds. Based on a general cost function, cost ratios of the optimal thresholds are computed for various classification criteria. The cost ratios for cost curves are examined to explore the advantages of MFR. Furthermore, the definition of MFR is extended to multi-dimensional ROC analysis, and the relations among the classification criteria are also discussed.
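As a rough illustration only, the sketch below reads "multiplication of false rates" literally as the product FNR × FPR at a given threshold. This formula is an assumption, since the abstract does not state it. Under this reading, the product equals the area of the rectangle spanned by the ROC point (FPR, TPR) and the corner (0, 1), whose width is FPR and whose height is 1 - TPR = FNR. The function names, the toy scores and labels, and the convention that scores at or above the threshold are predicted positive are all hypothetical.

```python
def false_rates(scores, labels, threshold):
    """Return (FNR, FPR), predicting positive when score >= threshold.

    Assumption: labels are 1 (positive) or 0 (negative), and both
    classes are present in the data.
    """
    pairs = list(zip(scores, labels))
    # False negatives: positives scored below the threshold.
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    # False positives: negatives scored at or above the threshold.
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    positives = sum(1 for _, y in pairs if y == 1)
    negatives = len(pairs) - positives
    return fn / positives, fp / negatives


def mfr(scores, labels, threshold):
    """Product of the two false rates (assumed reading of MFR)."""
    fnr, fpr = false_rates(scores, labels, threshold)
    return fnr * fpr


# Hypothetical toy data, not taken from the paper.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

for t in (0.0, 0.45, 0.9):
    fnr, fpr = false_rates(scores, labels, t)
    print(f"threshold={t:.2f}  FNR={fnr:.2f}  FPR={fpr:.2f}  product={fnr * fpr:.4f}")
```

The product is zero at both extreme thresholds, where one of the two false rates vanishes. The sketch therefore only evaluates the product and does not select an optimal threshold, because the abstract does not say how the optimum is chosen.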
Keywords
Classification performance; confusion matrix; cost ratio; default; threshold