Optimal threshold using the correlation coefficient for the confusion matrix

  • Hong, Chong Sun (Department of Statistics, Sungkyunkwan University)
  • Oh, Se Hyeon (Department of Statistics, Sungkyunkwan University)
  • Choi, Ye Won (Department of Statistics, Sungkyunkwan University)
  • Received: 2021.10.18
  • Accepted: 2021.12.15
  • Published: 2022.02.28

Abstract

Various accuracy measures of discriminant power exist for estimating the optimal threshold that discriminates a mixture distribution in medical statistics (biostatistics) and credit evaluation. Recently, two such measures have been studied for estimating optimal thresholds: the Matthews correlation coefficient, which is expressed in terms of the frequencies of the confusion matrix, and the F1 statistic, the harmonic mean of precision and recall. In this study, we examine whether these accuracy measures are appropriate for setting the optimal threshold that discriminates a mixture distribution, and find that measures that depend on the sample sizes are not appropriate when the two sample sizes differ greatly. As an alternative accuracy measure, we define a correlation coefficient as a function of the ratios of the confusion matrix, propose estimating the optimal threshold as the threshold that maximizes this coefficient, and discuss the usefulness and applicability of the method.
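The abstract contrasts count-based measures (MCC, F1) with a correlation coefficient built from the ratios of the confusion matrix. The plain-Python sketch below illustrates why sample-size dependence matters when a threshold is chosen by maximizing a measure. It rests on stated assumptions: a score at or above the threshold is classified as positive, both samples are non-empty, and `rate_correlation` is a hypothetical reading of the proposed measure (the MCC formula evaluated on TPR, FNR, FPR and TNR instead of counts), which may differ from the paper's exact definition. All function names and the toy scores are illustrative and are not taken from the paper.

```python
import math


def confusion_counts(pos_scores, neg_scores, threshold):
    """Return (TP, FN, FP, TN); a score >= threshold is classified as positive."""
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    return tp, len(pos_scores) - tp, fp, len(neg_scores) - fp


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from confusion-matrix frequencies."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def f1(tp, fn, fp, tn):
    """F1 statistic: harmonic mean of precision and recall, 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def rate_correlation(tp, fn, fp, tn):
    """HYPOTHETICAL rate-based correlation: the MCC formula evaluated on
    TPR, FNR, FPR, TNR (each class row divided by its own sample size).
    An illustrative assumption, not necessarily the paper's definition."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return mcc(tpr, 1 - tpr, fpr, 1 - fpr)


def best_threshold(pos_scores, neg_scores, measure):
    """Try every observed score as a threshold; return the first maximizer."""
    candidates = sorted(set(pos_scores) | set(neg_scores))
    return max(candidates,
               key=lambda c: measure(*confusion_counts(pos_scores, neg_scores, c)))


if __name__ == "__main__":
    # Toy scores invented for illustration; positives tend to score higher.
    pos = [0.35, 0.55, 0.6, 0.7, 0.8, 0.9]
    neg = [0.1, 0.2, 0.3, 0.4, 0.5, 0.65]
    neg_large = neg * 20  # same negative distribution, 20 times the sample size
    for name, measure in [("MCC", mcc), ("F1", f1), ("rate-based r", rate_correlation)]:
        print(name, best_threshold(pos, neg, measure), best_threshold(pos, neg_large, measure))
```

Running it prints `MCC 0.55 0.7`, `F1 0.55 0.7` and `rate-based r 0.55 0.55`. Because the rates are normalized within each class, replicating the negative sample leaves the rate-based coefficient unchanged at every threshold, so its maximizer cannot move, whereas MCC and F1 mix the two class sizes and their maximizer shifts in this toy example. Under this assumed definition the coefficient simplifies to r = (TPR - FPR) / sqrt((TPR + FPR)(2 - TPR - FPR)).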

References

  1. Altman DG and Bland JM (1994). Diagnostic tests. 1: sensitivity and specificity, British Medical Journal, 308, 1552. https://doi.org/10.1136/bmj.308.6943.1552
  2. Bamber D (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph, Journal of Mathematical Psychology, 12, 387-415. https://doi.org/10.1016/0022-2496(75)90001-2
  3. Bradley AP (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognition, 30, 1145-1159. https://doi.org/10.1016/S0031-3203(96)00142-2
  4. Brasil P (2010). Diagnostic Test Accuracy Evaluation for Medical Professionals, Package DiagnosisMed in R.
  5. Cao C, Chicco D, and Hoffman MM (2020). The MCC-F1 Curve: A Performance Evaluation Technique for Binary Classification, arXiv Preprint arXiv:2006.11278.
  6. Chicco D and Jurman G (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation, BMC Genomics, 21, 1-13. https://doi.org/10.1186/s12864-019-6419-1
  7. Centor RM (1991). Signal detectability: The use of ROC curves and their analyses, Medical Decision Making, 11, 102-106. https://doi.org/10.1177/0272989x9101100205
  8. Connell FA and Koepsell TD (1985). Measures of gain in certainty from a diagnostic test, American Journal of Epidemiology, 121, 744-753. https://doi.org/10.1093/aje/121.5.744
  9. Egan JP (1975). Signal detection theory and ROC-analysis, New York: Academic Press.
  10. Engelmann B, Hayden E, and Tasche D (2003). Testing rating accuracy, Risk, 16, 82-86.
  11. Fawcett T (2006). An introduction to ROC analysis, Pattern Recognition Letters, 27, 861-874. https://doi.org/10.1016/j.patrec.2005.10.010
  12. Fawcett T and Provost F (1997). Adaptive fraud detection, Data Mining and Knowledge Discovery, 1, 291-316. https://doi.org/10.1023/A:1009700419189
  13. Green DM and Swets JA (1966). Signal detection theory and psychophysics, 1, New York: Wiley.
  14. Hanley JA and McNeil BJ (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 143, 29-36. https://doi.org/10.1148/radiology.143.1.7063747
  15. Hong CS (2000). Estimation and Hypothesis Testing, Freedom Academy, Seoul.
  16. Hong CS and Jang DH (2020). Validation ratings for the length of the ROC curve, Journal of the Korean Data & Information Science Society, 31, 851-863. https://doi.org/10.7465/jkdi.2020.31.5.851
  17. Hong CS, Joo JS, and Choi JS (2010). Optimal thresholds from mixture distributions, The Korean Journal of Applied Statistics, 23, 13-28. https://doi.org/10.5351/KJAS.2010.23.1.013
  18. Hong CS and Lee SJ (2018). TROC curve and accuracy measures, Journal of the Korean Data & Information Science Society, 29, 861-872. https://doi.org/10.7465/jkdi.2018.29.4.861
  19. Hong CS and Lim HS (1997). Comparison analysis of association measures for categorical data, Communications for Statistical Applications and Methods, 4, 645-661.
  20. Hong CS, Lin MH, Hong SW, and Kim GC (2011). Classification accuracy measures with minimum error rate for normal mixture, Journal of the Korean Data & Information Science Society, 22, 619-630.
  21. Hsieh F and Turnbull BW (1996). Nonparametric and semiparametric estimation of the receiver operating characteristic curve, The Annals of Statistics, 24, 25-40.
  22. Krzanowski WJ and Hand DJ (2009). ROC Curves for Continuous Data, CRC Press, New York.
  23. Lambert J and Lipkovich I (2008). A macro for getting more out of your ROC curve, SAS Global Forum, 231.
  24. Matthews BW (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochimica et Biophysica Acta (BBA)-Protein Structure, 405, 442-451. https://doi.org/10.1016/0005-2795(75)90109-9
  25. McDermott J and Forsyth RS (2016). Diagnosing a disorder in a classification benchmark, Pattern Recognition Letters, 73, 41-43. https://doi.org/10.1016/j.patrec.2016.01.004
  26. Metz CE (1978). Basic principles of ROC analysis, Seminars in Nuclear Medicine, 8, 283-298. https://doi.org/10.1016/S0001-2998(78)80014-2
  27. Metz CE and Kronman HB (1980). Statistical significance tests for binormal ROC curves, Journal of Mathematical Psychology, 22, 218-243. https://doi.org/10.1016/0022-2496(80)90020-6
  28. Moses LE, Shapiro D, and Littenberg B (1993). Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations, Statistics in Medicine, 12, 1293-1316. https://doi.org/10.1002/sim.4780121403
  29. Pepe MS (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction, Oxford University Press, Oxford.
  30. Perkins NJ and Schisterman EF (2006). The inconsistency of "optimal" cutpoints obtained using two criteria based on the receiver operating characteristic curve, American Journal of Epidemiology, 163, 670-675. https://doi.org/10.1093/aje/kwj063
  31. Provost F and Fawcett T (2001). Robust classification for imprecise environments, Machine Learning, 42, 203-231. https://doi.org/10.1023/a:1007601015854
  32. Spackman KA (1989). Signal detection theory: valuable tools for evaluating inductive learning, In Proceedings of the Sixth International Workshop on Machine Learning, San Mateo, 160-163.
  33. Sokolova M, Japkowicz N, and Szpakowicz S (2006). Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation, In Australian Joint Conference on Artificial Intelligence, Springer, Berlin.
  34. Swets JA (1988). Measuring the accuracy of diagnostic systems, Science, 240, 1285-1293. https://doi.org/10.1126/science.3287615
  35. Tasche D (2008). Validation of internal rating systems and PD estimates, The Analytics of Risk Model Validation, 169-196.
  36. Vuk M and Curk T (2006). ROC curve, lift chart and calibration plot, Metodološki Zvezki, 3, 89-108.
  37. Yoo HS and Hong CS (2011). Optimal criterion of classification accuracy measures for normal mixture, Communications for Statistical Applications and Methods, 18, 343-355. https://doi.org/10.5351/CKSS.2011.18.3.343
  38. Youden WJ (1950). Index for rating diagnostic test, Cancer, 3, 32-35. https://doi.org/10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3
  39. Zweig MH and Campbell G (1993). Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine, Clinical Chemistry, 39, 561-577. https://doi.org/10.1093/clinchem/39.4.561