Undecided inference using the difference of AUCs

Hong, Chong Sun; Na, Hae Rin

doi:10.5351/KJAS.2021.34.2.141

The Korean Journal of Applied Statistics

Vol. 34, No. 2, pp. 141-152, 2021

pISSN 1225-066X / eISSN 2383-5818

The Korean Statistical Society




Hong, Chong Sun (Department of Statistics, Sungkyunkwan University);
Na, Hae Rin (Department of Statistics, Sungkyunkwan University)

Submitted: 2020.11.09
Reviewed: 2021.01.05
Published: 2021.04.30


Abstract

Re-evaluating undecided inference requires a new statistical model that adds new variables to the existing ones. Because the probability of being positive is computed differently for undecided (indeterminate) and decided (determinate) cases, a missing-not-at-random (MNAR) assumption is required. Since the two statistical models are nested (hierarchical), this study carries out undecided inference under the MNAR assumption using a confidence interval for the difference between the two models' AUCs. Simulations show that, among the many methods for estimating a confidence interval for the AUC difference, four perform well. Based on these four methods, we propose a procedure for selecting variables that are useful for undecided inference with logistic regression models.
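The abstract does not name the four interval methods that performed best, so the sketch below uses one standard method purely for illustration: DeLong's nonparametric covariance for two correlated AUCs computed on the same cases, combined into a Wald-type interval for AUC(extended) − AUC(base). It is not claimed to be one of the authors' four methods. The helper names `fit_logistic`, `auc_components`, and `delong_auc_diff_ci` are hypothetical, the data are synthetic, and the MNAR treatment of undecided cases is not modeled. The rule "keep the added variable when the lower bound exceeds zero" is an assumed reading of the proposed selection procedure. Note that for nested models fitted and evaluated on the same data, DeLong-type intervals can be badly calibrated when the added variable carries no information, which is one reason several interval methods are worth comparing.

```python
import numpy as np
from statistics import NormalDist


def fit_logistic(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson; return the linear predictor.

    AUC depends only on the ranking of scores, so the linear predictor is
    sufficient (no need to apply the logistic link).
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        w = p * (1.0 - p)
        grad = X1.T @ (y - p)
        hess = X1.T @ (X1 * w[:, None])
        beta += np.linalg.solve(hess, grad)
    return X1 @ beta


def auc_components(pos, neg):
    """DeLong structural components for one score.

    psi(x, y) = 1 if x > y, 0.5 if tied, 0 otherwise.
    Returns (V10, V01): mean placement per positive and per negative case.
    The mean of either vector equals the Mann-Whitney AUC.
    """
    d = pos[:, None] - neg[None, :]
    psi = (d > 0) + 0.5 * (d == 0)
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_auc_diff_ci(y, score_base, score_ext, level=0.95):
    """Return AUC(ext) - AUC(base) and a Wald-type CI using DeLong's covariance."""
    y = np.asarray(y, dtype=bool)
    m, n = int(y.sum()), int((~y).sum())
    v10, v01 = np.empty((2, m)), np.empty((2, n))
    for k, s in enumerate((score_base, score_ext)):
        s = np.asarray(s, dtype=float)
        v10[k], v01[k] = auc_components(s[y], s[~y])
    auc = v10.mean(axis=1)
    # 2x2 covariance of the two AUC estimates (np.cov treats rows as variables)
    S = np.cov(v10) / m + np.cov(v01) / n
    var_diff = S[0, 0] + S[1, 1] - 2.0 * S[0, 1]
    se = np.sqrt(max(var_diff, 0.0))  # guard against tiny negative round-off
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    diff = auc[1] - auc[0]
    return diff, (diff - z * se, diff + z * se)


if __name__ == "__main__":
    # Synthetic data (assumption): x2 genuinely adds information about y.
    rng = np.random.default_rng(0)
    n = 500
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    eta = -0.5 + 1.0 * x1 + 0.8 * x2
    y = rng.random(n) < 1.0 / (1.0 + np.exp(-eta))
    base = fit_logistic(x1[:, None], y)                  # existing variable only
    ext = fit_logistic(np.column_stack([x1, x2]), y)     # nested extension
    diff, (lo, hi) = delong_auc_diff_ci(y, base, ext)
    print(f"AUC difference {diff:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
    print("keep x2" if lo > 0 else "drop x2")
```

Because the extended model contains the base model, both scores are evaluated on the same cases, which is why the covariance term `S[0, 1]` is subtracted rather than treating the two AUCs as independent.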


