http://dx.doi.org/10.5351/KJAS.2019.32.2.291

Comparison of evaluation measures for classification models on binary data  

Kim, Byungsoo (Department of Statistics, Inje University)
Kwon, Soyoung (Medical Device Policy Division, Ministry of Food and Drug Safety)
Publication Information
The Korean Journal of Applied Statistics / v.32, no.2, 2019, pp. 291-300
Abstract
This study investigates the characteristics of evaluation measures for classification models with a binary response variable in order to assess their suitability for use. Six measures are considered: Accuracy, Sensitivity, Specificity, Precision, F-measure, and Heidke's skill score (HSS). Each measure is reformulated from the confusion matrix in terms of x (the proportion of observations that are actually 1), y (the proportion of observations predicted as 1), and z (the proportion of observations that are both actually 1 and predicted as 1). We propose two necessary conditions for assessing the suitability of an evaluation measure. The first condition is that the measure function is constant in x and y under a random model. The second condition is that the measure function is increasing in z and decreasing in x and y. Since only HSS satisfies both conditions, it is always appropriate as an evaluation measure for classification models with a binary response variable, whereas the other measures should be used only within a limited range.
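The reformulation in terms of x, y, and z can be illustrated with a short sketch. The abstract does not give the paper's exact expressions, so the formulas below are assumptions derived from the standard confusion-matrix definitions, with cell proportions TP = z, FN = x - z, FP = y - z, and TN = 1 - x - y + z. The function names `measures` and `random_model` are hypothetical, and the random model is assumed to mean that actual and predicted labels are independent, so that z = xy.

```python
def measures(x, y, z):
    """Six evaluation measures written in terms of the proportions
    x (actually 1), y (predicted as 1), z (both actually and predicted as 1).
    Assumption: standard textbook definitions, not the paper's own notation."""
    tp = z                    # actual 1, predicted 1
    fn = x - z                # actual 1, predicted 0
    fp = y - z                # actual 0, predicted 1
    tn = 1.0 - x - y + z      # actual 0, predicted 0
    return {
        "accuracy": tp + tn,
        "sensitivity": tp / x,
        "specificity": tn / (1.0 - x),
        "precision": tp / y,
        "f_measure": 2.0 * z / (x + y),
        # Heidke's skill score reduces to this form when cells are proportions.
        "hss": 2.0 * (z - x * y) / (x + y - 2.0 * x * y),
    }


def random_model(x, y):
    """Measures for a random classifier, assuming independence: z = x * y."""
    return measures(x, y, x * y)


if __name__ == "__main__":
    # Under the random model only HSS stays constant as x and y vary.
    for x, y in [(0.1, 0.1), (0.3, 0.6), (0.5, 0.5)]:
        m = random_model(x, y)
        row = ", ".join(f"{name}={value:.3f}" for name, value in m.items())
        print(f"x={x}, y={y}: {row}")
```

Under these assumed formulas, setting z = xy gives Accuracy = 1 - x - y + 2xy, Sensitivity = y, Specificity = 1 - y, Precision = x, and F-measure = 2xy/(x + y), all of which vary with x or y, while HSS is 0 for every x and y. This is consistent with the first condition stated in the abstract.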
Keywords
binary variable; measure; classification; random; Heidke's skill score
Citations & Related Records
Times Cited By KSCI: 2