http://dx.doi.org/10.5351/KJAS.2010.23.1.013

Optimal Thresholds from Mixture Distributions  

Hong, Chong-Sun (Department of Statistics, Sungkyunkwan University)
Joo, Jae-Seon (Statistics and Panel Center, Korean Women's Development Institute)
Choi, Jin-Soo (Research Institute of Applied Statistics, Sungkyunkwan University)
Publication Information
The Korean Journal of Applied Statistics, v.23, no.1, 2010, pp. 13-28
Abstract
Assuming a mixture distribution for scores in credit evaluation studies, we discuss methods for estimating thresholds that minimize two errors: predicting default borrowers as non-defaults and regarding non-defaults as defaults. We propose estimating a threshold using statistical hypothesis tests, namely the most powerful test and the generalized likelihood ratio test, applied to the probability density functions of the score random variable, where the parameter space consists of only two elements, the default and non-default states. Other optimal thresholds, which maximize the classification accuracy measures of accuracy and true rate for ROC and CAP curves, are estimated as equations in these probability density functions. The three kinds of optimal thresholds, based on hypothesis testing, accuracy, and true rate, are obtained from normal random samples with various means and variances. The sums of the type I and type II errors corresponding to each optimal threshold are obtained and compared. Finally, we discuss their efficiency and draw conclusions.
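As an illustration of the idea summarized above, the following minimal sketch finds two thresholds for a two-component normal mixture by grid search. It is not the authors' derivation. The parameter values, the default proportion `p_default`, the convention that low scores indicate default, and all function names are assumptions made for this example.

```python
import math
from statistics import NormalDist

# Hypothetical score distributions (assumed values, not from the paper).
default = NormalDist(mu=0.0, sigma=1.0)      # scores of default borrowers
non_default = NormalDist(mu=2.0, sigma=1.0)  # scores of non-default borrowers
p_default = 0.1                              # assumed mixing proportion of defaults

def error_sum(t):
    """Sum of the two error rates when scores <= t are classified as default."""
    missed_default = 1.0 - default.cdf(t)  # default predicted as non-default
    false_alarm = non_default.cdf(t)       # non-default predicted as default
    return missed_default + false_alarm

def accuracy(t):
    """Overall proportion classified correctly under the mixture."""
    return p_default * default.cdf(t) + (1.0 - p_default) * (1.0 - non_default.cdf(t))

def grid_search(func, lo, hi, steps=20000, maximize=False):
    """Return the grid point in [lo, hi] that minimizes (or maximizes) func."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=func) if maximize else min(grid, key=func)

# Threshold minimizing the sum of the two error rates.
t_err = grid_search(error_sum, -4.0, 6.0)
# Threshold maximizing accuracy, which also depends on the mixing proportion.
t_acc = grid_search(accuracy, -4.0, 6.0, maximize=True)

print(f"error-sum threshold: {t_err:.3f}")
print(f"accuracy threshold:  {t_acc:.3f}")
```

With equal variances, the error-sum threshold falls where the two densities are equal, which is the midpoint of the two means. The accuracy threshold instead falls where the densities weighted by the mixing proportions are equal, so it moves toward the default mean when defaults are rare.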
Keywords
Accuracy; CAP; default; discriminatory; error; likelihood ratio; most powerful; ROC; score; threshold; true rate