Undecided inference using bivariate probit models

  • Hong, Chong-Sun (Department of Statistics, Sungkyunkwan University) ;
  • Jung, Mi-Yang (Research Institute of Applied Statistics, Sungkyunkwan University)
  • Received : 2011.09.14
  • Accepted : 2011.10.23
  • Published : 2011.12.01

Abstract

When it is not easy to decide the credit score of some loan applicants, the credit evaluation is postponed and these undecided applicants are referred to a specialist for further evaluation. Undecided inference is a problem that arises in most statistical models, in biostatistics and sports statistics as well as in credit evaluation. In this work, undecided inference is regarded as a missing-data mechanism under the missing-not-at-random (MNAR) assumption, and the bivariate probit model, one of the sample selection models, is used. Two undecided inference methods are proposed: one uses characteristic variables that represent the state of the decided applicants, and the other collects more accurate, additional information and applies these new variables. With an illustrative example, misclassification error rates for the undecided applicants and for all applicants are obtained and compared across various combinations of characteristic variables, undecided intervals, and thresholds. It is found that the misclassification error rates could be reduced when the undecided interval is widened and more accurate information is put into the model, since the bivariate probit model then reflects the situation of the decided applicants, that is, of the normal and default groups, more accurately.
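The abstract does not spell out the model, so the following is only a sketch under an assumed, standard specification of a bivariate probit with sample selection: a decision equation d* = z'γ + u (the applicant is decided when d* > 0), a default equation y* = x'β + e (default when y* > 0) that is observed only for decided applicants, and (u, e) standard bivariate normal with correlation ρ. Under these assumptions an undecided applicant's default probability is P(y = 1 | d = 0) = Φ₂(x'β, −z'γ; −ρ) / Φ(−z'γ). The function names, the index inputs `zg` and `xb`, and the Simpson's-rule bivariate normal CDF (used so the sketch needs only the standard library) are hypothetical choices, not the authors' implementation.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bvn_cdf(a, b, rho, lower=-10.0, n=2000):
    """P(X <= a, Y <= b) for a standard bivariate normal with correlation rho.

    Simpson's rule on int_{-inf}^{a} phi(t) * Phi((b - rho*t) / sqrt(1 - rho^2)) dt,
    truncated at `lower`. Assumes |rho| < 1 and an even n.
    """
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n
    total = 0.0
    for i in range(n + 1):
        t = lower + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)  # Simpson weights 1,4,2,...,4,1
        total += w * norm_pdf(t) * norm_cdf((b - rho * t) / s)
    return total * h / 3.0


def selection_loglik(zg, xb, d, y, rho):
    """Log-likelihood of the assumed bivariate probit with sample selection.

    zg[i]: decision index z_i'gamma; xb[i]: default index x_i'beta
    d[i]:  1 if applicant i was decided, 0 if undecided
    y[i]:  1 if default, 0 if normal (ignored when d[i] == 0)
    rho:   corr(u, e); rho != 0 makes the selection nonignorable (MNAR) here
    """
    ll = 0.0
    for zgi, xbi, di, yi in zip(zg, xb, d, y):
        if di == 0:
            p = norm_cdf(-zgi)                # P(d* <= 0)
        elif yi == 1:
            p = bvn_cdf(zgi, xbi, rho)        # P(d* > 0, y* > 0)
        else:
            p = bvn_cdf(zgi, -xbi, -rho)      # P(d* > 0, y* <= 0)
        ll += math.log(p)
    return ll


def undecided_default_prob(zg, xb, rho):
    """P(y = 1 | d = 0) for one undecided applicant."""
    return bvn_cdf(xb, -zg, -rho) / norm_cdf(-zg)


# With rho = 0 the prediction reduces to the ordinary probit Phi(x'beta).
print(round(undecided_default_prob(0.5, 0.3, 0.0), 4))  # 0.6179
print(round(undecided_default_prob(0.5, 0.3, 0.5), 4))
```

In practice γ, β and ρ would be estimated by maximizing `selection_loglik` with a numerical optimizer; the sketch only shows how the three observable cases (undecided, decided default, decided normal) enter the likelihood and how a nonzero ρ shifts the prediction for undecided applicants away from the MAR value.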

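The comparison of misclassification error rates across undecided intervals and thresholds can be sketched as follows. This is a hypothetical setup, not the paper's exact definitions: an applicant is treated as undecided when a preliminary score falls in the closed interval [lower, upper], an applicant is classified as default when the predicted default probability is at least the threshold, and the small data set is made up for illustration.

```python
def undecided_flags(scores, lower, upper):
    """Hypothetical rule: an applicant is undecided if lower <= score <= upper."""
    return [lower <= s <= upper for s in scores]


def error_rates(probs, labels, undecided, threshold):
    """Predict default (1) when prob >= threshold.

    Returns (error rate among undecided applicants, error rate among all applicants).
    """
    wrong = [int(p >= threshold) != y for p, y in zip(probs, labels)]
    und = [w for w, u in zip(wrong, undecided) if u]
    und_rate = sum(und) / len(und) if und else float("nan")
    return und_rate, sum(wrong) / len(wrong)


# Made-up illustration: the preliminary scores double as predicted default probabilities.
scores = [0.10, 0.35, 0.45, 0.55, 0.62, 0.90]
labels = [0, 0, 1, 0, 1, 1]  # 1 = default, 0 = normal
flags = undecided_flags(scores, 0.3, 0.6)
for c in (0.4, 0.5):
    und_rate, all_rate = error_rates(scores, labels, flags, c)
    print(f"threshold={c}: undecided={und_rate:.3f}, overall={all_rate:.3f}")
# threshold=0.4: undecided=0.333, overall=0.167
# threshold=0.5: undecided=0.667, overall=0.333
```

Changing `lower`, `upper`, or `c` and rerunning reproduces the kind of grid comparison described in the abstract; with a fitted model, `probs` for the undecided applicants would come from the inference model rather than from the preliminary scores.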