Default Prediction for Real Estate Companies with Imbalanced Dataset

  • Dong, Yuan-Xiang (School of Economics and Business Administration, Chongqing University)
  • Xiao, Zhi (School of Economics and Business Administration, Chongqing University)
  • Xiao, Xue (Department of Real Estate, School of Design and Environment, National University of Singapore)
  • Received: 2013.04.17
  • Reviewed: 2013.09.20
  • Published: 2014.06.30

Abstract

In default prediction for real estate companies, the non-defaulted cases greatly outnumber the defaulted ones, which creates a two-class imbalance problem and weakens a prediction model's ability to identify the default samples. To avoid this sample selection bias and improve the prediction model, this paper applies a minority sample generation approach to create new minority samples. Logistic regression, support vector machine (SVM) classification, and neural network (NN) classification trained on the imbalanced dataset serve as benchmarks against the same single prediction models trained on a balanced dataset corrected by the minority sample generation approach. Instead of prediction-oriented tests and overall accuracy, the true positive rate (TPR), the true negative rate (TNR), G-mean, and F-score are used to measure the performance of default prediction models on the imbalanced dataset. An empirical experiment on a sample of 14 default and 315 non-default listed real estate companies in China shows that, in most cases, the single prediction models produced better results with the balanced dataset than with the imbalanced one.
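To illustrate why overall accuracy is set aside in favor of TPR, TNR, G-mean, and F-score, the following minimal sketch computes these measures from confusion-matrix counts. It assumes the common definitions (G-mean as the geometric mean of TPR and TNR, F-score as the harmonic mean of precision and TPR with equal weights), which may differ in detail from those used in the paper. The function name and the second set of counts are hypothetical; only the 14/315 class sizes come from the abstract.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Compute accuracy, TPR, TNR, G-mean and F-score from confusion counts.

    The positive class is the minority (default) class.
    Assumed definitions: G-mean = sqrt(TPR * TNR); F-score = F1.
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # share of defaults caught
    tnr = tn / (tn + fp) if (tn + fp) else 0.0        # share of non-defaults kept
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(tpr * tnr)
    denom = precision + tpr
    f_score = 2 * precision * tpr / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "tnr": tnr,
        "g_mean": g_mean,
        "f_score": f_score,
    }


# A trivial model that labels all 329 companies as non-default.
trivial = imbalance_metrics(tp=0, fn=14, tn=315, fp=0)

# Hypothetical counts for a model that catches 10 of the 14 defaults.
hypothetical = imbalance_metrics(tp=10, fn=4, tn=290, fp=25)

for name, result in (("trivial", trivial), ("hypothetical", hypothetical)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With 14 defaults among 329 companies, the trivial model reaches an accuracy of 315/329 (about 0.957) while its TPR, G-mean, and F-score are all zero. The hypothetical model has a lower accuracy but identifies most defaults, which is the distinction these measures are meant to capture.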
