Software Quality Classification using Bayesian Classifier

  • Received : 2012.01.27
  • Accepted : 2012.03.13
  • Published : 2012.03.31

Abstract

Many metric-based classification models have been proposed to predict the fault-proneness of software modules. This paper presents two prediction models built on the Bayesian classifier, one of the most popular modern classification algorithms. A Bayesian model, grounded in Bayesian probability theory, is a promising technique for software quality prediction because it can represent uncertainty with probabilities and can partly incorporate expert knowledge into the training data. The two models, Naïve Bayes (NB) and Bayesian Belief Network (BBN), are constructed, and dimensionality reduction is applied to the training and test data before model evaluation. Prediction accuracy is evaluated using two prediction error measures, Type I error and Type II error, and compared with two well-known prediction models: a backpropagation neural network model and a support vector machine model. The results show that the BBN model predicts slightly better than the NB model. On the data set with ambiguity, the BBN model is less accurate than the compared models, but on the data set without ambiguity it outperforms them.
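As an illustration of the two error measures named in the abstract, the following minimal sketch computes Type I and Type II error rates for a binary fault-proneness classifier. The abstract does not define the measures, so this assumes the convention common in software quality classification: a Type I error is a not-fault-prone module predicted as fault-prone, and a Type II error is a fault-prone module predicted as not fault-prone, each normalized by the size of its actual class. The function name `error_rates` and the sample labels are hypothetical and are not taken from the paper.

```python
def error_rates(actual, predicted):
    """Return (type_i_rate, type_ii_rate) for binary fault-proneness labels.

    Labels: True = fault-prone, False = not fault-prone.
    Assumed convention (not stated in the abstract):
      Type I  = not-fault-prone module predicted as fault-prone
      Type II = fault-prone module predicted as not-fault-prone
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    false_pos = sum(1 for a, p in pairs if not a and p)   # Type I errors
    false_neg = sum(1 for a, p in pairs if a and not p)   # Type II errors

    n_not_fault_prone = sum(1 for a in actual if not a)
    n_fault_prone = len(actual) - n_not_fault_prone

    # Guard against empty classes to avoid division by zero.
    type_i = false_pos / n_not_fault_prone if n_not_fault_prone else 0.0
    type_ii = false_neg / n_fault_prone if n_fault_prone else 0.0
    return type_i, type_ii


# Made-up labels for six modules, for illustration only.
actual = [True, True, False, False, False, True]
predicted = [True, False, False, True, False, True]
type_i, type_ii = error_rates(actual, predicted)
print(f"Type I error rate: {type_i:.3f}, Type II error rate: {type_ii:.3f}")
```

Reporting the two rates separately matters here because the costs are asymmetric: a Type II error lets a faulty module through, while a Type I error only spends extra review effort.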
