http://dx.doi.org/10.5370/JEET.2014.9.5.1739

Software Fault Prediction at Design Phase  

Singh, Pradeep (Dept. of Electronic and Telecommunication Engineering, National Institute of Technology)
Verma, Shrish (Indian Institute of Information Technology)
Vyas, O.P. (Dept. of Computer Sc. and Engineering, National Institute of Technology)
Publication Information
Journal of Electrical Engineering and Technology, v.9, no.5, 2014, pp. 1739-1745
Abstract
Prediction of fault-prone modules continues to attract researchers' interest because of its significant impact on software development cost. The most important goal of such techniques is to correctly identify, in the early phases of the software development lifecycle, the modules where faults are most likely to be present. Various software metrics, together with module-level fault data, have been used successfully to predict fault-prone modules. The goal of this research is to predict faulty modules at the design phase using the design metrics of modules and the faults related to those modules. We analyzed the effect of pre-processing and of different machine learning schemes on eleven projects from the NASA Metrics Data Program, which provides design metrics and their related faults. Using seven machine learning techniques and four preprocessing techniques, we confirmed that models built from design metrics are surprisingly good at fault-proneness prediction. The results show that Naive Bayes or Voting Feature Intervals with discretization should be chosen across the different data sets, as they performed best among the 28 schemes. Naive Bayes and Voting Feature Intervals achieved an average AUC > 0.7 over the eleven projects. Our proposed framework is effective and can predict faults at an acceptable level in the design phase.
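As an illustration of the AUC criterion mentioned in the abstract, the following is a minimal sketch of how AUC can be computed from predicted fault-proneness scores. It is not the authors' implementation: the function name `auc`, the pairwise (Mann-Whitney) formulation, and the six example modules are all assumptions made for illustration only.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 = faulty module, 0 = fault-free module.
    scores: predicted fault-proneness; higher means more likely faulty.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one faulty and one fault-free module")
    # A correctly ordered (faulty, fault-free) pair counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for six modules (illustrative values only).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
value = auc(labels, scores)
print(f"AUC = {value:.3f}, exceeds 0.7: {value > 0.7}")
```

An AUC of 0.5 corresponds to random ranking of modules and 1.0 to a perfect ranking, which is why a threshold such as 0.7 can serve as a rough bar for a usable predictor.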
Keywords
Software metrics; Machine learning; Design metric; Fault prediction;