http://dx.doi.org/10.3745/JIPS.2012.8.2.241

Fault Prediction Using Statistical and Machine Learning Methods for Improving Software Quality  

Malhotra, Ruchika (Dept. of Software Engineering, Delhi Technological University)
Jain, Ankita (Dept. of Computer Engineering, Delhi Technological University)
Publication Information
Journal of Information Processing Systems / v.8, no.2, 2012, pp. 241-262
Abstract
Understanding quality attributes helps a software organization deliver highly reliable software. Empirically assessing which metrics predict those attributes is essential for gaining insight into software quality in the early phases of development and for taking corrective action. In this paper, we build models that estimate fault proneness from the object-oriented CK and QMOOD metrics. We apply one statistical method and six machine learning methods to build the models, and we validate them on a dataset collected from open source software. The results are analyzed using the Area Under the Curve (AUC) obtained from Receiver Operating Characteristics (ROC) analysis. They show that the models built with the random forest and bagging methods outperformed all the other models. Based on these results, it is reasonable to claim that object-oriented metrics are significantly related to quality models and that machine learning methods perform comparably to statistical methods.
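As an illustration of the AUC criterion named in the abstract, the following minimal sketch computes the AUC of a fault-proneness model directly from its predicted scores. It relies on the standard interpretation of AUC as the probability that a randomly chosen faulty class receives a higher score than a randomly chosen non-faulty one. The function name `roc_auc`, the labels, and the two score lists are hypothetical toy values chosen for illustration; they are not data or results from the paper.

```python
def roc_auc(labels, scores):
    """Return the area under the ROC curve.

    labels: 1 for a faulty class, 0 for a non-faulty class.
    scores: predicted fault-proneness, higher means more likely faulty.
    Computed as the fraction of (faulty, non-faulty) pairs in which the
    faulty class is ranked higher; tied scores count as half a win.
    """
    faulty = [s for y, s in zip(labels, scores) if y == 1]
    clean = [s for y, s in zip(labels, scores) if y == 0]
    if not faulty or not clean:
        raise ValueError("need at least one faulty and one non-faulty class")

    wins = 0.0
    for f in faulty:
        for c in clean:
            if f > c:
                wins += 1.0
            elif f == c:
                wins += 0.5
    return wins / (len(faulty) * len(clean))


# Toy data (assumed, not from the paper): five classes, two of them faulty.
labels = [1, 0, 1, 0, 0]
model_a_scores = [0.9, 0.2, 0.7, 0.4, 0.1]  # ranks both faulty classes first
model_b_scores = [0.6, 0.7, 0.3, 0.4, 0.1]  # ranks them no better than chance

print(roc_auc(labels, model_a_scores))  # 1.0
print(roc_auc(labels, model_b_scores))  # 0.5
```

An AUC of 1.0 means the model separates faulty from non-faulty classes perfectly, while 0.5 is what random ranking would achieve, so a model with a higher AUC is the better fault predictor under this criterion.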
Keywords
Empirical Validation; Object Oriented; Receiver Operating Characteristics; Statistical Methods; Machine Learning; Fault Prediction