http://dx.doi.org/10.3745/JIPS.04.0077

Analyzing Machine Learning Techniques for Fault Prediction Using Web Applications  

Malhotra, Ruchika (Dept. of Computer Science and Engineering, Delhi Technological University)
Sharma, Anjali (Dept. of Computer Science and Engineering, Delhi Technological University)
Publication Information
Journal of Information Processing Systems, vol. 14, no. 3, 2018, pp. 751-770
Abstract
Web applications are indispensable in the software industry and continuously evolve, either to meet newer criteria or to include new functionalities. However, despite quality assurance through testing, the presence of defects hinders straightforward development. Several factors contribute to defects, and minimizing them is often expensive in terms of man-hours. Early detection of fault proneness during software development is therefore important, and a fault prediction model that identifies fault-prone classes in a web application is highly desirable. In this work, we compare 14 machine learning techniques to analyze the relationship between object-oriented metrics and fault prediction in web applications. The study is carried out using various releases of the Apache Click and Apache Rave datasets. Before the predictive analysis, the input feature set for each release is optimized using the filter-based correlation feature selection (CFS) method. We find that the LCOM3, WMC, NPM, and DAM metrics are the most significant predictors. A statistical analysis of these metrics shows good conformity with the CFS evaluation and affirms their role in the defect prediction of web applications. The overall predictive ability of the fault prediction models is first ranked using the Friedman technique and then statistically compared using Nemenyi post-hoc analysis. The results not only uphold the predictive capability of machine learning models for identifying faulty classes in web applications, but also show that ensemble algorithms are the most appropriate for defect prediction on the Apache datasets. Further, we derive a consensus between the metrics selected by the CFS technique and the statistical analysis of the datasets.
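The evaluation protocol described in the abstract (feature selection, per-release model training, then a Friedman test over the models' performance) can be sketched as follows. This is a minimal illustration, not the paper's implementation: synthetic data stands in for the Apache Click/Rave releases, only three of the 14 techniques are shown, and a simple correlation filter is used as a stand-in for CFS, which has no scikit-learn implementation. The helper `correlation_filter` is hypothetical; scikit-learn and SciPy are assumed to be available.

```python
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

def correlation_filter(X, y, k=5):
    """Keep the k features most correlated with the fault label
    (a crude stand-in for correlation-based feature selection)."""
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(corr)[-k:]

# Three representative learners (the paper compares 14).
models = {
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Collect one AUC per (model, release); four synthetic "releases" here.
aucs = {name: [] for name in models}
for seed in range(4):
    X, y = make_classification(n_samples=300, n_features=20,
                               n_informative=5, random_state=seed)
    keep = correlation_filter(X, y)          # feature-selection step
    Xs = X[:, keep]
    for name, model in models.items():
        # 10-fold cross-validated probabilities, scored with ROC-AUC.
        proba = cross_val_predict(model, Xs, y, cv=10,
                                  method="predict_proba")[:, 1]
        aucs[name].append(roc_auc_score(y, proba))

# Friedman test over the per-release AUCs of the three models.
stat, p = friedmanchisquare(*aucs.values())
print({n: round(float(np.mean(v)), 3) for n, v in aucs.items()}, round(p, 3))
```

If the Friedman test rejects the null hypothesis of equal performance, a Nemenyi post-hoc test (available in third-party packages such as `scikit-posthocs`) would then be applied to find which pairs of models differ, as done in the study.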
Keywords
Empirical Validation; Fault Prediction; Machine Learning; Object-Oriented Metrics; Web Application Quality