Towards Effective Analysis and Tracking of Mozilla and Eclipse Defects using Machine Learning Models based on Bugs Data

  • Hassan, Zohaib (FSRA&IT Solutions Providing Organization Peshawar)
  • Iqbal, Naeem (FSRA&IT Solutions Providing Organization Peshawar)
  • Zaman, Abnash (Faculty of Bioinformatics Shaheed Benazeer Bhutto Women University Peshawar)
  • Published: 2021.06.01


Analysis and tracking of bug reports is a challenging area of software repository mining. It is one of the fundamental ways to explore the large amounts of data collected by defect tracking systems and to discover patterns and useful knowledge about the bug triaging process. Bug data is also publicly accessible through defect tracking systems such as Bugzilla and JIRA. With robust machine learning (ML) techniques, it is feasible to process and analyze such massive volumes of data to extract underlying patterns, knowledge, and insights. This makes the analysis and tracking of bug reports from open-source projects, including Mozilla and Eclipse, a promising area for innovative and robust solutions. This study presents an ML-based classification model that analyzes and tracks defects in order to enhance software engineering management (SEM) processes. Artificial Neural Network (ANN) and Naive Bayesian (NB) classifiers are implemented and applied to open-source bug datasets from Mozilla and Eclipse. Several evaluation measures are used to analyze the experimental results, and a comparative analysis of ANN and NB is presented. The results indicate that ANN achieves higher accuracy than NB. The proposed study aims to enhance SEM processes and to contribute to the body of knowledge in the data mining field.
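The abstract does not state which features, target labels, or preprocessing steps the study uses, so the following is only an illustrative sketch of one of the two classifiers it names. It assumes a multinomial Naive Bayes model over tokenized bug-report summaries, a hypothetical two-class label ("severe" vs. "minor"), and invented toy sentences rather than real Mozilla or Eclipse data. The `evaluate` helper computes common evaluation measures (accuracy, precision, recall, F1); these may differ from the measures used in the study, and the ANN side of the comparison is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case a bug summary and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        # Unnormalized log P(c) + sum of log P(token | c) over the document's tokens.
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)
        denom = self.total_words[c] + len(self.vocab)
        for w in tokenize(doc):
            if w in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[c][w] + 1) / denom)
        return score

    def predict(self, doc):
        # Pick the class with the highest log posterior.
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


def evaluate(y_true, y_pred, positive):
    """Accuracy, plus precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: invented summaries and labels, NOT from Mozilla/Eclipse.
train_docs = [
    "crash on startup when opening profile",
    "browser crash after loading page",
    "segfault crash in layout engine",
    "typo in preferences dialog label",
    "tooltip text misaligned in toolbar",
    "minor spelling error in menu",
]
train_labels = ["severe", "severe", "severe", "minor", "minor", "minor"]
test_docs = ["crash when loading profile", "spelling error in tooltip"]
test_labels = ["severe", "minor"]

model = MultinomialNB().fit(train_docs, train_labels)
predictions = [model.predict(d) for d in test_docs]
metrics = evaluate(test_labels, predictions, positive="severe")
print(predictions, metrics)
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and add-one smoothing keeps a single unseen (class, token) pair from zeroing out a class. A comparison like the one described in the abstract would train an ANN on the same features and compare the two `evaluate` outputs on a held-out test set.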
