http://dx.doi.org/10.9716/KITS.2013.12.2.361

Ambiguity Analysis of Defectiveness in NASA MDP Data Sets  

Hong, Euyseok (School of IT, Sungshin Women's University)
Publication Information
Journal of Information Technology Services / v.12, no.2, 2013, pp. 361-371
Abstract
Public-domain defect data sets, such as the NASA data sets available from the NASA MDP and PROMISE repositories, make it possible to compare the results of different defect prediction models on the same data. This means that repeatable and general prediction models can be built. However, some recent studies have raised questions about the quality of two versions of the NASA data sets and have produced new, cleaned data sets by applying their own data cleaning processes. We find that the NASA MDP versions offer two ways to determine the defectiveness of a module (0 or 1), and that the two results differ in some cases. To our knowledge, this serious problem has not been addressed in previous studies. To handle this ambiguity, we define two kinds of module defectiveness and two conditions that can be used to identify the ambiguous cases. We analyze five of the 13 NASA projects in detail using our ambiguity analysis method. The results show that JM1 and PC4 are the best projects, with few ambiguous cases.
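The kind of check the abstract describes, comparing two 0/1 defectiveness labels per module and counting disagreements per project, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the two labels are derived or what the two conditions are, so `label_a`, `label_b`, the function name `count_ambiguous`, and the toy records are all hypothetical and are not taken from the paper or from the NASA MDP data.

```python
from collections import Counter


def count_ambiguous(modules):
    """Count, per project, the modules whose two defectiveness labels disagree.

    `modules` is an iterable of (project, label_a, label_b) tuples.
    label_a and label_b are hypothetical 0/1 labels standing in for the
    two ways of determining defectiveness; the paper's actual definitions
    are not given in the abstract.
    Returns {project: (ambiguous_count, total_count)}.
    """
    totals = Counter()
    ambiguous = Counter()
    for project, label_a, label_b in modules:
        totals[project] += 1
        if label_a != label_b:
            # The two labels disagree, so this module is ambiguous.
            ambiguous[project] += 1
    return {p: (ambiguous[p], totals[p]) for p in totals}


# Made-up toy records for illustration only (not NASA MDP values).
toy = [
    ("P1", 0, 0),
    ("P1", 1, 0),
    ("P1", 1, 1),
    ("P2", 0, 0),
    ("P2", 1, 1),
]
result = count_ambiguous(toy)
```

Under this reading, a project with few ambiguous modules relative to its total would be one where either label can be used with little risk of disagreement.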
Keywords
Public Data Sets; NASA MDP Data Sets; Defectiveness; Ambiguity Analysis