• Title/Summary/Keyword: Software Defect Prediction

Defect Severity-based Defect Prediction Model using CL

  • Lee, Na-Young;Kwon, Ki-Tae
    • Journal of the Korea Society of Computer and Information, v.23 no.9, pp.81-86, 2018
  • Software defect severity is very important in new projects and in projects with limited historical data. However, in general software defect prediction it is very difficult to collect label information for the training set, and cross-project defect prediction requires a large amount of data. In this paper, an unclassified data set with defect severity is clustered according to the distribution ratio, and a defect severity-based prediction model is proposed by way of labeling. The proposed model applies CLAMI to JM1 and PC4, the defect severity-based NASA datasets with the least ambiguity, and is evaluated by comparing its ACC value with that obtained on the original data. In the experiments, the proposed model improved on existing defect severity-based prediction models by 0.15 (15%) on JM1 and 0.12 (12%) on PC4.

Semi-supervised Software Defect Prediction Model Based on Tri-training

  • Meng, Fanqi;Cheng, Wenying;Wang, Jingdong
    • KSII Transactions on Internet and Information Systems (TIIS), v.15 no.11, pp.4028-4042, 2021
  • To address the difficulty of software defect prediction caused by insufficient labelled defect samples and class imbalance, this paper proposes a semi-supervised software defect prediction model that combines feature normalization, oversampling, and the Tri-training algorithm. First, feature normalization smooths the feature data to eliminate the influence of excessively large or small feature values on the model's classification performance. Second, oversampling expands the data to correct the class imbalance among labelled samples. Finally, the Tri-training algorithm learns from the training samples and establishes a defect prediction model. The novelty of the model is that it combines feature normalization, oversampling, and Tri-training to address both the shortage of labelled samples and the class imbalance problem. Simulation experiments on the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure.
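The first two preprocessing steps can be sketched as follows. The abstract does not say which normalization or oversampling method the authors use, so this example assumes min-max scaling and random duplication of minority-class rows. The function names are hypothetical, and the Tri-training step itself is not shown.

```python
import random


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class (assumed method, not the paper's)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Three clean modules (label 0) and one defective module (label 1).
features = [[10, 200], [20, 400], [30, 600], [40, 800]]
labels = [0, 0, 0, 1]
scaled = min_max_normalize(features)
balanced_rows, balanced_labels = random_oversample(scaled, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Normalizing before oversampling keeps the duplicated rows on the same scale as the originals. In a real experiment the oversampling would be applied to the training split only.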

Software Defect Prediction Based on SAINT (SAINT 기반의 소프트웨어 결함 예측)

  • Sriman Mohapatra;Eunjeong Ju;Jeonghwa Lee;Duksan Ryu
    • The Transactions of the Korea Information Processing Society, v.13 no.5, pp.236-242, 2024
  • Software Defect Prediction (SDP) enhances the efficiency of software development by proactively identifying modules likely to contain errors. A major challenge in SDP is improving prediction performance. Recent research has applied deep learning techniques to the field of SDP, with the SAINT model particularly gaining attention for its outstanding performance in analyzing structured data. This study compares the SAINT model with other leading models (XGBoost, Random Forest, CatBoost) and investigates the latest deep learning techniques applicable to SDP. SAINT consistently demonstrated superior performance, proving effective in improving defect prediction accuracy. These findings highlight the potential of the SAINT model to advance defect prediction methodologies in practical software development scenarios, and were achieved through a rigorous methodology including cross-validation, feature scaling, and comparative analysis.

Bayesian Optimization Framework for Improved Cross-Version Defect Prediction (향상된 교차 버전 결함 예측을 위한 베이지안 최적화 프레임워크)

  • Choi, Jeongwhan;Ryu, Duksan
    • KIPS Transactions on Software and Data Engineering, v.10 no.9, pp.339-348, 2021
  • In recent software defect prediction research, cross-project and cross-version defect prediction have been actively studied. So far, cross-version defect prediction studies have assumed a WP (Within-Project) setting. However, previous work has not treated the distribution difference between project versions in the CV (Cross-Version) environment as important. In this study, we propose an automated Bayesian optimization framework that takes the distribution differences between versions into account and automatically selects whether to perform transfer learning according to that difference. The framework optimizes the distribution difference between versions, transfer learning, and the hyper-parameters of the classifier. Experiments confirm that automatically selecting whether to perform transfer learning based on the distribution difference is effective. Moreover, the optimization framework improves performance and, as a result, can reduce software inspection effort. It is expected to support practical quality assurance activities for new-version projects in a cross-version project environment.

Centroid and Nearest Neighbor based Class Imbalance Reduction with Relevant Feature Selection using Ant Colony Optimization for Software Defect Prediction

  • B., Kiran Kumar;Gyani, Jayadev;Y., Bhavani;P., Ganesh Reddy;T, Nagasai Anjani Kumar
    • International Journal of Computer Science & Network Security, v.22 no.10, pp.1-10, 2022
  • Software defect prediction (SDP) is currently one of the most active research areas in software engineering. Early detection of defects lowers the cost of software and improves its reliability. Machine learning techniques are widely used to create SDP models based on programming measures. The majority of defect prediction models in the literature have problems with class imbalance and high dimensionality. In this paper, we propose the Centroid and Nearest Neighbor based Class Imbalance Reduction (CNNCIR) technique, which considers dataset distribution characteristics to balance defective and non-defective records in imbalanced datasets. The proposed approach is compared with SMOTE (Synthetic Minority Oversampling Technique). The high-dimensionality problem is addressed by selecting relevant features with Ant Colony Optimization (ACO). We used nine different classifiers to analyze six open-source software defect datasets from the PROMISE repository and evaluated them with seven performance measures. The results show that CNNCIR with ACO-based feature selection outperforms SMOTE in the majority of cases.

Cross-Project Pooling of Defects for Handling Class Imbalance

  • Catherine, J.M.;Djodilatchoumy, S
    • International Journal of Computer Science & Network Security, v.22 no.10, pp.11-16, 2022
  • Applying predictive analytics to predict software defects has improved overall quality and decreased maintenance costs. Many supervised and unsupervised learning algorithms have been used for defect prediction on publicly available datasets. Most of these datasets suffer from an imbalance in the output classes. We study the impact of class imbalance in defect datasets on the efficiency of the defect prediction model and propose a CPP (Cross-Project Pooling) method for handling imbalance in the dataset. The performance of the methods is evaluated using the Matthews Correlation Coefficient (MCC), Recall, and Accuracy. The proposed sampling technique significantly improves the efficiency of the classifier in predicting defects.
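The three evaluation measures named above can be computed from a binary confusion matrix. The sketch below is a generic illustration, not code from the paper. It assumes labels of 1 for defective and 0 for clean, and it returns 0.0 whenever a denominator is zero, which is a common convention rather than anything stated in the abstract.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, 1 = defective, 0 = clean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def recall(tp, tn, fp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Imbalanced example: 2 defective modules out of 10, classifier predicts "clean" for all.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), recall(*counts), mcc(*counts))  # 0.8 0.0 0.0
```

The example shows why Accuracy alone is misleading on imbalanced defect data: a classifier that never predicts a defect still scores 0.8, while Recall and MCC are both 0.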

A Comparative Experiment of Software Defect Prediction Models using Object Oriented Metrics (객체지향 메트릭을 이용한 결함 예측 모형의 실험적 비교)

  • Kim, Yun-Kyu;Kim, Tae-Yeon;Chae, Heung-Seok
    • Journal of KIISE:Computing Practices and Letters, v.15 no.8, pp.596-600, 2009
  • To support efficient management of software verification and validation activities, many defect prediction models based on object-oriented metrics have been proposed. They usually adopt logistic regression analysis and report a prediction correctness of about 60 to 70%. We performed a similar experiment with Eclipse 3.3 to check their prediction effectiveness. However, the results show a correctness of about 40%, which is much lower than the originally reported results. We also found that univariate logistic regression analysis produces better results than multivariate logistic regression analysis.

A Comparative Study on Similarity Measure Techniques for Cross-Project Defect Prediction (교차 프로젝트 결함 예측을 위한 유사도 측정 기법 비교 연구)

  • Ryu, Duksan;Baik, Jongmoon
    • KIPS Transactions on Software and Data Engineering, v.7 no.6, pp.205-220, 2018
  • Software defect prediction helps allocate valuable project resources effectively for software quality assurance activities by focusing on the identified fault-prone modules. If sufficient historical data has been collected within a company, Within-Project Defect Prediction (WPDP) can be used for accurate fault-prone module prediction. If a company does not maintain historical data, it may be helpful to build a classifier for fault prediction based on Cross-Project Defect Prediction (CPDP). Since CPDP uses project data collected from other organizations to build a classifier, the main obstacle to building an accurate classifier is that the distributions of the source and target projects are not similar. Because effective similarity measure techniques are crucial for achieving high CPDP performance, this paper aims to identify them. We compare various similarity measure techniques and evaluate the effectiveness of the similarity weights they calculate. The results are verified using a statistical significance test and an effect size test. They show that the k-Nearest Neighbor (k-NN), LOcal Correlation Integral (LOCI), and Range methods are the top three performers, and that predictive performance using these three methods is comparable to that of WPDP.

Quality Measurement Process Management Using Defect Data of Embedded SW (Embedded SW의 품질 측정 프로세스 관리 방법에 관한 연구)

  • Park, Bok-Nam
    • Proceedings of the Korea Society of IT Services Conference (한국IT서비스학회:학술대회논문집), 2003.11a, pp.713-721, 2003
  • The time to market and productivity of embedded systems call for quality measurement process management of embedded software. However, defect management without preemptive analysis or prediction is not useful for quality measurement process management. This study focuses on defects, one of the most important attributes of software measurement in the process. Defining defect attributes and quality measurement process management depends on understanding the characteristics of embedded software and its defect data. This study therefore proposes a quantitative defect management method for the test phase of the software lifecycle.

Software Quality Prediction based on Defect Severity (결함 심각도에 기반한 소프트웨어 품질 예측)

  • Hong, Euy-Seok
    • Journal of the Korea Society of Computer and Information, v.20 no.5, pp.73-81, 2015
  • Most software fault prediction studies have focused on binary classification models that predict whether an input entity has faults or not. However, the ability to predict entity fault-proneness in various severity categories is more useful because not all faults have the same severity. In this paper, we propose fault prediction models at different fault severity levels using traditional size and complexity metrics. They are ternary classification models trained with four machine learning algorithms. Empirical analysis is performed using two NASA public data sets, with accuracy as the performance measure. The evaluation results show that the backpropagation neural network model outperforms the other models on both data sets, with accuracy scores of about 81% and 88%, respectively.