Within-Project and Cross-Project Defect Prediction Based on Model Averaging

Abstract

Software defect prediction has an important impact on the national economy and the financial services industry, and discovering defective modules early in software development is of great significance. This paper proposes a within-project and cross-project defect prediction technique based on model averaging. It uses two machine learning algorithms, XGBoost and LightGBM, as candidate models and introduces model averaging theory to improve performance. First, the two candidate models produce probability predictions, and each group of the data is used in turn as a test dataset to evaluate the models by cross-validation. The model weights are then determined by minimizing the sum of squared prediction errors over all groups, and finally the model-averaged predicted probability is obtained. Four typical public software defect datasets (NASA, AEEEM, ReLink, SoftLab) are used for testing, with precision, recall, F1, and AUC as the evaluation criteria. For within-project defect prediction, the model averaging method gives slightly better results on the four datasets than XGBoost or LightGBM alone, which is consistent with the ensemble learning idea behind model averaging theory. Compared with six traditional machine learning algorithms, the model averaging method performs best on most of the data. For cross-project defect prediction, the model averaging method performs better overall than four benchmark methods. The experimental results show that the model averaging method achieves good prediction results in both the within-project and cross-project settings.
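The weighting step described in the abstract can be illustrated with a minimal sketch. It assumes the out-of-fold predicted probabilities of the two candidate models have already been collected across all cross-validation groups, and it assumes the two weights are nonnegative and sum to one, a common model averaging convention that the abstract does not state. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def averaging_weight(y, p_a, p_b):
    """Return w in [0, 1] minimizing sum((y - (w*p_a + (1-w)*p_b))**2).

    y   : true labels (0 or 1) pooled over all cross-validation groups
    p_a : out-of-fold probabilities from candidate model A (e.g. XGBoost)
    p_b : out-of-fold probabilities from candidate model B (e.g. LightGBM)
    Assumption: weights are nonnegative and sum to one.
    """
    # Writing the error as (y - p_b) - w * (p_a - p_b) gives a 1-D least
    # squares problem whose unconstrained minimizer is num / den.
    num = sum((yi - b) * (a - b) for yi, a, b in zip(y, p_a, p_b))
    den = sum((a - b) ** 2 for a, b in zip(p_a, p_b))
    if den == 0.0:
        return 0.5  # identical predictions: every weight gives the same error
    # The objective is convex in w, so clipping gives the constrained optimum.
    return min(1.0, max(0.0, num / den))


def average_predictions(w, p_a, p_b):
    """Combine the two probability vectors with weight w on model A."""
    return [w * a + (1.0 - w) * b for a, b in zip(p_a, p_b)]


def sse(y, p):
    """Sum of squared prediction errors."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p))


if __name__ == "__main__":
    # Hypothetical toy values, for illustration only.
    y = [1, 0, 1, 0]
    p_a = [0.9, 0.2, 0.6, 0.4]
    p_b = [0.7, 0.1, 0.8, 0.3]
    w = averaging_weight(y, p_a, p_b)
    p_avg = average_predictions(w, p_a, p_b)
    print(w, sse(y, p_a), sse(y, p_b), sse(y, p_avg))
```

Because w = 0 and w = 1 are both allowed, the averaged predictions can never have a larger squared error on the cross-validation data than the better of the two candidates. This says nothing about performance on unseen projects.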
