Improving Ensemble Models for Software Defect Prediction: a study applying preprocessing techniques


Abstract

Software defect prediction is a practice for improving software quality. However, the methods proposed to detect defects efficiently still face challenges. In particular, methods based on mining software repositories must cope with the high dimensionality and class imbalance of the datasets extracted from those repositories. The need to handle imbalanced data and large feature sets motivates the search for more effective defect prediction models. Related works have studied ensemble models, feature selection, and imbalanced data, but have not analyzed their individual and combined impact on real-world datasets. Our general aim is to enhance the mining of software repositories to detect defects. We collected data from three open-source repositories, preprocessed it using feature selection and data balancing techniques, and developed models to compare against the same model algorithms trained without preprocessing. The results are promising: the preprocessed models improve both the overall metrics and the metrics for the minority class. All the code developed in this research is available in the GitHub repository SoftDefectProcess.
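
To make the data balancing step and the minority-class metric concrete, here is a minimal sketch. It is illustrative only: the abstract does not name the balancing technique used, so random oversampling is an assumption, and the function names (`random_oversample`, `recall_for_class`) and the toy data are hypothetical rather than taken from the SoftDefectProcess repository. Feature selection and the ensemble models themselves are not shown.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many samples as the largest one (assumed technique, see lead-in)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def recall_for_class(y_true, y_pred, cls):
    """Fraction of samples whose true label is `cls` that were predicted as `cls`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant) if relevant else 0.0


# Hypothetical toy data: label 1 = defective module (minority), 0 = clean.
rows = [[10, 2], [12, 1], [11, 3], [9, 2], [40, 9], [13, 2]]
labels = [0, 0, 0, 0, 1, 0]

bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)
```

Balancing should be applied to the training split only, so that the evaluation data keeps its original class distribution. Reporting recall for the defective class alongside the overall metrics shows whether a model actually finds defects or simply predicts the majority class.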
