Refactoring in Software Maintenance and Development: Application with Case Study
Abstract
Refactoring, the restructuring of code to improve its quality without changing its external behavior, is a critical software maintenance activity. Yet its quantitative impact on maintainability and defect reduction is not well understood and is not well predicted. This study presents a dual-model approach for estimating post-refactoring maintainability and for classifying whether a given refactoring will reduce the number of defects that appear in a module afterward. For maintainability estimation, we use a Random Forest Regressor trained on a dataset of 150,000 lines of code from GitHub that represents modules before and after refactoring. For classification, we use a Random Forest Classifier trained on a dataset representing the kinds of changes made to modules when they are refactored. The hyperparameters of both models were selected via cross-validation. The regression model attained an R² of 0.877 with an RMSE of 3.03, while the classification model achieved 81.25% accuracy, with precision = 0.836, recall = 0.933, and F1-score = 0.882. Feature importance and SHAP analyses identified the pre-refactoring maintainability index (MI), code duplication, and cyclomatic complexity as the dominant predictors. The proposed models were further validated through hyperparameter optimization and robustness evaluation. The outcomes revealed that structural complexity metrics are more predictive of post-refactoring quality improvements than specific refactoring method indicators. These findings can inform data-driven decision-making in continuous integration workflows, allowing automated evaluation of refactoring results and supporting evidence-based software maintenance policies.
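The dual-model setup described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and entirely synthetic data. The feature names (`pre_mi`, `duplication`, `complexity`), the target formulas, the hyperparameter grid, and the helper `fit_forest` are hypothetical stand-ins chosen for illustration; they are not the study's dataset, features, or tuning procedure, and the scores this sketch produces say nothing about the reported results. Impurity-based feature importances are used here in place of the SHAP analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import f1_score, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): NOT the study's GitHub dataset.
rng = np.random.default_rng(0)
n = 400
pre_mi = rng.uniform(20, 90, n)        # pre-refactoring maintainability index
duplication = rng.uniform(0, 0.4, n)   # fraction of duplicated code
complexity = rng.integers(1, 40, n)    # cyclomatic complexity
feature_names = ["pre_mi", "duplication", "complexity"]
X = np.column_stack([pre_mi, duplication, complexity])

# Hypothetical targets, invented only so the models have something to learn.
post_mi = pre_mi + 10 * duplication + 0.2 * complexity + rng.normal(0, 2, n)
defects_reduced = (
    duplication * 50 + complexity + rng.normal(0, 5, n) > 25
).astype(int)


def fit_forest(estimator, X_train, y_train, scoring):
    """Select hyperparameters by 3-fold cross-validated grid search."""
    grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
    search = GridSearchCV(estimator, grid, cv=3, scoring=scoring)
    search.fit(X_train, y_train)
    return search.best_estimator_


# One split shared by both targets, so both models see the same modules.
X_tr, X_te, mi_tr, mi_te, def_tr, def_te = train_test_split(
    X, post_mi, defects_reduced, test_size=0.25, random_state=0
)

# Model 1: regression on post-refactoring maintainability.
regressor = fit_forest(RandomForestRegressor(random_state=0), X_tr, mi_tr, "r2")
# Model 2: classification of whether defects are reduced.
classifier = fit_forest(RandomForestClassifier(random_state=0), X_tr, def_tr, "f1")

r2 = r2_score(mi_te, regressor.predict(X_te))
f1 = f1_score(def_te, classifier.predict(X_te))
importances = dict(zip(feature_names, regressor.feature_importances_))

print(f"R^2 on held-out data: {r2:.3f}")
print(f"F1 on held-out data:  {f1:.3f}")
print("Regressor feature importances:", importances)
```

Sharing one train/test split across both targets keeps the two models comparable on the same held-out modules. In a continuous integration setting, the fitted `regressor` and `classifier` could be applied to metrics extracted from a proposed change, though how the study wires this up is not stated in the abstract.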