Automated Code Smell Detection for Software Quality Assurance Using a Web-Based Machine Learning Framework
Abstract
Code Smells are indicators of structural weaknesses in software design and implementation that can negatively impact maintainability, scalability, and readability. To enhance software quality and support efficient maintenance, this study proposes a machine learning-based approach for the automated detection of Critical Threshold Rule (CTR) violations, specifically targeting the Long Method and Large Class Code Smell types. During the feature selection process, six machine learning models, namely Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Stochastic Gradient Descent (SGD), were used alongside techniques such as GridSearchCV, RandomizedSearchCV, Out-of-Bag (OOB) validation, and SHAP values to improve both model performance and interpretability. After the optimal features were selected, thirteen machine learning classifiers were trained and evaluated: LR, DT, RF, SVM, Gaussian Naive Bayes (GNB), Multinomial Naive Bayes (MNB), MLP, Linear Support Vector (Linear SV), K-Nearest Neighbors (KNN), Gradient Boosting (GB), Extra Trees (ET), Bernoulli Naive Bayes (BNB), and AdaBoost (AB). These models were trained on datasets extracted from Software Development Versioning (SDV) repositories containing 1,000 log file entries. Evaluations across multiple metrics, including accuracy, precision, recall, F1 score, and ROC AUC, showed that ensemble-based models, particularly RF, consistently delivered top performance, achieving the highest accuracy of 96.02% on the Long Method dataset and 92.63% on the Large Class dataset. The findings were further validated through statistical analysis using the Wilcoxon signed-rank test. To support practical use, the proposed method was implemented as a web application using the React Native framework. The system analyzes log data, identifies smelly code segments, and highlights their severity, offering actionable insights for developers.
This approach provides a reliable and interpretable solution for Code Smell detection and shows strong potential for real-world integration into modern software development environments.
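As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch, not the study's implementation. It assumes scikit-learn and SciPy are available and uses a synthetic 1,000-row dataset as a stand-in for the Long Method or Large Class data, which is not reproduced here. The parameter grid, the cutoff `top_k`, and the choice of LR as the comparison model are hypothetical. Impurity-based feature importances are used in place of SHAP values to keep the example dependency-free.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# ASSUMPTION: synthetic data standing in for a 1,000-entry smell dataset.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=6, random_state=0
)

# Tune a Random Forest with GridSearchCV (hypothetical grid).
# oob_score=True makes the refitted best model expose an out-of-bag estimate.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(
    RandomForestClassifier(oob_score=True, random_state=0),
    param_grid,
    scoring="f1",
    cv=5,
)
search.fit(X, y)
best_rf = search.best_estimator_
oob_accuracy = best_rf.oob_score_  # OOB accuracy of the refitted forest

# Keep the top_k features by impurity-based importance (stand-in for SHAP).
# Simplification: selecting on the full data before cross-validation leaks
# information; a real study would select inside each training fold.
top_k = 8  # hypothetical cutoff
selected = np.argsort(best_rf.feature_importances_)[::-1][:top_k]
X_sel = X[:, selected]

# Score two classifiers on the same folds so the per-fold results are paired.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
rf_scores = cross_val_score(
    RandomForestClassifier(**search.best_params_, random_state=0),
    X_sel, y, cv=cv, scoring="accuracy",
)
lr_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_sel, y, cv=cv, scoring="accuracy"
)

# Wilcoxon signed-rank test on the paired per-fold accuracies.
stat, p_value = wilcoxon(rf_scores, lr_scores)

print(f"OOB accuracy: {oob_accuracy:.4f}")
print(f"RF mean accuracy: {rf_scores.mean():.4f}")
print(f"LR mean accuracy: {lr_scores.mean():.4f}")
print(f"Wilcoxon statistic={stat:.3f}, p={p_value:.4f}")
```

Reusing one `StratifiedKFold` object for both models is what makes the fold scores paired, which the signed-rank test requires. The numbers printed on synthetic data say nothing about the accuracies reported in the abstract.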