Automated Code Smell Detection for Software Quality Assurance Using a Web-Based Machine Learning Framework

Saiful Islam Emon
Md Mahbubur Rahman
Amena Akter
Srikanto Rajbongshi
Sumona Yeasmin
M.A. Nur Quraishi
Ahmad Shafkat
Yaqoob Majeed

Abstract

Code Smells are indicators of structural weaknesses in software design and implementation that can negatively impact maintainability, scalability, and readability. To enhance software quality and support efficient maintenance, this study proposes a machine-learning-based approach for the automated detection of Critical Threshold Rule (CTR) violations, specifically targeting the Long Method and Large Class Code Smell types. During feature selection, six machine learning models, namely Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Stochastic Gradient Descent (SGD), were used alongside techniques such as GridSearchCV, RandomizedSearchCV, Out-of-Bag (OOB) validation, and SHAP values to improve both model performance and interpretability. After the optimal features were selected, thirteen machine learning classifiers were trained and evaluated: LR, DT, RF, SVM, Gaussian Naive Bayes (GNB), Multinomial Naive Bayes (MNB), MLP, Linear Support Vector (Linear SV), K-Nearest Neighbors (KNN), Gradient Boosting (GB), Extra Trees (ET), Bernoulli Naive Bayes (BNB), and AdaBoost (AB). These models were trained on datasets extracted from Software Development Versioning (SDV) repositories containing 1,000 log file entries. Evaluation across multiple metrics, including accuracy, precision, recall, F1 score, and ROC AUC, showed that ensemble-based models, particularly RF, consistently delivered top performance, achieving the highest accuracy of 96.02% on the Long Method dataset and 92.63% on the Large Class dataset. The findings were further validated through statistical analysis using the Wilcoxon signed-rank test. To support practical use, the proposed method was implemented as a web application using the React Native framework. The system analyzes log data, identifies smelly code segments, and highlights their severity, offering actionable insights for developers. This approach provides a reliable and interpretable solution for Code Smell detection and shows strong potential for real-world integration into modern software development environments.
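
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `binary_metrics` and `roc_auc`, the label convention (1 = smelly, 0 = clean), and the toy labels and scores are all assumptions made for illustration, and none of the values come from the study's datasets.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for labels where 1 = smelly, 0 = clean."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random smelly sample is scored
    higher than a random clean sample (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.3, 0.7]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

The pairwise ROC AUC above is quadratic in the number of samples, which is acceptable for a dataset of about 1,000 entries; a library implementation would normally be used in practice.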

Version published to 10.21203/rs.3.rs-6474801/v1 on Research Square
Apr 21, 2025

Multiclass Classification and Prioritisation of Static Analysis Warnings Using Developer-Labelled Industrial Data

This article has 3 authors:
1. Benedikt Fein
2. Vibhash Kumar Singh
3. Gordon Fraser
This article has no evaluations. Latest version: Apr 30, 2025

Enhancing malware detection reliability in non-executable files using confidence score prediction

This article has 4 authors:
1. Rasoul Rezvani-Jalal
2. Morteza Zakeri
3. Saeed Parsa
4. Amin Hasan-Zarei
This article has no evaluations. Latest version: May 15, 2025

GPTVD: Vulnerability Detection and Analysis Method Based on LLM's Chain of Thoughts.

This article has 5 authors:
1. Yinan Chen
2. Yuan Huang
3. Xiangping Chen
4. Pengfei Shen
5. Lei Yun
This article has no evaluations. Latest version: Apr 21, 2025
