Enhancing Logistic Regression Performance Through Hyperparameter Tuning: A Comparative Evaluation Across Datasets

Abstract

Background: Logistic regression (LR) is widely used in binary and multi-class classification tasks, yet its predictive performance is highly sensitive to hyperparameter configuration. Suboptimal choices can lead to overfitting, underfitting, reduced generalization, and inconsistent model behavior across datasets. This study aims to systematically enhance LR performance by applying a comprehensive hyperparameter optimization framework and evaluating its impact across four diverse datasets: breast cancer, heart disease, liver disorders, and handwritten digits.

Methods: A Python-based experimental framework was developed using Scikit-learn, NumPy, and Pandas to examine how hyperparameters influence LR performance. A combinatorial optimization strategy was applied to tune regularization strength (C), penalty type (L1), solver choice (liblinear, saga), class-weight settings, and maximum iterations. Model evaluation was conducted using both train–test splits (20%, 30%, 40%) and k-fold cross-validation (k = 3, 5, 10). Performance was assessed using accuracy, F1-score, AUC, and cross-validation accuracy. Tableau-based visual analytics were used to compare model behaviors under different configurations.

Results: Optimized hyperparameters consistently improved model performance across all datasets. The breast cancer and digits datasets achieved the most substantial gains, with maximum test accuracies of 97% and 98%, respectively, and AUC values up to 0.99. Cross-validation scores indicated strong generalization, with the best-performing models showing CV accuracies above 0.90. In contrast, performance improvements on the heart disease and liver disorder datasets were present but more modest due to noisier features and class imbalance. Hyperparameter combinations involving the L1 penalty, balanced class weights, and the liblinear solver produced the highest accuracy and F1-scores across several datasets.
Conclusions: Systematic hyperparameter tuning significantly enhances logistic regression performance, generalization, and discrimination ability. The results demonstrate that even simple models can achieve high accuracy when appropriately optimized. This framework provides practical guidance for improving LR across heterogeneous datasets and highlights the importance of penalty choice, regularization strength, and solver selection. Future work should explore advanced optimization techniques such as Bayesian optimization and evolutionary algorithms to further improve efficiency and performance.
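The combinatorial tuning procedure described in the Methods can be sketched with Scikit-learn's grid search. This is a minimal illustration, not the authors' exact framework: the specific grid values, the breast cancer dataset choice, the 30% test split, and the added feature scaling step are assumptions made for a self-contained example.

```python
# Hedged sketch of combinatorial hyperparameter tuning for logistic
# regression, assuming illustrative grid values (not the study's settings).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# One of the four datasets mentioned in the abstract.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)  # 30% split, as in the Methods

# Scaling is an assumption here; it helps both solvers converge.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=5000)),
])

# Grid over the hyperparameters named in the abstract: C, penalty,
# solver, and class weights (values are illustrative).
param_grid = {
    "clf__C": [0.01, 0.1, 1, 10],
    "clf__penalty": ["l1"],
    "clf__solver": ["liblinear", "saga"],
    "clf__class_weight": [None, "balanced"],
}

# k-fold cross-validation (k = 5) selects the best combination.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```

The same grid can be rerun with `cv=3` or `cv=10` and different `test_size` values to reproduce the style of comparison reported in the Results.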
