Enhancing Logistic Regression Performance Through Hyperparameter Tuning: A Comparative Evaluation Across Datasets

Abstract

Background: Logistic regression (LR) is widely used in binary and multi-class classification tasks, yet its predictive performance is highly sensitive to hyperparameter configuration. Suboptimal choices can lead to overfitting, underfitting, reduced generalization, and inconsistent model behavior across datasets. This study aims to systematically enhance LR performance by applying a comprehensive hyperparameter optimization framework and evaluating its impact across four diverse datasets: breast cancer, heart disease, liver disorders, and handwritten digits.

Methods: A Python-based experimental framework was developed using Scikit-learn, NumPy, and Pandas to examine how hyperparameters influence LR performance. A combinatorial optimization strategy was applied to tune regularization strength (C), penalty type (L1), solver choice (liblinear, saga), class-weight settings, and maximum iterations. Model evaluation was conducted using both train–test splits (20%, 30%, 40%) and k-fold cross-validation (k = 3, 5, 10). Performance was assessed using accuracy, F1-score, AUC, and cross-validation accuracy. Tableau-based visual analytics were used to compare model behaviors under different configurations.

Results: Optimized hyperparameters consistently improved model performance across all datasets. The breast cancer and digits datasets achieved the most substantial gains, with maximum test accuracies of 97% and 98%, respectively, and AUC values up to 0.99. Cross-validation scores indicated strong generalization, with the best-performing models showing CV accuracies above 0.90. In contrast, performance improvements on the heart disease and liver disorder datasets were present but more modest due to noisier features and class imbalance. Hyperparameter combinations involving the L1 penalty, balanced class weights, and the liblinear solver produced the highest accuracy and F1-scores across several datasets.
Conclusions: Systematic hyperparameter tuning significantly enhances logistic regression performance, generalization, and discrimination ability. The results demonstrate that even simple models can achieve high accuracy when appropriately optimized. This framework provides practical guidance for improving LR across heterogeneous datasets and highlights the importance of penalty choice, regularization strength, and solver selection. Future work should explore advanced optimization techniques such as Bayesian optimization and evolutionary algorithms to further improve efficiency and performance.
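The combinatorial tuning procedure described in the Methods can be sketched with Scikit-learn's grid search. This is a minimal illustration, not the authors' exact framework: the specific grid values, the breast cancer dataset choice, the 30% test split, and the added feature scaling step are assumptions made for a self-contained example.

```python
# Hedged sketch of combinatorial hyperparameter tuning for logistic
# regression, assuming illustrative grid values (not the study's settings).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# One of the four datasets mentioned in the abstract.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)  # 30% split, as in the Methods

# Scaling is an assumption here; it helps both solvers converge.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=5000)),
])

# Grid over the hyperparameters named in the abstract: C, penalty,
# solver, and class weights (values are illustrative).
param_grid = {
    "clf__C": [0.01, 0.1, 1, 10],
    "clf__penalty": ["l1"],
    "clf__solver": ["liblinear", "saga"],
    "clf__class_weight": [None, "balanced"],
}

# k-fold cross-validation (k = 5) selects the best combination.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```

The same grid can be rerun with `cv=3` or `cv=10` and different `test_size` values to reproduce the style of comparison reported in the Results.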
