Learning from the Margins: Using Machine Learning Models to Predict Epilepsy Outcomes from Socioeconomic Factors
Abstract
Background: Epilepsy is one of the most common serious neurological conditions globally and is associated with substantial morbidity, reduced quality of life, and increased socioeconomic disadvantage. Although clinical predictors of seizure outcomes have been widely studied, demographic and socioeconomic determinants of treatment response remain underrepresented in predictive modelling. Greater understanding of how these factors influence outcomes may inform earlier identification of patients at risk of poor seizure control and contribute to strategies aimed at reducing health inequalities.

Methods: A retrospective observational study was undertaken using routinely collected clinical, demographic, and socioeconomic data from 1,609 adults with epilepsy attending a specialist epilepsy service in England between 2020 and 2024. Demographic variables included age, sex, and ethnicity. Socioeconomic position was captured using area-level measures from the Index of Multiple Deprivation, including income, employment, education, health, housing, and living environment domains. Limited high-level clinical descriptors were included, without medication-specific information. Logistic Regression, Random Forest, and Gradient Boosting models were used to predict seizure-free status at the most recent clinical review. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied within training folds. Model performance was assessed using stratified cross-validation and evaluated using accuracy, precision, recall, and precision–recall area under the curve.

Results: Demographic and socioeconomic variables showed consistent associations with epilepsy treatment outcomes and contributed substantially to model performance. After accounting for class imbalance, Random Forest and Gradient Boosting models achieved accuracy, precision, and recall exceeding 83%. Individuals residing in more socioeconomically deprived areas were overrepresented among those who were not seizure-free, indicating persistent social gradients in epilepsy outcomes. Comparable predictive performance was observed when models were trained using predominantly non-clinical features, suggesting that social determinants carry independent prognostic value.

Conclusions: Demographic and socioeconomic factors are important predictors of epilepsy treatment outcomes and can be leveraged within machine learning frameworks to support population-level risk stratification. Incorporating social determinants of health into predictive approaches may help identify patients at increased risk of poor outcomes and inform more equitable, context-aware epilepsy care pathways.
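
The Methods state that SMOTE was applied within training folds of a stratified cross-validation. The sketch below illustrates why that ordering matters: each held-out fold keeps its original class balance, and synthetic minority samples are generated only from the training portion. It is a simplified, standard-library illustration and not the authors' pipeline. The function names (`stratified_folds`, `smote`, `resampled_training_folds`), the toy two-feature data, the choice of label `1` as the minority class, and the values of `k` and `k_neighbors` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Assign sample indices to k folds while preserving class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class out round-robin
    return folds


def smote(X, y, minority, k_neighbors, rng):
    """Simplified SMOTE: interpolate between a minority sample and one of its
    nearest minority neighbours until both classes have the same count."""
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k_neighbors]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)


def resampled_training_folds(X, y, k=5, minority=1, k_neighbors=5, seed=0):
    """Yield ((X_train_balanced, y_train_balanced), (X_test, y_test)) per fold.
    Oversampling touches only the training data, never the held-out fold."""
    rng = random.Random(seed)
    folds = stratified_folds(y, k, rng)
    for held_out in range(k):
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        test_idx = folds[held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        X_bal, y_bal = smote(X_train, y_train, minority, k_neighbors, rng)
        yield (X_bal, y_bal), ([X[i] for i in test_idx], [y[i] for i in test_idx])


# Hypothetical toy data: 80 majority-class and 20 minority-class samples.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(80)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 80 + [1] * 20

splits = list(resampled_training_folds(X, y))
for (X_bal, y_bal), (X_test, y_test) in splits:
    print(len(y_bal), y_bal.count(1), len(y_test), y_test.count(1))
```

Oversampling before splitting would let synthetic points derived from a test sample leak into training, which tends to inflate the reported metrics. In practice a maintained implementation such as the `SMOTE` class in imbalanced-learn, placed inside a cross-validated pipeline, would normally replace the hand-written functions here.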