Predictive models for secondary epilepsy within 1 year in patients with acute ischemic stroke: a multicenter retrospective study

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This useful study reports machine learning models derived from large-scale data to predict the risk of post-stroke epilepsy. The evidence supporting the conclusions is, however, incomplete, as many critical methodological aspects have been omitted or described too briefly, the analysis of the results is not complete, and the dataset and code have not been disclosed, which represents an obstacle to reproducibility. The study may be of some interest in the field of clinical neurology.

This article has been reviewed by the following groups

Abstract

Objective

Post-stroke epilepsy (PSE) is a significant complication that has a negative impact on the prognosis and quality of life of ischemic stroke patients. We collected medical records from four hospitals in Chongqing and developed an interpretable machine learning model to predict PSE.

Methods

We collected medical records, imaging reports, and laboratory test results from 21,459 patients diagnosed with ischemic stroke. We conducted traditional univariable and multivariable statistical analyses to compare and identify important features. The data were then divided into a training set (70%) and a testing set (30%). We used the Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbors (SMOTE-ENN) to resample the imbalanced training set. Nine commonly used algorithms were used to build machine learning models, and their prediction metrics were compared to select the best-performing model. Finally, we used SHAP (SHapley Additive exPlanations) for model interpretability analysis, assessing the contribution and clinical significance of different features to the prediction.
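
The authors' code is not public, so the following is only a minimal, self-contained Python sketch of what SMOTE followed by Edited Nearest Neighbours (ENN) does to a training set; it is not the study's implementation. It assumes numeric feature tuples, binary labels (1 = minority class, i.e. PSE), Euclidean distance, at least two minority samples, and a simplified majority-vote ENN. Library implementations such as imbalanced-learn's `SMOTEENN` differ in defaults and details. Resampling is applied to the training split only, so the held-out 30% keeps its real class ratio.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE: interpolate between a random minority sample and one of its
    k nearest minority neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # indices of the k nearest *other* minority samples (Euclidean distance)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(minority[j], x),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment x -> nb, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def enn(X, y, k=3):
    """Simplified Edited Nearest Neighbours: drop any sample whose label is
    not shared by a strict majority of its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        nearest = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[j], x),
        )[:k]
        agree = sum(y[j] == label for j in nearest)
        if agree * 2 > k:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


def smote_enn(X, y, minority_label=1, seed=0):
    """Oversample the minority class up to the majority count with SMOTE,
    then clean both classes with ENN. Call on the training split only."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new = smote(minority, max(0, n_majority - len(minority)), seed=seed)
    X_res = list(X) + new
    y_res = list(y) + [minority_label] * len(new)
    return enn(X_res, y_res)
```

Usage would look like `X_res, y_res = smote_enn(X_train, y_train)`, where `X_train` and `y_train` are hypothetical training-split variables. The brute-force neighbour search is quadratic and meant only to show the mechanics.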

Results

In the traditional regression analysis, complications such as hydrocephalus, cerebral hernia, uremia, and deep vein thrombosis contributed to PSE, as did involvement of cortical regions (frontal, parietal, occipital, and temporal lobes) and subcortical regions (basal ganglia, thalamus, and others). General features such as age, gender, and National Institutes of Health Stroke Scale (NIHSS) score, as well as laboratory indicators including white blood cell (WBC) count, D-dimer, lactate, and glycated hemoglobin (HbA1c), were associated with a higher likelihood of PSE. Patients with conditions such as fatty liver, coronary heart disease, hyperlipidemia, or low high-density lipoprotein (HDL) also had a higher likelihood of developing PSE. The machine learning models, particularly tree-based models such as Random Forest, XGBoost, and LightGBM, showed good predictive performance, with an area under the receiver operating characteristic curve (AUC) of 0.99.
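
For readers less familiar with the metric: the AUC equals the probability that a randomly chosen patient who developed PSE receives a higher predicted risk than a randomly chosen patient who did not. A minimal Python sketch of that pairwise (Mann-Whitney) formulation, assuming binary labels with 1 = PSE and continuous risk scores, is shown below; it is quadratic in the number of patients and meant for illustration only.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case (label 1) scores
    higher than a random negative case (label 0); ties count as 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # correctly ranked pair
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns `0.75`: three of the four positive/negative pairs are ranked correctly.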

Conclusion

Models built on a large dataset can effectively predict the likelihood of PSE, with tree-based models performing best. The NIHSS score, WBC count, and D-dimer were found to have the greatest impact.

Article activity feed

  2. Reviewer #1 (Public Review):

    Summary:

    This is a large cohort of ischemic stroke patients from multiple centres. The authors successfully developed predictive models for PSE.

    Strengths:

    The design and implementation of the study are acceptable, and the results are credible. It may provide evidence to support seizure prevention in the field of stroke treatment.

    Weaknesses:

    The methodology needs further consideration. The Discussion needs extensive rewriting.

  3. Reviewer #2 (Public Review):

    Summary

    The authors present multiple machine-learning methodologies to predict post-stroke epilepsy (PSE) from admission clinical data.

    Strengths

    The Statistical Approach section is very well written. The approaches used in this section are very sensible for the data in question.

    Weaknesses

    There are many typos and unclear statements throughout the paper.

    There are some issues with the SHAP interpretation. SHAP, in its default form, does not provide robust statistical guarantees of effect size. There is a claim that "SHAP analysis showed that white blood cell count had the greatest impact among the routine blood test parameters". This is a difficult claim to make.
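
    To make this point concrete, the sketch below computes exact Shapley values for a hypothetical three-feature model `f(z) = 2*z0 + z1*z2`, replacing "absent" features with a single reference vector. This is the simplest of several possible value functions; the `shap` library's explainers instead average over background data or use model-specific algorithms, for example. Changing only the reference moves feature 1's attribution from 0.5 to 0 although the model is unchanged: Shapley values describe how one model's output is distributed across features relative to a reference, not the size of a clinical effect.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x by enumerating all feature subsets.

    Features outside a subset take their value from a single baseline vector
    (one choice of value function; real SHAP explainers differ)."""
    n = len(x)

    def value(subset):
        # features in `subset` come from x, all others from the baseline
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

With `f = lambda z: 2 * z[0] + z[1] * z[2]` and `x = [1, 1, 1]`, a zero baseline gives attributions of about `[2.0, 0.5, 0.5]`, while the baseline `[0, 1, 0]` gives about `[2.0, 0.0, 1.0]`. In both cases the values sum to `f(x) - f(baseline)`.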

    The Data Collection section is very poorly written, and the methodology is not clear.

    There is no information about hyperparameter selection for models or whether a hyperparameter search was performed. Given this, it is difficult to conclude whether one machine learning model performs better than others on this task.
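
    A hyperparameter search of the kind requested here can be sketched as follows. Each candidate setting is scored by stratified k-fold cross-validation on the training data only, and the best one is kept before the test set is touched. The k-nearest-neighbours classifier and its neighbour count are hypothetical stand-ins for the tree models' hyperparameters (such as tree depth or number of estimators); the fold count and the accuracy metric are also assumptions.

```python
import math
import random
import statistics


def knn_predict(train_X, train_y, x, k):
    """Predict a 0/1 label by majority vote of the k nearest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if votes * 2 > k else 0


def stratified_folds(y, n_folds, seed=0):
    """Split indices into folds that each keep roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)  # deal each class round-robin
    return folds


def cv_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean held-out accuracy of k-NN over stratified folds of the training data."""
    scores = []
    for fold in stratified_folds(y, n_folds, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        tr_X = [X[i] for i in train]
        tr_y = [y[i] for i in train]
        correct = sum(knn_predict(tr_X, tr_y, X[j], k) == y[j] for j in fold)
        scores.append(correct / len(fold))
    return statistics.mean(scores)


def grid_search(X, y, grid, n_folds=5):
    """Score every candidate by cross-validation and return the best one."""
    results = {k: cv_accuracy(X, y, k, n_folds) for k in grid}
    best = max(results, key=results.get)  # first candidate wins ties
    return best, results
```

Reporting the full `results` dictionary, not only the winner, would let readers judge how sensitive each model's comparison is to its settings.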

    The inclusion and exclusion criteria are unclear: how many patients were excluded, and for what reasons?

    There is no sensitivity analysis of the SMOTE methodology: How many synthetic data points were created, and how does the number of synthetic data points affect classification accuracy?
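
    The number of synthetic points follows directly from the target class ratio. The sketch below uses hypothetical counts (1,000 non-PSE and 50 PSE training patients, not figures from the study) to show how the synthetic share of the minority class grows as the target ratio approaches 1:1. A full sensitivity analysis would then retrain and re-evaluate the model at each ratio. The rounding and ratio conventions here are assumptions and may differ from a given library's `sampling_strategy` semantics.

```python
def synthetic_count(n_majority, n_minority, ratio):
    """SMOTE samples needed so that minority/majority == ratio
    (before any ENN cleaning removes samples)."""
    target = round(ratio * n_majority)
    return max(0, target - n_minority)


def sweep(n_majority, n_minority, ratios):
    """For each target ratio, return (ratio, synthetic count, fraction of the
    resampled minority class that is synthetic)."""
    rows = []
    for r in ratios:
        n_syn = synthetic_count(n_majority, n_minority, r)
        frac = n_syn / (n_minority + n_syn)
        rows.append((r, n_syn, frac))
    return rows
```

With the hypothetical counts, `sweep(1000, 50, [0.25, 0.5, 1.0])` returns `[(0.25, 200, 0.8), (0.5, 450, 0.9), (1.0, 950, 0.95)]`: at full balance, 95% of the minority class the model sees would be synthetic.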

    Did the authors achieve their aims? Do the results support their conclusions?

    The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage.
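
    One way to guard against this kind of leakage is to build each patient's feature vector only from observations time-stamped before a fixed prediction time. In the sketch below, the 24-hour window after admission, the `(name, value, recorded_at)` record layout, and the feature names are hypothetical. The cutoff is also capped at PSE onset, so nothing recorded after the outcome can enter the model.

```python
from datetime import datetime, timedelta
from typing import Optional


def features_at_prediction_time(observations, admission: datetime,
                                pse_onset: Optional[datetime] = None,
                                window: timedelta = timedelta(hours=24)):
    """Return {feature: earliest value} using only observations recorded
    strictly before the prediction cutoff.

    observations: iterable of (name, value, recorded_at) tuples (hypothetical
    layout). The cutoff is admission + window, moved earlier to PSE onset if
    the seizure happened first, so post-outcome data cannot leak in."""
    cutoff = admission + window
    if pse_onset is not None and pse_onset < cutoff:
        cutoff = pse_onset
    features = {}
    for name, value, recorded_at in sorted(observations, key=lambda o: o[2]):
        if recorded_at < cutoff and name not in features:
            features[name] = value  # keep the earliest pre-cutoff value
    return features
```

A complication such as hydrocephalus that is only documented days after admission would be excluded under a short window, which is exactly the information the paper would need to report for each feature.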

    The authors claim that their models can predict PSE. To support this claim, more information on out-of-distribution generalisation performance would be helpful: there is limited reporting on the external validation cohort relative to the reporting on the training and test data.

    For greater certainty on all reported results, it would be most appropriate to perform n-fold cross-validation and report mean scores and confidence intervals across the cross-validation splits.
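
    A minimal sketch of such a summary, assuming one metric value per fold (for example, AUC) and 5- or 10-fold cross-validation, reports the mean and a t-based 95% interval. Because folds share training data, fold scores are not independent, so the interval is an approximate indication of variability rather than an exact confidence interval.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by df = n_folds - 1.
# Only 5-fold and 10-fold are covered; extend the table for other fold counts.
T_975 = {4: 2.776, 9: 2.262}


def cv_summary(fold_scores):
    """Mean and approximate 95% interval of a metric across CV folds."""
    n = len(fold_scores)
    mean = statistics.mean(fold_scores)
    sd = statistics.stdev(fold_scores)  # sample standard deviation
    half = T_975[n - 1] * sd / math.sqrt(n)
    return mean, (mean - half, mean + half)
```

For hypothetical 5-fold scores `[0.80, 0.82, 0.78, 0.81, 0.79]`, this gives a mean of 0.80 with an interval of roughly 0.80 ± 0.02.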

    The likely impact of the work on the field

    If this model works as claimed, it will be useful for predicting PSE. This has some direct clinical utility.

    Analysis of features contributing to PSE may provide clinical researchers with ideas for further research on the underlying aetiology of PSE.

    Additional context that might help readers

    The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.
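
    Both plots rest on SHAP's additivity property: the base value (typically the model's average output over the background data) plus the sum of a patient's SHAP values equals the model's output for that patient. In a force plot, features pushing the output above the base value are drawn in red and those pushing it below in blue; a decision plot draws the same sum as a cumulative line. The sketch below reproduces that line in plain Python, assuming outputs in log-odds units (common for tree-based classifiers) and hypothetical SHAP values. It approximates, rather than reimplements, what `shap.decision_plot` draws; here features are ordered by absolute contribution for a single patient.

```python
import math


def decision_path(base_value, shap_values):
    """Cumulative line of a SHAP decision plot for one patient.

    Starts at the base value and adds feature contributions from the
    smallest to the largest absolute value (bottom to top of the plot);
    the last point equals the model's output for this patient."""
    path = [("base value", base_value)]
    total = base_value
    for name, phi in sorted(shap_values.items(), key=lambda kv: abs(kv[1])):
        total += phi
        path.append((name, total))
    return path


def to_probability(log_odds):
    """Logistic link, for models whose SHAP values are in log-odds units."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With a hypothetical base value of -2.0 and SHAP values `{"NIHSS": 1.5, "WBC": 0.8, "age": -0.3}`, the path visits age, WBC, and NIHSS in that order and ends at approximately 0.0 log-odds, i.e. a predicted probability of about 0.5.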

  4. Reviewer #3 (Public Review):

    Summary:

    The authors report the performance of a series of machine learning models inferred from a large-scale dataset and externally validated with an independent cohort of patients, to predict the risk of post-stroke epilepsy. Some of the reported models have very good explanatory and predictive performance.

    Strengths:

    The models have been derived from real-world large-scale data.

    The performance of the best-performing models is very good according to the external validation results.

    Early prediction of the risk of post-stroke epilepsy would be of high interest to implement early therapeutic interventions that could improve prognosis.

    Weaknesses:

    There are issues with the readability of the paper. Many abbreviations are not introduced properly and sometimes are written inconsistently. A lot of relevant references are omitted. The methodological descriptions are extremely brief and, sometimes, incomplete.

    The dataset is not disclosed, and neither is the code (although the code is made available upon request). For the sake of reproducibility, unless any bioethical concerns impede it, it would be good to have these data disclosed.

    Although the external validation is appreciated, cross-validation to check the robustness of the models would also be welcome.