Improving Diagnostic Accuracy of Routine EEG for Epilepsy using Deep Learning
Abstract
Background and Objectives
The diagnostic yield of routine EEG in epilepsy is limited by low sensitivity and the potential for misinterpretation of interictal epileptiform discharges (IEDs). Our objective is to develop, train, and validate a deep learning model that can identify epilepsy from routine EEG recordings, complementing traditional IED-based interpretation.
Methods
This is a retrospective cohort study of diagnostic accuracy. All consecutive patients undergoing routine EEG at our tertiary care center between January 2018 and September 2019 were included. EEGs recorded between July 2019 and September 2019 constituted a temporally shifted testing cohort. The diagnosis of epilepsy was established by the treating neurologist at the end of the available follow-up period, based on clinical file review. Original EEG reports were reviewed for IEDs. We developed seven novel deep learning models based on Vision Transformers (ViT) and Convolutional Neural Networks (CNN), training them to classify raw EEG recordings. We compared their performance to IED-based interpretation and two previously proposed machine learning methods.
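The temporally shifted split described above can be illustrated with a minimal sketch. It assumes each recording carries an identifier and a recording date; the function name `temporal_split`, the cutoff constant, and the sample records are hypothetical and are not taken from the study's code.

```python
from datetime import date

# Assumed cutoff: the abstract places testing recordings in July-September 2019.
TEST_START = date(2019, 7, 1)

def temporal_split(recordings, test_start=TEST_START):
    """Split (recording_id, recording_date) pairs into development and testing sets by date."""
    development = [r for r in recordings if r[1] < test_start]
    testing = [r for r in recordings if r[1] >= test_start]
    return development, testing

# Made-up records, for illustration only.
records = [
    ("eeg-001", date(2018, 1, 15)),
    ("eeg-002", date(2019, 6, 30)),
    ("eeg-003", date(2019, 7, 1)),
    ("eeg-004", date(2019, 9, 20)),
]
dev, test = temporal_split(records)
```

Splitting by date rather than at random means the testing cohort is always recorded after the data used for training and validation, which is closer to how a model would be used prospectively.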
Results
The study included 948 EEGs from 846 patients (820 EEGs/728 patients in training/validation, 128 EEGs/118 patients in testing). Median follow-up was 2.2 years in the training/validation cohort and 1.7 years in the testing cohort. Our flagship ViT model, DeepEpilepsy, achieved an area under the receiver operating characteristic curve (AUROC) of 0.76 (95% CI: 0.69–0.83), outperforming IED-based interpretation (AUROC 0.69; 95% CI: 0.64–0.73) and previous methods. Combining DeepEpilepsy with IEDs increased the AUROC to 0.83 (95% CI: 0.77–0.89).
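For readers unfamiliar with the metric, the sketch below shows one common way to compute an AUROC and a percentile bootstrap confidence interval. The abstract does not state how the intervals were obtained, so the bootstrap, the resampling at the level of individual EEGs, and the names `auroc` and `bootstrap_ci` are assumptions made for illustration; the labels and scores are made up.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method, resampling single EEGs)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Made-up example: 1 = epilepsy, 0 = no epilepsy.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
point = auroc(labels, scores)  # 3 of 4 positive/negative pairs are ranked correctly: 0.75
lower, upper = bootstrap_ci(labels, scores)
```

When several EEGs come from the same patient, as in this cohort, resampling patients rather than individual recordings may give more realistic intervals.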
Discussion
DeepEpilepsy can identify epilepsy on routine EEG independently of IEDs, suggesting that deep learning can detect novel EEG patterns relevant to epilepsy diagnosis. Further research is needed to understand the exact nature of these patterns and evaluate the clinical impact of this increased diagnostic yield in specific settings.