Automated Seizure Classification Using Multimodal Large Language Models

Abstract

Objective

Accurately distinguishing between epileptic seizures (ES) and nonepileptic seizures (NES) is a significant clinical challenge that typically requires resource-intensive inpatient video-EEG monitoring. Here, we developed a novel method based on multimodal large language models (MLLMs) to automatically extract semiological features from videos of seizure events and then classify the events as ES or NES.

Methods

Ninety videos of ES and NES events from 29 patients were obtained from an epilepsy monitoring unit at a large academic hospital. Events were labeled as ES or NES based on expert evaluation of video-EEG recordings and were simultaneously annotated with 24 clinically relevant semiological features. We implemented an MLLM framework that integrates open-source vision-language models (VLMs) and audio-language models (ALMs) to analyze the videos and their associated audio tracks and automatically extract these 24 features. The performance of MLLM-based feature extraction was evaluated against expert annotations. The extracted features were then used to train several classifiers, including K-Nearest Neighbors (KNN), XGBoost, and a Deep Factorization Machine, to differentiate ES from NES. Model performance was evaluated using leave-one-patient-out (LOPO) cross-validation, as illustrated in the sketch below.
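
As a rough illustration of this evaluation protocol (not the authors' code), the following sketch runs LOPO cross-validation with a KNN classifier using scikit-learn. It assumes the 24 semiological features are encoded as a numeric matrix X with one row per event, binary labels y (1 = ES, 0 = NES), and a patient_ids array for grouping; these names, and the default k, are hypothetical placeholders.

```python
# Minimal LOPO cross-validation sketch for ES/NES classification.
# Assumes X: (n_events, 24) feature matrix, y: binary labels,
# patient_ids: patient identifier per event. Names are illustrative.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

def lopo_evaluate(X, y, patient_ids, k=7):
    """Hold out each patient's events in turn, train KNN on the rest,
    and pool out-of-fold predictions for overall metrics."""
    logo = LeaveOneGroupOut()
    y_pred = np.empty_like(y)
    y_score = np.empty(len(y), dtype=float)
    for train_idx, test_idx in logo.split(X, y, groups=patient_ids):
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = clf.predict(X[test_idx])
        y_score[test_idx] = clf.predict_proba(X[test_idx])[:, 1]
    return {
        "precision": precision_score(y, y_pred),
        "recall": recall_score(y, y_pred),
        "f1": f1_score(y, y_pred),
        "auc": roc_auc_score(y, y_score),
    }
```

Grouping folds by patient rather than by event prevents recordings from the same patient from appearing in both the training and test sets, which would otherwise inflate performance estimates.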

Results

Using KNN, expert-annotated semiological features achieved a precision of 0.97, recall of 0.97, F1-score of 0.97, and AUC of 0.99, establishing an upper bound on ES/NES classification performance. Compared to expert annotations, the MLLM pipeline achieved an overall mean recall of 0.71, mean accuracy of 0.58, and mean F1-score of 0.51 for semiological feature extraction. The best-performing KNN model (k = 7) using MLLM-extracted features achieved a precision of 0.77, recall of 0.76, F1-score of 0.76, and AUC of 0.76 in classifying ES versus NES, correctly identifying 68 of 90 events.
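
The per-feature extraction scores above are means across the 24 features. A minimal sketch of how such macro-averaged agreement metrics could be computed is shown below, assuming binary (n_events × 24) arrays of expert and MLLM-extracted feature values; the function and array names are illustrative, not from the paper.

```python
# Macro-averaged agreement between MLLM-extracted and expert-annotated
# features. expert, extracted: (n_events, n_features) binary arrays.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

def feature_agreement(expert, extracted):
    """Mean per-feature recall, accuracy, and F1 of extracted features
    against expert annotations, averaged across all features."""
    recalls, accuracies, f1s = [], [], []
    for j in range(expert.shape[1]):
        recalls.append(recall_score(expert[:, j], extracted[:, j], zero_division=0))
        accuracies.append(accuracy_score(expert[:, j], extracted[:, j]))
        f1s.append(f1_score(expert[:, j], extracted[:, j], zero_division=0))
    return {
        "mean_recall": float(np.mean(recalls)),
        "mean_accuracy": float(np.mean(accuracies)),
        "mean_f1": float(np.mean(f1s)),
    }
```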

Conclusion

We demonstrate the feasibility of using MLLMs to automatically extract clinically relevant semiological features from seizure videos and to classify ES versus NES. MLLM-based feature extraction and classification offer a promising, clinically interpretable approach to aid the video-based diagnosis of epilepsy.
