Automated Seizure Classification Using Multimodal Large Language Models

Abstract

Objective

Accurately distinguishing between epileptic seizures (ES) and nonepileptic seizures (NES) is a significant clinical challenge that typically requires resource-intensive inpatient video-EEG monitoring. Here, we developed a novel method based on multimodal large language models (MLLMs) to automatically extract semiological features from videos of seizure events and then classify the events as ES or NES.

Methods

Ninety videos of ES and NES events from 29 patients were obtained from an epilepsy monitoring unit at a large academic hospital. Events were labeled as ES or NES based on expert evaluation of video-EEG recordings and were simultaneously annotated with 24 clinically relevant semiological features. We implemented an MLLM framework that integrates open-source vision-language models (VLMs) and audio-language models (ALMs) to analyze the videos and their associated audio tracks and automatically extract these 24 features. The performance of MLLM-based feature extraction was evaluated against expert annotations. The extracted features were then used to train several classifiers, including K-Nearest Neighbors (KNN), XGBoost, and a Deep Factorization Machine, to differentiate ES from NES. Model performance was evaluated using leave-one-patient-out (LOPO) cross-validation, as illustrated in the sketch below.
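
As a rough illustration of this evaluation protocol (not the authors' code), the following sketch runs LOPO cross-validation with a KNN classifier using scikit-learn. It assumes the 24 semiological features are encoded as a numeric matrix X with one row per event, binary labels y (1 = ES, 0 = NES), and a patient_ids array for grouping; these names, and the default k, are hypothetical placeholders.

```python
# Minimal LOPO cross-validation sketch for ES/NES classification.
# Assumes X: (n_events, 24) feature matrix, y: binary labels,
# patient_ids: patient identifier per event. Names are illustrative.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

def lopo_evaluate(X, y, patient_ids, k=7):
    """Hold out each patient's events in turn, train KNN on the rest,
    and pool out-of-fold predictions for overall metrics."""
    logo = LeaveOneGroupOut()
    y_pred = np.empty_like(y)
    y_score = np.empty(len(y), dtype=float)
    for train_idx, test_idx in logo.split(X, y, groups=patient_ids):
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = clf.predict(X[test_idx])
        y_score[test_idx] = clf.predict_proba(X[test_idx])[:, 1]
    return {
        "precision": precision_score(y, y_pred),
        "recall": recall_score(y, y_pred),
        "f1": f1_score(y, y_pred),
        "auc": roc_auc_score(y, y_score),
    }
```

Grouping folds by patient rather than by event prevents recordings from the same patient from appearing in both the training and test sets, which would otherwise inflate performance estimates.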

Results

Using KNN, expert-annotated semiological features achieved a precision of 0.97, recall of 0.97, F1-score of 0.97, and AUC of 0.99, establishing an upper bound on ES/NES classification performance. Compared to expert annotations, the MLLM pipeline achieved an overall mean recall of 0.71, mean accuracy of 0.58, and mean F1-score of 0.51 for semiological feature extraction. The best-performing KNN model (k = 7) using MLLM-extracted features achieved a precision of 0.77, recall of 0.76, F1-score of 0.76, and AUC of 0.76 in classifying ES versus NES, correctly identifying 68 of 90 events.
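
The per-feature extraction scores above are means across the 24 features. A minimal sketch of how such macro-averaged agreement metrics could be computed is shown below, assuming binary (n_events × 24) arrays of expert and MLLM-extracted feature values; the function and array names are illustrative, not from the paper.

```python
# Macro-averaged agreement between MLLM-extracted and expert-annotated
# features. expert, extracted: (n_events, n_features) binary arrays.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

def feature_agreement(expert, extracted):
    """Mean per-feature recall, accuracy, and F1 of extracted features
    against expert annotations, averaged across all features."""
    recalls, accuracies, f1s = [], [], []
    for j in range(expert.shape[1]):
        recalls.append(recall_score(expert[:, j], extracted[:, j], zero_division=0))
        accuracies.append(accuracy_score(expert[:, j], extracted[:, j]))
        f1s.append(f1_score(expert[:, j], extracted[:, j], zero_division=0))
    return {
        "mean_recall": float(np.mean(recalls)),
        "mean_accuracy": float(np.mean(accuracies)),
        "mean_f1": float(np.mean(f1s)),
    }
```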

Conclusion

We demonstrate the feasibility of using MLLMs to automatically extract clinically relevant semiological features from seizure videos and to classify ES versus NES. MLLM-based feature extraction and classification offer a promising, clinically interpretable approach to aid the video-based diagnosis of epilepsy.
