Language Model Applications for Early Diagnosis of Childhood Epilepsy

Abstract

Objective

Accurate and timely epilepsy diagnosis is crucial to reduce delayed or unnecessary treatment. While language serves as an indispensable source of information for diagnosing epilepsy, its computational analysis remains relatively unexplored. This study assessed and compared the diagnostic value of different language model applications for extracting information and identifying overlooked language patterns in first-visit documentation, with the aim of improving the early diagnosis of childhood epilepsy.

Methods

We analyzed 1,561 patient letters from two independent first seizure clinics. The dataset was divided into training and test sets to evaluate performance and generalizability. We employed two approaches: a Naïve Bayes model, an established natural language processing technique, and a sentence-embedding model based on the Bidirectional Encoder Representations from Transformers (BERT) architecture. Both models analyzed anamnesis data only. Within the training sets, we identified predictive features consisting of keywords indicative of ‘epilepsy’ or ‘no epilepsy’. Model outputs were compared with the clinician’s final diagnosis after follow-up (the gold standard). We computed accuracy, sensitivity, and specificity for both models.
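To illustrate the kind of keyword-based classification described above, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing. It is not the study's implementation: the function names (`tokenize`, `train_naive_bayes`, `predict`), the whitespace tokenizer, the smoothing constant, and the four toy training phrases are all assumptions made purely for illustration and do not come from the patient letters.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing (alpha)."""
    classes = sorted(set(labels))
    counts = {c: Counter() for c in classes}
    for text, label in zip(docs, labels):
        counts[label].update(tokenize(text))
    vocab = set()
    for c in classes:
        vocab.update(counts[c])
    # Class priors estimated from the share of training documents per class.
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    # Smoothed per-class word log-likelihoods over the shared vocabulary.
    log_lik = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik


def predict(model, text):
    """Return the class with the highest posterior log-score."""
    log_prior, log_lik = model
    scores = {}
    for c in log_prior:
        # Words never seen in training are ignored.
        scores[c] = log_prior[c] + sum(
            log_lik[c][w] for w in tokenize(text) if w in log_lik[c]
        )
    return max(scores, key=scores.get)


# Invented toy phrases, for illustration only (not study data).
train_docs = [
    "sudden stiffening and rhythmic jerking of both arms",
    "staring spell with lip smacking and confusion afterwards",
    "fainted after standing up quickly felt dizzy first",
    "breath holding spell after crying turned pale",
]
train_labels = ["epilepsy", "epilepsy", "no epilepsy", "no epilepsy"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "rhythmic jerking and confusion afterwards"))
print(predict(model, "felt dizzy and fainted after standing"))
```

Because this model treats each text as an unordered bag of words, shuffling the words of an input leaves its prediction unchanged. A sentence-embedding model, by contrast, can in principle make use of word order and context.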

Results

The Naïve Bayes model achieved an accuracy of 0.73 (95% CI: 0.68-0.78), with a sensitivity of 0.79 (95% CI: 0.74-0.85) and a specificity of 0.62 (95% CI: 0.52-0.72). The sentence-embedding model demonstrated comparable performance with an accuracy of 0.74 (95% CI: 0.68-0.79), sensitivity of 0.74 (95% CI: 0.68-0.80), and specificity of 0.73 (95% CI: 0.61-0.84).
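For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from a confusion matrix, together with a 95% confidence interval for each proportion. The abstract does not state how the intervals were computed, so the Wilson score interval used here is an assumption, as are the function names and the confusion-matrix counts, which are hypothetical and not taken from the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp, fp, tn, fn):
    """Return {metric: (estimate, ci_low, ci_high)} from confusion-matrix counts."""
    fractions = {
        "accuracy": (tp + tn, tp + fp + tn + fn),  # correct / all cases
        "sensitivity": (tp, tp + fn),              # true positives / all epilepsy
        "specificity": (tn, tn + fp),              # true negatives / all no epilepsy
    }
    results = {}
    for name, (k, n) in fractions.items():
        low, high = wilson_interval(k, n)
        results[name] = (k / n, low, high)
    return results


# Hypothetical counts, for illustration only (not the study's results).
metrics = diagnostic_metrics(tp=80, fp=40, tn=60, fn=20)
for name, (estimate, low, high) in metrics.items():
    print(f"{name}: {estimate:.2f} (95% CI: {low:.2f}-{high:.2f})")
```

Sensitivity and specificity are each computed on only one diagnostic group, so their intervals are typically wider than the interval for accuracy, which uses all test cases.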

Conclusion

Both models demonstrated relatively good performance in diagnosing childhood epilepsy based solely on first-visit patient anamnesis text. Notably, the more advanced sentence-embedding model showed no significant improvement over the computationally simpler Naïve Bayes model. Because the Naïve Bayes model ignores word order, this suggests that modeling of anamnesis data does not depend strongly on word order for this particular classification task. Further refinement and exploration of language models and computational linguistic approaches are necessary to enhance diagnostic accuracy in clinical practice.
