Multimodal Machine Learning for Diagnosis of Multiple Sclerosis Using Optical Coherence Tomography in Pediatric Cases
Abstract
Background and Objectives
Identifying multiple sclerosis (MS) early in children, and distinguishing it from other childhood neuroinflammatory conditions, is critical because early therapeutic intervention can improve outcomes. The anterior visual pathway is central to diagnostic considerations in MS and has recently been added as a fifth topography in the McDonald diagnostic criteria for MS. Optical coherence tomography (OCT) provides high-resolution retinal imaging that reflects the structural integrity of the retinal nerve fiber layer and the ganglion cell-inner plexiform layer. Whether multimodal deep learning models that draw only on OCT data can diagnose pediatric-onset MS (POMS) is unknown.
Methods
We analyzed 3D OCT scans collected prospectively through the Neuroinflammatory Registry of the Hospital for Sick Children (REB# 1000005356). Raw macular and optic nerve head images and 52 automatically segmented features were included. We evaluated three classification approaches: (1) deep learning models (e.g., ResNet, DenseNet) used for representation learning, followed by classical machine learning (ML) classifiers; (2) ML models trained on OCT-derived features; and (3) multimodal models combining both via early and late fusion.
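The abstract does not detail the fusion pipelines, so the following is only a minimal sketch of how early and late fusion differ, written in Python with scikit-learn. Everything in it is an assumption: the data are random placeholders that reuse the abstract's scan counts (they are not study data), `img_emb` stands in for pooled CNN embeddings, `oct_feats` stands in for the 52 segmented features, the early-fusion classifier is a hypothetical choice, and late fusion here simply averages the two models' predicted probabilities, which is one common rule but not necessarily the one the authors used. Macro-averaged F1 is reported so that poor performance on the smaller class stays visible.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical placeholder data (NOT study data): label 1 = POMS scan, 0 = control scan.
rng = np.random.default_rng(0)
y = np.array([1] * 211 + [0] * 52)
img_emb = rng.normal(size=(y.size, 128)) + 0.3 * y[:, None]   # stand-in for CNN embeddings
oct_feats = rng.normal(size=(y.size, 52)) + 0.3 * y[:, None]  # stand-in for 52 OCT features

# Stratified per-scan split (a real study would split by patient, see note below).
tr, te = train_test_split(np.arange(y.size), test_size=0.25, stratify=y, random_state=0)

def make_svc():
    # Standardize features, then an RBF SVC with Platt-scaled probabilities.
    return make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))

def early_fusion_proba():
    # Early fusion: concatenate both modalities into one vector, train ONE classifier.
    X = np.hstack([img_emb, oct_feats])
    return make_svc().fit(X[tr], y[tr]).predict_proba(X[te])[:, 1]

def late_fusion_proba():
    # Late fusion: one classifier per modality, then average their P(POMS) outputs.
    img_clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(img_emb[tr], y[tr])
    feat_clf = make_svc().fit(oct_feats[tr], y[tr])
    p_img = img_clf.predict_proba(img_emb[te])[:, 1]
    p_feat = feat_clf.predict_proba(oct_feats[te])[:, 1]
    return 0.5 * (p_img + p_feat)

def report(p, threshold=0.5):
    # Threshold probabilities for F1/accuracy; AUC uses the raw scores.
    pred = (p >= threshold).astype(int)
    return {
        "auc": roc_auc_score(y[te], p),
        "macro_f1": f1_score(y[te], pred, average="macro", zero_division=0),
        "accuracy": accuracy_score(y[te], pred),
    }

for name, proba in [("early", early_fusion_proba()), ("late", late_fusion_proba())]:
    print(name, report(proba))
```

Because this placeholder split is per scan, a real analysis with several scans per child would need a patient-level split (for example, scikit-learn's `GroupShuffleSplit` with patient IDs as groups) to keep scans from the same child out of both training and test sets.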
Results
Scans from individuals with POMS (age at onset 16.0 ± 3.1 years, 51.0% female; 211 scans) and from 29 children with non-inflammatory neurological conditions (13.1 ± 4.0 years, 69.0% female; 52 scans) were included. The early fusion model performed best (AUC 0.87, F1 0.87, accuracy 90%), outperforming both unimodal models and the late fusion model; its AUC equalled that of the best image-based model, but its F1 and accuracy were higher. The best unimodal feature-based model (SVC) yielded an AUC of 0.84, an F1 of 0.85, and an accuracy of 85%, while the best image-based model (ResNet101 with Random Forest) achieved an AUC of 0.87, an F1 of 0.79, and an accuracy of 84%. Late fusion underperformed: it reached 82% accuracy but failed on the minority class (controls).
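Because the two groups are imbalanced (211 POMS scans vs. 52 control scans), accuracy alone can flatter a classifier. As a worked baseline, assuming the evaluation set keeps the overall class ratio (the abstract does not report the split), a trivial model that labels every scan as POMS would score:

```latex
\begin{aligned}
\text{Accuracy}_{\text{majority}} &= \frac{211}{211 + 52} = \frac{211}{263} \approx 0.802 \\
\text{F1}_{\text{POMS}} &= \frac{2PR}{P + R}
  = \frac{2 \cdot \tfrac{211}{263} \cdot 1}{\tfrac{211}{263} + 1}
  = \frac{422}{474} \approx 0.890,
  \qquad P = \tfrac{211}{263},\; R = 1 \\
\text{F1}_{\text{control}} &= 0 \\
\text{Macro-F1}_{\text{majority}} &= \frac{0.890 + 0}{2} = \frac{211}{474} \approx 0.445
\end{aligned}
```

Under that assumption, the late fusion model's 82% accuracy would sit only about two percentage points above this trivial baseline, which is consistent with it failing on the minority class; class-sensitive metrics such as F1 and AUC are more informative in this setting.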
Discussion
Multimodal learning with early fusion improved diagnostic performance by combining spatial retinal information with clinically relevant structural features. This approach captures complementary patterns associated with MS pathology and shows promise as an AI-driven tool to support the diagnosis of pediatric neuroinflammatory disorders.