Leveraging Multimodal Large Language Models to Extract Mechanistic Insights from Biomedical Visuals: A Case Study on COVID-19 and Neurodegenerative Diseases
Abstract
Background
The COVID-19 pandemic has intensified concerns about its long-term neurological impact, with growing evidence linking SARS-CoV-2 infection to neurodegenerative diseases (NDDs) such as Alzheimer's disease (AD) and Parkinson's disease (PD). Patients with these conditions not only face a higher risk of severe COVID-19 outcomes but may also undergo accelerated cognitive and motor decline following infection. Proposed mechanisms, ranging from neuroinflammation and blood–brain barrier disruption to abnormal protein aggregation, closely mirror core features of neurodegenerative pathology. Yet current knowledge is fragmented across text, figures, and pathway diagrams, hindering its integration into computational models capable of uncovering systemic patterns.
Results
To address this gap, we applied GPT-4 Omni (GPT-4o), a multimodal large language model, to extract mechanistic insights from biomedical figures. Over 10,000 images were retrieved through targeted searches on COVID-19 and neurodegeneration; after automated and manual filtering, a curated subset was analyzed. GPT-4o extracted biological relationships as semantic triples, which were grouped into six mechanistic categories (including microglial activation and blood–brain barrier disruption) using ontology-guided similarity and assembled into a Neo4j knowledge graph.
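The triple-to-graph step described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the `Entity` node label, the predicate sanitizer, and the example triple are all assumptions; in practice the generated Cypher would be executed with the official `neo4j` Python driver via `session.run(query, **params)`.

```python
# Hypothetical sketch: turning one extracted semantic triple (subject,
# predicate, object) into a parameterized Cypher MERGE statement for a
# Neo4j knowledge graph. Schema details are illustrative assumptions.
import re

def triple_to_cypher(subject: str, predicate: str, obj: str):
    """Return a Cypher statement plus parameters for one (s, p, o) triple."""
    # Relationship types cannot be parameterized in Cypher, so the
    # predicate is sanitized and upper-cased into a safe identifier.
    rel = re.sub(r"\W+", "_", predicate.strip()).upper()
    query = (
        "MERGE (s:Entity {name: $subj}) "
        "MERGE (o:Entity {name: $obj}) "
        f"MERGE (s)-[:{rel}]->(o)"
    )
    return query, {"subj": subject, "obj": obj}

# Example triple of the kind GPT-4o might extract from a figure:
q, params = triple_to_cypher("SARS-CoV-2", "activates", "microglia")
```

MERGE (rather than CREATE) keeps the graph deduplicated when the same entity or relationship is extracted from multiple figures.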
Extraction accuracy was evaluated against a gold-standard dataset of expert-annotated images using BioBERT-based semantic matching. This evaluation also guided prompt tuning, threshold optimization, and hyperparameter assessment. The results demonstrate that GPT-4o recovers both established and novel mechanisms, yielding interpretable outputs that illuminate complex biological links between SARS-CoV-2 and neurodegeneration.
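The semantic-matching evaluation can be illustrated with a small sketch. In the real setup the vectors would be BioBERT embeddings of predicted and gold-standard triples (e.g. from a model such as `dmis-lab/biobert-v1.1`); here the embedding source and the 0.8 threshold are placeholder assumptions, and only the cosine-similarity decision rule is shown.

```python
# Minimal sketch of threshold-based semantic matching: a predicted triple
# counts as correct when its embedding is sufficiently similar to the
# embedding of a gold-standard triple. Embeddings and the threshold value
# are placeholder assumptions.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def is_match(pred_vec, gold_vec, threshold=0.8):
    """Decision rule: accept the prediction if similarity clears the threshold."""
    return cosine(pred_vec, gold_vec) >= threshold
```

Sweeping the threshold against the expert annotations is one simple way to trade precision against recall during the optimization step mentioned above.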
Conclusions
This study showcases the potential of multimodal LLMs to mine biomedical visual data at scale. By complementing text mining and integrating figure-derived knowledge, our framework advances understanding of COVID-19–related neurodegeneration and supports future translational research.