Characteristics and Early Diagnosis of Motor Neuron Disease (MND) in 67 million individuals in England: a comparative study on phenotyping models derived by AI, Knowledge Graphs and the MND Association
Abstract
Background
Motor neuron disease (MND) is a rapidly progressive and fatal neurodegenerative condition, making early diagnosis critical for optimizing patient outcomes and care planning. Although clinical guidelines have existed for a decade, early diagnosis of MND remains challenging because there is little population-level evidence on the effectiveness of what we know and, more importantly, on what we do not know. This study applies advanced computational methods to linked health data for the whole English population to improve MND phenotype detection and diagnosis. We also assess the impact of COVID-19 on people with MND, examining mortality trends and vaccination effects.
Methods
The nationwide linked health records of 67 million individuals in England were used to identify MND cohorts for two periods, 2014-2019 and 2020-2021, which were analysed to describe their characteristics and derive MND period prevalence. On these routinely collected health data, we implemented the MND Association’s red flag list (MNDA guideline) and three AI-derived phenotyping models: a knowledge graph approach, a GPT-4 approach, and a machine learning approach driven by real-world data. These phenotyping models were used to develop prediction models for (a) diagnosing MND and (b) predicting MND 1, 3, and 5 years before coded diagnosis. The prediction models were implemented with logistic regression, random forest, support vector machine, and recurrent neural networks. Effectiveness was assessed using positive predictive value, sensitivity, specificity, F1 score, and Area Under the Receiver Operating Characteristic Curve (AUROC). The Kaplan-Meier method was used for the survival analysis of COVID-19 related mortality in people with MND.
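As an illustration of the evaluation metrics named above, the following minimal sketch computes positive predictive value, sensitivity, specificity, and F1 score from confusion-matrix counts, and AUROC as the probability that a randomly chosen case is scored above a randomly chosen control. It is not the study's code: the function names, counts, and scores are hypothetical and chosen only to make the definitions concrete.

```python
from itertools import product


def classification_metrics(tp, fp, tn, fn):
    """Return PPV, sensitivity, specificity and F1 from confusion counts."""
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    denom = ppv + sensitivity
    f1 = 2 * ppv * sensitivity / denom if denom else 0.0
    return {"ppv": ppv, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def auroc(case_scores, control_scores):
    """AUROC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in
    the number of scores, so it suits small illustrations only.
    """
    pairs = list(product(case_scores, control_scores))
    correct = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c, k in pairs)
    return correct / len(pairs)


# Hypothetical counts and scores, not taken from the study.
metrics = classification_metrics(tp=40, fp=60, tn=840, fn=60)
example_auroc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

With a low-prevalence condition such as MND, specificity can stay high while PPV and F1 remain modest, which is one reason to report several metrics rather than AUROC alone.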
Findings
Of 67,270,015 individuals, we identified 12,240 people with a coded MND diagnosis between 1st January 2014 and 31st December 2019 (6-year period prevalence of 18.20 cases per 100,000 people). For MND screening (task b), the MNDA guideline showed poor to moderate discrimination (AUROC: 0.62-0.63), while combining the guideline with the AI-derived models (the ensemble) improved it significantly (0.66-0.68). For the diagnosis task (task a), the guideline had good discrimination (AUROC: 0.70) but lower overall performance (F1 score: 0.44), whereas the hypothesis-free, real-world data driven approach achieved much better results (AUROC: 0.78; F1: 0.55). As for COVID-19 mortality risk, compared to matched controls, MND patients had an elevated risk in wave 1 (HR=2.97, 95% CI: 1.97-4.48), while fully vaccinated individuals in wave 2 showed a higher risk that was not statistically significant (HR=1.27, 95% CI: 0.69-2.34).
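The reported period prevalence can be checked directly from the two counts given above. The short sketch below assumes the standard definition (cases identified during the period divided by the population, scaled to 100,000); the function name is illustrative.

```python
def period_prevalence_per_100k(cases, population):
    """Period prevalence expressed as cases per 100,000 people."""
    return cases / population * 100_000


# Counts as reported in the Findings for 2014-2019.
prevalence = period_prevalence_per_100k(cases=12_240, population=67_270_015)
print(f"{prevalence:.2f} cases per 100,000 people")
```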
Interpretation
This population-scale study showed that the MNDA guideline was not very effective in either screening for or diagnosing MND, probably because it misses predictive phenotypes that are available in routine care. The hypothesis-free phenotyping approach, which applies AI to real-world datasets to derive predictive phenotypes, demonstrated great utility by identifying 13 novel phenotypes from 7 ICD-10 chapters that had significant effects in predicting MND.
Funding
British Heart Foundation Data Science Centre, led by Health Data Research UK. National Institute for Health and Care Research (NIHR) Dementia Biomedical Research Unit at South London and Maudsley NHS Foundation Trust and King’s College London.