Clinically reported covert cerebrovascular disease and risk of neurological disease: a whole-population cohort of 395,273 people using natural language processing


Abstract

Importance

Understanding the relevance of covert cerebrovascular disease (CCD) for later health will allow clinicians to more effectively monitor and target interventions.

Objective

To examine the association between clinically reported CCD, measured using natural language processing (NLP), and subsequent disease risk.

Design, Setting and Participants

We conducted a retrospective e-cohort study using linked health record data. From all people with clinical brain imaging in Scotland from 2010 to 2018, we selected people with no prior hospitalisation for neurological disease. The data were analysed from March 2024 to June 2025.

Exposure

Four phenotypes were identified with NLP of imaging reports: white matter hypoattenuation or hyperintensities (WMH), lacunes, cortical infarcts and cerebral atrophy.

Main outcomes and measures

Adjusted hazard ratios (aHRs) were calculated for each phenotype for stroke, dementia, and Parkinson’s disease (conditions previously associated with CCD), epilepsy (a brain-based control condition), and colorectal cancer (a non-brain control condition). Models were adjusted for age, sex, deprivation, region, scan modality, and pre-scan healthcare.

Results

Of 395,273 people with brain imaging and no history of neurological disease, 145,978 (37%) had ≥1 phenotype. The aHR of any stroke was: WMH 1.4 (95% CI: 1.3–1.4), lacunes 1.6 (1.5–1.6), cortical infarct 1.7 (1.6–1.8), and cerebral atrophy 1.1 (1.0–1.1). The aHR of any dementia was: WMH 1.3 (1.3–1.3), lacunes 1.0 (0.9–1.0), cortical infarct 1.1 (1.0–1.1), and cerebral atrophy 1.7 (1.7–1.7). The aHR of Parkinson’s disease was: WMH 1.1 (1.0–1.2), lacunes 1.1 (0.9–1.2), cortical infarct 0.7 (0.6–0.9), and cerebral atrophy 1.4 (1.3–1.5). For epilepsy and colorectal cancer, the confidence intervals of the aHRs for the CCD phenotypes overlapped the null.
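As a reading aid, the reported estimates can be tabulated and checked against the null value of 1. The sketch below is illustrative only and is not part of the study's analysis: the names `REPORTED`, `direction` and `summarise` are hypothetical, and the values are the rounded figures quoted in the Results, so an interval bound printed as 1.0 is treated as touching the null even though the unrounded bound may lie on either side of it.

```python
# Reported aHRs as (estimate, lower 95% CI, upper 95% CI), copied from the
# Results text. Values are rounded to one decimal place in the source.
REPORTED = {
    "stroke": {
        "WMH": (1.4, 1.3, 1.4),
        "lacunes": (1.6, 1.5, 1.6),
        "cortical infarct": (1.7, 1.6, 1.8),
        "cerebral atrophy": (1.1, 1.0, 1.1),
    },
    "dementia": {
        "WMH": (1.3, 1.3, 1.3),
        "lacunes": (1.0, 0.9, 1.0),
        "cortical infarct": (1.1, 1.0, 1.1),
        "cerebral atrophy": (1.7, 1.7, 1.7),
    },
    "Parkinson's disease": {
        "WMH": (1.1, 1.0, 1.2),
        "lacunes": (1.1, 0.9, 1.2),
        "cortical infarct": (0.7, 0.6, 0.9),
        "cerebral atrophy": (1.4, 1.3, 1.5),
    },
}


def direction(lower, upper, null=1.0):
    """Classify an interval relative to the null hazard ratio (hypothetical helper)."""
    if lower > null:
        return "higher"
    if upper < null:
        return "lower"
    # Rounded bounds equal to the null land here (assumption: treat as touching).
    return "includes or touches null"


def summarise(reported):
    """Map each outcome and phenotype to the direction of its reported interval."""
    return {
        outcome: {name: direction(lo, hi) for name, (_, lo, hi) in phenotypes.items()}
        for outcome, phenotypes in reported.items()
    }


summary = summarise(REPORTED)

# Share of the cohort with at least one phenotype, from the reported counts.
share_with_phenotype = 145_978 / 395_273

if __name__ == "__main__":
    for outcome, phenotypes in summary.items():
        for name, label in phenotypes.items():
            print(f"{outcome:20s} {name:18s} {label}")
    print(f"Share with >=1 phenotype: {share_with_phenotype:.1%}")
```

Read this way, the rounded intervals place cortical infarcts above the null for stroke and below it for Parkinson’s disease, while several intervals (for example, lacunes and dementia) touch 1.0 at one decimal place and cannot be classified from the abstract alone.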

Conclusions and Relevance

NLP identified CCD and atrophy phenotypes from routine clinical image reports, and these had important associations with future stroke, dementia and Parkinson’s disease. Prevention of neurological disease in people with CCD should be a priority for healthcare providers and policymakers.

Key Points

Question

Are measures of covert cerebrovascular disease (CCD) associated with the risk of subsequent disease (stroke, dementia, Parkinson’s disease, epilepsy, and colorectal cancer)?

Findings

This study used a validated NLP algorithm to identify CCD (white matter hypoattenuation or hyperintensities, lacunes, and cortical infarcts) and cerebral atrophy from magnetic resonance imaging (MRI) and computed tomography (CT) reports generated during routine healthcare for more than 395,000 people in Scotland. In adjusted models, we found a higher risk of dementia (particularly Alzheimer’s disease) in people with atrophy, and a higher risk of stroke in people with cortical infarcts. Associations with an age-associated control outcome (colorectal cancer) were neutral, supporting a causal relationship. The findings also highlight differential associations: cerebral atrophy was most strongly associated with dementia, and cortical infarcts with stroke.

Meaning

CCD or atrophy on brain imaging reports in routine clinical practice is associated with a higher risk of stroke or dementia. Evidence is needed to support treatment strategies to reduce this risk. NLP can identify these important, otherwise uncoded, disease phenotypes, allowing research at scale into imaging-based biomarkers of dementia and stroke.
