A large dataset of brain imaging linked to health systems data: the curation and access to a whole system national cohort from NHS Scotland


Abstract

We present the design and implementation of a data curation framework to generate a large-scale clinical brain imaging dataset suitable for artificial intelligence (AI)-enabled image analysis, accessible through the Brain Health Data (BHD) initiative. The raw data accessible through the BHD include approximately 417K magnetic resonance imaging (MRI) and 846K computerized tomography (CT) head scans, linked electronic health records (EHRs), and associated free-text imaging reports from clinical practice in Scotland between 2010 and 2018, totalling more than 185 TB of brain imaging and associated data.

We describe the work of curating the dataset and the strengths of the BHD, including clinical relevance thanks to its unprecedented scale, population-wide representativeness of a national healthcare system that is free at the point of service, long-term follow-up to neurodegenerative disease, and real-world variability. We discuss challenges and lessons learnt in developing the framework to curate the data initially available, including the time needed to obtain the relevant permissions; the need for easily accessible, secure, responsive and affordable computational environments; the variability and inconsistencies of clinical data and records; and the challenge of extracting linked clinical data and images at scale. This resource will be crucial for clinical research, fostering the development of personalized medicine approaches and fast-tracking the implementation of AI models in clinical workflows. We encourage the use of the BHD data through a streamlined application to the electronic Data Research and Innovation Service (eDRIS) of Public Health Scotland (PHS).
