Clinical Evaluation of a PACS-Integrated Deep Learning Tool for Intracranial Hemorrhage Severity Assessment: Comparison with LLM-Based Report Interpretation

Abstract

Background: Radiology reports are the standard method for communicating imaging findings in intracranial hemorrhage (ICH), yet it remains unclear whether their narrative descriptions accurately reflect overall hemorrhage burden as captured by objective quantitative imaging metrics. We aimed to compare the clinical associations of an automated deep learning–based severity index (CEREBLEED) with those of severity assessments extracted from radiology reports by large language models (LLMs).

Methods: This prospective single-center study enrolled consecutive adult patients with ICH on non-contrast CT between June and December 2025. CEREBLEED, integrated into the institutional PACS, automatically segmented hemorrhages and computed a composite Severity Index incorporating hemorrhage volume, location, intraventricular extension, and midline shift. Three LLMs (Claude, GPT-4, Gemini) independently extracted severity categories (mild/moderate/severe) from the radiology reports using standardized prompts. Outcomes included Glasgow Coma Scale (GCS) score at admission, emergent surgical intervention, and Glasgow Outcome Scale–Extended (GOSE) score at discharge. Agreement was assessed with Cohen’s κ, correlation with Spearman’s ρ, discriminative performance with the area under the ROC curve (AUC) and reclassification indices (net reclassification improvement [NRI] and integrated discrimination improvement [IDI]), and prognostic value with ordinal logistic regression and likelihood ratio testing.

Results: Of 186 patients analyzed (mean age 70.5 years, 52.7% male), 56.5% required ICU admission, 17.2% underwent emergent surgery, and 40.3% had an unfavorable outcome (GOSE 1–4). Agreement between CEREBLEED and LLM-derived severity was moderate (κ=0.51–0.52), whereas inter-LLM agreement was substantial (κ=0.77–0.82), suggesting systematic differences in information content rather than extraction variability. CEREBLEED correlated more strongly with GOSE (ρ=−0.715) than LLM-derived severity did (ρ=−0.569 to −0.628).
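The agreement and correlation statistics named in the Methods can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the authors' analysis code: it computes unweighted Cohen's κ between two raters' categorical labels and Spearman's ρ using average ranks for ties. The abstract does not say whether κ was weighted, so the unweighted form is an assumption, and the labels in the usage lines are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return float("nan")  # both raters used one identical category: kappa is undefined
    return (observed - expected) / (1.0 - expected)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical severity labels from two readers for six reports.
llm_a = ["mild", "mild", "moderate", "severe", "severe", "moderate"]
llm_b = ["mild", "moderate", "moderate", "severe", "moderate", "moderate"]
print(round(cohens_kappa(llm_a, llm_b), 3))  # 0.5
```

A negative ρ, as reported for GOSE, simply means that higher severity scores tend to accompany lower (worse) outcome scores.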
For prediction of surgical intervention, CEREBLEED achieved superior discrimination (AUC 0.843 vs 0.733–0.754), with significant reclassification improvement over all LLMs (NRI 0.26–0.35, all p<0.01). In univariable ordinal regression, CEREBLEED showed the best model fit for GOSE prediction (pseudo-R²=0.194 vs 0.123–0.164 for the LLMs). In multivariable analysis adjusted for age and GCS, CEREBLEED remained independently prognostic (OR 0.30 per 1-SD increase, p<0.001). Likelihood ratio testing confirmed a significant incremental value of CEREBLEED beyond age, GCS, and hemorrhage volume (χ²=18.97, p<0.001).

Conclusions: Automated severity quantification with CEREBLEED showed stronger prognostic performance for clinical outcomes than severity estimates derived from LLM interpretation of radiology reports. This likely reflects the added value of objective, continuous imaging biomarkers over the information available in narrative reports. Quantitative imaging tools may therefore complement routine radiological assessment, supporting more consistent severity communication and clinical decision-making in ICH care.
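The discrimination, reclassification, and likelihood ratio statistics reported for CEREBLEED can likewise be sketched for a binary outcome such as emergent surgery, given each model's predicted probabilities. This is an illustrative sketch, not the study's code. The abstract does not state whether the NRI was categorical or category-free, so the category-free (continuous) NRI below is an assumption; the p-value helper likewise assumes the likelihood ratio test has one degree of freedom (one added predictor). All function names and probabilities are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence


def _split(y: Sequence[int], p: Sequence[float]) -> tuple[list[float], list[float]]:
    """Separate predicted probabilities into events (y == 1) and non-events (y == 0)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    return events, nonevents


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability that an event outranks a non-event (ties count 1/2)."""
    events, nonevents = _split(y, p)
    wins = sum(1.0 if e > n else 0.5 if e == n else 0.0 for e in events for n in nonevents)
    return wins / (len(events) * len(nonevents))


def continuous_nri(y: Sequence[int], p_old: Sequence[float], p_new: Sequence[float]) -> float:
    """Category-free NRI: net upward moves among events plus net downward moves among non-events."""
    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for yi, po, pn in zip(y, p_old, p_new):
        if yi == 1:
            n_e += 1
            up_e += int(pn > po)
            down_e += int(pn < po)
        else:
            n_n += 1
            up_n += int(pn > po)
            down_n += int(pn < po)
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


def idi(y: Sequence[int], p_old: Sequence[float], p_new: Sequence[float]) -> float:
    """IDI: gain in the mean-probability gap between events and non-events."""
    new_e, new_n = _split(y, p_new)
    old_e, old_n = _split(y, p_old)
    return (fmean(new_e) - fmean(new_n)) - (fmean(old_e) - fmean(old_n))


def lr_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood ratio statistic for nested models: 2 * (logL_full - logL_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail chi-square probability with 1 df: P(X > s) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


# Hypothetical outcomes and probabilities from a baseline and an extended model.
y = [1, 1, 0, 0]
p_old = [0.6, 0.4, 0.5, 0.3]
p_new = [0.8, 0.5, 0.4, 0.2]
print(auc(y, p_old), auc(y, p_new))         # 0.75 1.0
print(continuous_nri(y, p_old, p_new))      # 2.0
print(round(idi(y, p_old, p_new), 6))       # 0.25
print(chi2_sf_df1(18.97) < 0.001)           # True, under the 1-df assumption
```

The category-free NRI ranges from −2 to 2 and rewards any movement of predicted risk in the correct direction, whereas a categorical NRI only counts moves across predefined risk thresholds; the two can give quite different values on the same data.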
