The ethics of simplification: Balancing patient autonomy, comprehension, and accuracy in AI-generated radiology reports

Hong-Seon Lee
Seung-Hyun Song
Chaeri Park
Jeongrok Seo
Won Hwa Kim
Jaeil Kim
Sungjun Kim
Kyunghwa Han
Young Han Lee


Abstract

Background: Large language models (LLMs) such as GPT-4 are increasingly used to simplify radiology reports and improve patient comprehension. However, excessive simplification may undermine informed consent and autonomy by compromising clinical accuracy. This study investigates the ethical implications of readability thresholds in AI-generated radiology reports, identifying the minimum reading level at which clinical accuracy is preserved.

Methods: We retrospectively analyzed 500 computed tomography and magnetic resonance imaging reports from a tertiary hospital. Each report was transformed into 17 versions (reading grade levels 1–17) using GPT-4 Turbo. Readability metrics and word counts were calculated for each version. Clinical accuracy was evaluated using radiologist assessments and PubMed-BERTScore. We identified the first grade level at which a statistically significant decline in accuracy occurred, determining the lowest level that preserved both accuracy and readability. We further assessed potential clinical consequences in reports simplified to the 7th-grade level.

Results: Readability scores showed strong correlation with prompted reading levels (r = 0.80–0.84). Accuracy remained stable across grades 13–11 but declined significantly below grade 11. At the 7th-grade level, 20% of reports contained inaccuracies with potential to alter patient management, primarily due to omission, incorrect conversion, or inappropriate generalization. The 11th-grade level emerged as the current lower bound for preserving accuracy in LLM-generated radiology reports.

Conclusions: Our findings highlight an ethical tension between improving readability and maintaining clinical accuracy. While 7th-grade readability remains an ethical ideal, current AI tools cannot reliably produce accurate reports below the 11th-grade level. Ethical implementation of AI-generated reporting should include layered communication strategies and model transparency to safeguard patient autonomy and comprehension.
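
As an illustration of what "readability metrics" can mean in practice, the sketch below estimates a reading grade level with the Flesch-Kincaid grade formula. The abstract does not name the metrics the authors used, so the choice of Flesch-Kincaid is an assumption, and the function names and the vowel-run syllable counter are hypothetical simplifications rather than the study's method.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the U.S. school grade level of `text` (Flesch-Kincaid formula)."""
    # Split on sentence-ending punctuation and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of ASCII letters as words.
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    plain = "The scan shows a small lump in the liver."
    technical = "Hepatic parenchymal heterogeneity suggests infiltrative malignancy."
    print(round(flesch_kincaid_grade(plain), 2))
    print(round(flesch_kincaid_grade(technical), 2))
```

Scores from a heuristic syllable counter like this one are approximate; a production analysis would more plausibly rely on an established readability library and several metrics, with the grade estimates compared against the prompted reading level.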

Version published to 10.21203/rs.3.rs-6365487/v1 on Research Square
May 7, 2025

