Prompt Architecture as a High-Impact Design Factor in Expert-Rated Clinical Documentation Quality: A Controlled Comparative Study in Inpatient Rehabilitation

Idoia Eceizabarrena-Matxinandiarena
Emilio-Javier Frutos-Reoyo
José Ignacio Guerrero-Rojas
Clara Vidal-Millet
Pedro Ignacio Tejada Ezquerro
Elena Roldan-Arcelus
Irene De-Torres
Judith Sanchez-Raya
Lourdes Gil-Fraguas
María Hernandez-Manada
Carolina de Miguel-Benadiba
Josep Maria Monguet-Fierro
Alejandro Trejo-Omeñaca
Michelle Catta-Preta
Astrid Teixeira-Taborda
Natalia Álvarez-Bandrés
Raquel Cutillas-Ruiz
Helena Bascuñana-Ambrós

Abstract

Large language models (LLMs) are increasingly explored for clinical documentation support, yet the influence of prompting architecture on documentation quality in complex longitudinal contexts remains poorly characterized. This controlled retrospective methodological study evaluated three prompting strategies for generating inpatient rehabilitation discharge reports with an OpenAI large language model (GPT-5.2): Single Prompt (SP), Section-Based Prompt (SBP), and Section-Based Prompt with Writing Refinement (SBP+W). Twenty anonymized rehabilitation cases involving prolonged hospital stays and multidimensional functional documentation were processed under standardized model conditions. AI-generated reports were compared with human-authored summaries. Two blinded board-certified rehabilitation physicians independently evaluated outputs using a structured 4-point ordinal scale assessing structural integrity, clinical coherence, completeness, and readability. Inter-rater reliability was estimated with quadratic weighted Cohen’s kappa and bootstrap confidence intervals. Group differences were analyzed using non-parametric testing and exploratory multivariable modeling. All LLM prompting strategies achieved significantly higher expert-rated quality scores than human-authored reports (p < 0.01). SBP demonstrated the highest median performance and strongest regression effect, although differences among LLM-based strategies were not statistically significant after correction. Prompting strategy explained more variability in expert ratings than case-level factors. Structured section-based prompting may represent a practical design lever for improving perceived quality in AI-assisted clinical documentation workflows.

Keywords: artificial intelligence; clinical documentation; discharge reports; large language models; medical writing; prompt architecture; prompt engineering; rehabilitation medicine.
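
The abstract names quadratic weighted Cohen's kappa with bootstrap confidence intervals as the inter-rater reliability measure. The following is a minimal sketch of how such an estimate could be computed for two raters on a 4-point ordinal scale. It is not the authors' analysis code: the function names, the percentile bootstrap, the number of resamples, the handling of the degenerate single-category case, and the example ratings are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], k: int = 4) -> float:
    """Quadratic weighted Cohen's kappa for two raters scoring on 1..k."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("ratings must be non-empty and of equal length")
    # Joint counts of (rater A score, rater B score) and the two marginals.
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - 1][y - 1] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected under independence
    for i in range(k):
        for j in range(k):
            w = ((i - j) ** 2) / ((k - 1) ** 2)  # quadratic disagreement weight
            num += w * observed[i][j] / n
            den += w * row[i] * col[j] / (n * n)
    if den == 0.0:
        # Both raters used one identical category; kappa is formally undefined.
        # Returning 1.0 here is an assumption made for this sketch.
        return 1.0
    return 1.0 - num / den


def bootstrap_ci(a: Sequence[int], b: Sequence[int], k: int = 4,
                 n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval, resampling rated items with replacement."""
    rng = random.Random(seed)
    n = len(a)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(quadratic_weighted_kappa([a[i] for i in idx],
                                              [b[i] for i in idx], k))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical ratings, not data from the study.
    rater_a = [4, 3, 4, 2, 3, 4, 1, 3, 4, 2]
    rater_b = [4, 3, 3, 2, 3, 4, 2, 3, 4, 1]
    print(quadratic_weighted_kappa(rater_a, rater_b))
    print(bootstrap_ci(rater_a, rater_b))
```

Quadratic weights penalize a two-point disagreement four times as heavily as a one-point disagreement, which suits an ordinal scale where near-misses matter less than large discrepancies. Resampling whole items keeps each pair of ratings together, so the interval reflects variability across rated reports.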

Version published to 10.20944/preprints202604.0054.v1
Apr 1, 2026

Related articles

Impact of Query Language on Large Language Model Performance in Dental Trauma Management: A Comparative Evaluation of ChatGPT, Gemini, and Claude

This article has 2 authors:
1. Hasan Öz
2. Mehmet Dundar
This article has no evaluations. Latest version: Feb 20, 2026

The Documentation Paradox: Quantifying Administrative Burden in Accredited Medical Laboratories—A Fifteen-Year Longitudinal Study from India

This article has 1 author:
1. Swapan Samanta
This article has no evaluations. Latest version: Mar 10, 2026

Diagnostic Performance and Cost-Efficiency of Large Language Models in Secondary Hypertension: A Blinded Comparative Study

This article has 4 authors:
1. Asena Gökçay Canpolat
2. Özge Baş Aksu
3. Rıfat Emral
4. Uğur Canpolat
This article has no evaluations. Latest version: Mar 18, 2026
