Scoring Physician Risk Communication in Prostate Cancer Using Large Language Models

Abstract

Effective risk communication is essential to shared decision-making in prostate cancer care. However, the quality with which physicians communicate key trade-offs varies widely in real-world consultations, and manual evaluation of communication is labor-intensive and not scalable. We present a structured, rubric-based framework that uses large language models (LLMs) to automatically score the quality of risk communication in prostate cancer consultations. Using transcripts from 20 clinical visits, we curated and annotated 487 physician-spoken sentences referencing five decision-making domains: cancer prognosis, life expectancy, and three treatment side effects (erectile dysfunction, incontinence, and irritative urinary symptoms). Each sentence was assigned a score from 0 to 5 based on the precision and patient-specificity of the communicated risk, using a validated scoring rubric. We modeled this task as five multiclass classification problems and evaluated both fine-tuned transformer baselines and GPT-4o with rubric-based and chain-of-thought (CoT) prompting. Our best-performing approach, which combined rubric-based CoT prompting with few-shot learning, achieved micro-averaged F1 scores between 85.0 and 92.0 across domains, outperforming supervised baselines and matching inter-annotator agreement. These findings establish a scalable foundation for AI-driven evaluation of physician–patient communication in oncology and beyond.
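To make the prompting approach concrete, the sketch below shows one plausible way to assemble a rubric-based CoT prompt with few-shot examples for a single domain and to parse the resulting 0–5 score. The rubric wording, score anchors, and example sentences are illustrative placeholders, not the authors' validated rubric or annotated data.

```python
# Hypothetical sketch of rubric-based CoT prompting for one domain
# (erectile dysfunction). Rubric text and few-shot examples are invented
# stand-ins for the study's validated rubric and annotated sentences.
import re

RUBRIC = (
    "Score the physician's sentence from 0 to 5 for risk-communication quality:\n"
    "0 = no risk information\n"
    "1 = vague mention of the side effect, no precision\n"
    "3 = qualitative likelihood language (e.g., 'most men recover')\n"
    "5 = precise, quantitative, patient-specific risk estimate"
)

FEW_SHOT = [
    ("There is some chance of erectile dysfunction afterward.", 1,
     "Mentions the side effect but gives no precision or patient specificity."),
    ("Given your age and a nerve-sparing approach, your risk of lasting ED "
     "is roughly 30 percent.", 5,
     "Quantitative estimate tailored to this patient's situation."),
]

def build_prompt(sentence: str) -> str:
    """Assemble rubric + worked CoT examples + the target sentence."""
    parts = [RUBRIC, ""]
    for text, score, rationale in FEW_SHOT:
        parts.append(f'Sentence: "{text}"\nReasoning: {rationale}\nScore: {score}\n')
    parts.append(f'Sentence: "{sentence}"\nReasoning:')
    return "\n".join(parts)

def parse_score(response: str) -> int:
    """Extract the final 'Score: N' emitted after the model's reasoning."""
    matches = re.findall(r"Score:\s*([0-5])", response)
    if not matches:
        raise ValueError("no score found in model output")
    return int(matches[-1])
```

The string returned by `build_prompt` would be sent to GPT-4o (e.g., via a chat-completions call), and `parse_score` reads the last score in the reply so that intermediate reasoning mentioning rubric levels does not confuse extraction.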
