An explainable language model predicts survival from medical reports in oncology

Clément Piat
Quentin Blampey
Alexandre Joutard
Mohamed Aymen Qabel
Théo Di Piazza
Ugo Benassayag
Raphael Vienne
Raphael Reme
Daphné Morel
Maxime Choffe
Eric Deutsch
Jean-Yves Blay
Loic Verlingue


Abstract

Prognosis estimation is key to personalizing oncology care, yet current models rely on limited and often incomplete clinical and biological data. We designed an approach applicable to any cancer type and stage, based on narrative electronic medical reports, the basic working material of oncologists. We used 2.3M medical documents (corresponding to 36,123 patients with a known date of death) to train, validate and test three different approaches. The best survival prediction performance was obtained by the model that takes the medical history into account through sequential reports. This model (K-memBERT-T2) reached a Pearson correlation of 0.655 on the test cohort and 0.621 on a large external cohort of 143k documents (17,633 additional patients) (p-values < 10^-5), and a concordance index of 0.766 when 7,082 alive, censored patients were added to this external test cohort. The 3-month binary survival predictions achieved an AUC of 0.852 on the test cohort and 0.875 on the external dataset. The model's predictions tracked survival duration more closely than performance status (PS) did, regardless of whether PS was mentioned in the texts. We present a non-invasive and interpretable method that paves the way for easy implementation in French-speaking centers.
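For readers less familiar with the three metrics quoted above, the following is a minimal sketch of how a Pearson correlation, a concordance index with censoring, and a 3-month binary AUC can be computed. It is not the authors' evaluation code: the function names, the six-patient toy data, and the choice to skip tied survival times are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def concordance_index(time, predicted, event):
    """Harrell's C-index; `predicted` is a predicted survival time."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue  # simplification: ignore tied observed times
        first, second = (i, j) if time[i] < time[j] else (j, i)
        if not event[first]:
            continue  # shorter follow-up is censored: true order unknown
        comparable += 1
        if predicted[first] < predicted[second]:
            concordant += 1.0
        elif predicted[first] == predicted[second]:
            concordant += 0.5
    return concordant / comparable


def auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort (months); not data from the study.
observed_months = [2, 5, 9, 14, 20, 30]
predicted_months = [3, 4, 12, 10, 25, 28]
died = [True, True, True, False, True, False]  # False = alive, censored

# Pearson correlation is computed on deceased patients only,
# since their true survival duration is known.
dead = [k for k, d in enumerate(died) if d]
r = pearson([observed_months[k] for k in dead],
            [predicted_months[k] for k in dead])

# The C-index can also use censored patients.
c = concordance_index(observed_months, predicted_months, died)

# 3-month binary task: did the patient die within 3 months?
# No toy patient is censored before 3 months, so every label is defined.
died_within_3m = [d and t <= 3 for t, d in zip(observed_months, died)]
risk = [-p for p in predicted_months]  # shorter predicted survival = higher risk
a = auc(died_within_3m, risk)

print(f"Pearson r = {r:.3f}, C-index = {c:.3f}, 3-month AUC = {a:.3f}")
```

The sketch mirrors the distinction drawn in the abstract: correlation needs a known date of death, whereas the concordance index only needs pairs whose ordering is certain, which is why censored patients can be added for that metric.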

Version published to 10.21203/rs.3.rs-7121466/v1 on Research Square
Aug 27, 2025

