Performance assessment of large language models in cancer staging: Comparative analysis of Mistral models
Abstract
Cancer staging plays a critical role in treatment planning and prognosis but is often embedded in unstructured clinical narratives. Large language models (LLMs) have emerged as a promising approach for automating the extraction and structuring of staging data, yet their performance in real-world oncology settings has not been systematically evaluated. Herein, we analysed 1000 oncological summaries from patients treated for breast cancer between 2019 and 2020 at the François Baclesse Comprehensive Cancer Centre, France. Five Mistral AI LLMs (Small, Medium, Large, Magistral and Mistral:latest) were evaluated for their ability to derive the cancer stage and identify staging elements. Larger models outperformed their smaller counterparts in staging accuracy and reproducibility (kappa > 0.95 for Mistral Large and Medium). Mistral Large achieved the highest accuracy in deriving the cancer stage (93.0%), surpassing the original clinical documentation in several cases. The LLMs consistently derived the cancer stage more accurately when reasoning through the tumour size, nodal status and metastatic components than when asked for the stage directly. The top-performing models had a test–retest reliability exceeding 97%, whereas smaller models and locally deployed versions lacked sufficient robustness, particularly in handling unit conversions and complex staging rules. A structured, stepwise use of LLMs that emulates clinician reasoning offers a more efficient, transparent and reproducible approach to cancer staging, and the study findings support LLM integration into digital oncology workflows.
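To make the stepwise approach concrete, the sketch below derives a stage group from separately extracted T, N and M categories instead of asking a model for the stage directly. The mapping is a simplified, illustrative subset of the AJCC anatomic stage groups for breast cancer, not the full staging table used in the study, and the function name is a hypothetical example rather than the authors' code.

```python
def derive_stage(t: str, n: str, m: str) -> str:
    """Derive an anatomic stage group from T, N, M categories.

    Illustrative only: covers a simplified subset of AJCC anatomic
    stage groups for breast cancer; real staging rules are richer
    (e.g. N1mi, prognostic stage groups, biomarkers).
    """
    if m == "M1":
        return "IV"  # any distant metastasis is stage IV
    if n == "N3":
        return "IIIC"
    if t == "T4":
        return "IIIB"
    if n == "N2" or (t == "T3" and n == "N1"):
        return "IIIA"
    if (t == "T2" and n == "N1") or (t == "T3" and n == "N0"):
        return "IIB"
    if (t == "T2" and n == "N0") or (t in ("T0", "T1") and n == "N1"):
        return "IIA"
    if t == "T1" and n == "N0":
        return "IA"
    return "unresolved"  # combination outside this simplified subset
```

Decomposing the task this way mirrors the clinician's reasoning the abstract describes: the model only has to extract three well-defined elements from the narrative, and the deterministic mapping handles the staging logic, which also makes errors easier to audit.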