Medical pre-training and fine-tuning improve large-language-model prediction of rheumatoid-arthritis disease activity
Abstract
Large language models (LLMs) already excel at extracting clinical facts from electronic health records and drafting differential diagnoses. In rheumatoid arthritis (RA), however, where the accuracy of disease-activity prediction directly guides the adjustment of treatment intensity, the performance of LLMs as clinical decision support, and how much medical domain pre-training and fine-tuning can improve it, has not been fully tested. We therefore trained privacy-preserving, on-premise Llama-2 models with medical domain pre-training (Meditron) and QLoRA fine-tuning, and compared their two-year predictions of RA disease activity and disability with logistic regression, random forest and XGBoost. The refined LLMs surpassed the conventional models on most Disease Activity Score (DAS)-based outcomes, matched them on remission tasks, retained reliable calibration, and offered small but consistent net-benefit advantages where high disease activity was rare, while avoiding the clinical harm observed for the tabular methods in high-disability prediction. These results show that endpoint-specific, locally deployable LLMs can complement or replace established tabular models in RA management without sacrificing data privacy.
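The net-benefit comparison referred to above comes from decision-curve analysis, where a model's benefit at a chosen risk threshold is weighed against the harm of false positives. As a minimal illustration (not the authors' evaluation code; data and names here are hypothetical), the standard net-benefit formula can be sketched as:

```python
def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit at a given risk threshold.

    NB = TP/n - FP/n * (pt / (1 - pt)), where pt is the threshold
    probability at which a clinician would act on the prediction.
    """
    n = len(y_true)
    # Count true and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * (threshold / (1 - threshold))


def treat_all_benefit(y_true, threshold):
    """Reference 'treat-all' strategy: flag every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * (threshold / (1 - threshold))
```

A model adds clinical value at a threshold only if its net benefit exceeds both the treat-all and treat-none (zero) reference lines; "clinical harm" corresponds to a net benefit below those references, as observed for the tabular methods in high-disability prediction.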