Medical pre-training and fine-tuning improve large-language-model prediction of rheumatoid-arthritis disease activity
Abstract
Large language models (LLMs) already excel at extracting clinical facts from electronic health records and drafting differential diagnoses. In rheumatoid arthritis (RA), however, where the accuracy of disease-activity prediction directly guides the adjustment of treatment intensity, the performance of LLMs as clinical decision support, and how much medical domain pre-training and fine-tuning can improve it, has not been fully tested. We therefore trained privacy-preserving, on-premise Llama-2 models with medical domain pre-training (Meditron) and QLoRA fine-tuning, and compared their two-year predictions of RA disease activity and disability with logistic regression, random forest and XGBoost. The refined LLMs surpassed the conventional models on most Disease Activity Score (DAS)-based outcomes, matched them on remission tasks, retained reliable calibration, and offered small but consistent net-benefit advantages where high disease activity was rare, while avoiding the clinical harm observed for the tabular methods in high-disability prediction. These results show that endpoint-specific, locally deployable LLMs can complement or replace established tabular models in RA management without sacrificing data privacy.
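The net-benefit comparison referred to above comes from decision-curve analysis, where a model's benefit at a chosen risk threshold is weighed against the harm of false positives. As a minimal illustration (not the authors' evaluation code; data and names here are hypothetical), the standard net-benefit formula can be sketched as:

```python
def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit at a given risk threshold.

    NB = TP/n - FP/n * (pt / (1 - pt)), where pt is the threshold
    probability at which a clinician would act on the prediction.
    """
    n = len(y_true)
    # Count true and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * (threshold / (1 - threshold))


def treat_all_benefit(y_true, threshold):
    """Reference 'treat-all' strategy: flag every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * (threshold / (1 - threshold))
```

A model adds clinical value at a threshold only if its net benefit exceeds both the treat-all and treat-none (zero) reference lines; "clinical harm" corresponds to a net benefit below those references, as observed for the tabular methods in high-disability prediction.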