An explainable language model predicts survival from medical reports in oncology
Abstract
Prognosis estimation is key to personalizing oncology care, yet current models rely on limited and often incomplete clinical and biological data. We designed a solution applicable to any cancer type and stage, based on narrative electronic medical reports, i.e. the basic working material of oncologists. We used 2.3M medical documents (corresponding to 36,123 patients with a known date of death) to train, validate and test three different approaches. The best survival prediction performance was obtained by taking the medical history into account through sequential reports. This model (K-memBERT-T2) reached a Pearson correlation of 0.655 on the test cohort and 0.621 on a large external cohort of 143k documents (17,633 additional patients) (p-values < 10⁻⁵), and a concordance index of 0.766 when 7,082 alive (censored) patients were added to this external test cohort. The 3-month binary survival predictions achieved an AUC of 0.852 on the test cohort and 0.875 on the external dataset. The model's predictions correlated with survival duration better than performance status (PS) did, independently of whether PS was mentioned in the texts. We present a non-invasive and interpretable method that paves the way for easy implementation in French-speaking centers.
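For readers less familiar with the three metrics reported above, the following is a minimal sketch of how a Pearson correlation, a concordance index (Harrell's C, which handles censored patients) and a binary AUC can be computed. It is not the authors' evaluation code: the function names, the use of the negated predicted duration as a risk score, and the five-patient toy data are all assumptions made purely for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def concordance_index(times, predicted, observed):
    """Harrell's C-index.

    times     -- follow-up time per patient (death or censoring)
    predicted -- predicted survival duration (longer = better prognosis)
    observed  -- True if death was observed, False if censored
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # make i the patient with the shorter follow-up
        # A pair is comparable only if the shorter time is an observed death.
        if times[i] == times[j] or not observed[i]:
            continue
        comparable += 1
        if predicted[i] < predicted[j]:
            concordant += 1.0
        elif predicted[i] == predicted[j]:
            concordant += 0.5
    return concordant / comparable


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort (months); values are invented for illustration.
times = [2, 5, 9, 14, 20]
observed = [True, True, False, True, False]
predicted = [4, 3, 12, 10, 8]

# Pearson correlation on deceased patients only (true duration is known).
dead = [k for k, o in enumerate(observed) if o]
r = pearson([times[k] for k in dead], [predicted[k] for k in dead])

# C-index on the full cohort, censored patients included.
c_index = concordance_index(times, predicted, observed)  # 0.75

# 3-month binary task: label 1 if the patient died within 3 months.
labels = [1 if (o and t <= 3) else 0 for t, o in zip(times, observed)]
risk = [-p for p in predicted]  # shorter predicted survival = higher risk
auc_3m = auc(labels, risk)  # 0.75
```

The sketch makes the role of censoring visible: alive patients cannot enter a Pearson correlation because their true survival duration is unknown, but they still contribute to the concordance index whenever they are paired with a patient whose death was observed earlier.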