A Proof-of-Concept Large Language Model Application to Support Clinical Trial Screening in Surgical Oncology

Abstract

Introduction

Clinical trials advance the forefront of medical knowledge and rely on consistent patient accrual for success. However, patient screening for clinical trials is resource intensive. There is a need to increase the scalability of trial recruitment while maintaining or improving upon the sensitivity of the current process. We hypothesized that a state-of-the-art large language model (LLM), prompt engineering, and publicly available clinical trial data could be used to predict patient eligibility for trials from clinic notes. Here, we present pilot data demonstrating the accuracy of this tool in a cohort of patients being evaluated for pancreatic cancer treatment.

Methods

Patients who were screened for clinical trials at a single institution were studied. An LLM application was developed using LangChain and the GPT-4o model to assist in clinical trial screening. Deidentified patient data from clinical notes and trial eligibility criteria from ClinicalTrials.gov were used as inputs. For each patient, the model determined inclusion or exclusion with respect to selected eligibility criteria as well as nine clinical trials. Model responses were graded programmatically against a human rater standard. Time elapsed and cost for running each analysis were recorded.
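The programmatic grading step can be illustrated with a minimal Python sketch. It assumes each model determination and each human rater label is a boolean (eligible or not) and that the two lists are aligned item by item; the function name `grade` and the input layout are hypothetical and are not taken from the study's code.

```python
from typing import Dict, List


def grade(model: List[bool], rater: List[bool]) -> Dict[str, float]:
    """Compare model determinations with the human rater standard.

    Hypothetical helper: True means "eligible" (or "criterion met").
    """
    if len(model) != len(rater):
        raise ValueError("model and rater lists must have the same length")

    pairs = list(zip(model, rater))
    tp = sum(1 for m, r in pairs if m and r)          # both say eligible
    tn = sum(1 for m, r in pairs if not m and not r)  # both say ineligible
    fp = sum(1 for m, r in pairs if m and not r)      # model over-includes
    fn = sum(1 for m, r in pairs if not m and r)      # model misses a match

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "agreement": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

Calling `grade` once per trial, with one entry per patient, would give per-trial sensitivity and specificity; calling it over all criterion-level determinations would give overall agreement.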

Results

Of the 24 patients in the test set, 19 were eligible for at least one trial. There were 43 eligible patient-trial matches in the data set. Our model correctly predicted 39 out of 43 (90.7%) of these matches. There were 105 individual eligibility criteria evaluated per patient for a total of 2,520 binary criteria. GPT-4o agreed with the raters for 2,438 out of 2,520 (96.7%) binary eligibility criteria. Sensitivity for overall trial eligibility ranged from 87.5% to 100% for 8 out of 9 trials. Specificity ranged from 73.3% to 100% over all nine trials. The median cost for screening a patient was 0.67 USD (0.63-0.74). Median time elapsed was 137.66 seconds (130.04-146.04). Median total token usage across three assistants was 112,266.5 tokens (102,982.0-122,174.2).
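The headline percentages can be re-derived from the reported counts. The short Python check below uses only the numbers stated in this section and assumes the match-level denominator is the 43 eligible patient-trial matches; the helper name `percent` is illustrative.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# 24 patients, each evaluated against 105 binary eligibility criteria.
total_criteria = 24 * 105

# Criterion-level agreement between GPT-4o and the human raters.
criteria_agreement = percent(2438, total_criteria)

# Match-level accuracy, assuming all 43 eligible matches form the denominator.
match_accuracy = percent(39, 43)

print(total_criteria, criteria_agreement, match_accuracy)
```

Under that assumption, the counts reproduce the reported 2,520 criteria, 96.7% criterion-level agreement, and 90.7% of eligible matches correctly predicted.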

Conclusion

Overall, this model showed high sensitivity and specificity in using minimally processed free-text clinical notes to screen patients for appropriate clinical trials, at a fraction of the time and cost of existing screening mechanisms. Results showed promise in a small cohort, and future studies are needed to assess the model's accuracy with a larger sample of patients and trials. This study is an early test of emerging large language model technology against the historically unruly terrain of the electronic medical record, and it suggests that the imperfection of free-text clinical notes only slightly hinders the performance of a general-use model compared with previous performance on preprocessed data. These findings indicate that using this tool directly on clinical notes could complement human screening efforts and improve patient accrual at low cost in time and money.
