Patient2Sentence: Semantic Compression of Clinical Trial Eligibility Using Large Language Models
Abstract
Clinical decision-making generates vast unstructured data that remain underexploited for trial recruitment. We present Patient2Sentence (P2S), a framework that transforms electronic health records into language-based representations to enable automated eligibility screening for oncology trials. Using synthetic patient records derived from three completed breast cancer studies (KATHERINE, MONARCH, and OLYMPIA), we created 25 virtual patients per trial and compared eligibility classification between full records and their condensed “patient sentences.” P2S achieved a mean concordance of 93.2% (95% CI 89.8–96.6%; Cohen’s κ = 0.91) between sentence-level and full-record decisions while reducing token usage by ~67%. This compression preserved semantic fidelity and reduced computational cost approximately threefold. By encoding heterogeneous clinical data into compact natural-language form, P2S provides a reproducible and efficient approach to patient-trial matching, with potential applications across diverse clinical decision-support systems.
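The agreement and compression figures quoted above can be illustrated with a minimal sketch. It shows how raw concordance, Cohen's κ, and token reduction are conventionally computed from two sets of eligibility decisions. The function names, labels, and token counts below are hypothetical and are not taken from the P2S implementation or its data; the study's exact evaluation procedure may differ.

```python
from collections import Counter


def concordance(full_decisions, sentence_decisions):
    """Fraction of patients for whom both inputs yield the same decision."""
    if len(full_decisions) != len(sentence_decisions) or not full_decisions:
        raise ValueError("decision lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(full_decisions, sentence_decisions))
    return matches / len(full_decisions)


def cohen_kappa(full_decisions, sentence_decisions):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(full_decisions)
    p_observed = concordance(full_decisions, sentence_decisions)
    counts_full = Counter(full_decisions)
    counts_sent = Counter(sentence_decisions)
    labels = set(counts_full) | set(counts_sent)
    # Chance agreement from the marginal label frequencies of each rater.
    p_expected = sum(counts_full[l] * counts_sent[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def token_reduction(full_tokens, sentence_tokens):
    """Return (fractional reduction, compression factor) for token counts."""
    if full_tokens <= 0 or sentence_tokens <= 0:
        raise ValueError("token counts must be positive")
    return 1 - sentence_tokens / full_tokens, full_tokens / sentence_tokens


if __name__ == "__main__":
    # Hypothetical decisions for ten patients (illustrative only).
    full = ["eligible"] * 5 + ["ineligible"] * 5
    sentence = ["eligible"] * 4 + ["ineligible"] * 6  # one decision flipped
    print("concordance:", concordance(full, sentence))
    print("kappa:", round(cohen_kappa(full, sentence), 3))
    # Hypothetical token counts: a 3000-token record, a 1000-token sentence.
    reduction, factor = token_reduction(3000, 1000)
    print("reduction:", round(reduction, 3), "factor:", factor)
```

With these illustrative token counts, a reduction of about two thirds corresponds to a threefold compression factor, which is the relationship between the ~67% and "approximately threefold" figures in the abstract.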