Prompt-Orchestrated Large Language Models for Clinical Information Extraction
Abstract
Multidisciplinary Tumor Boards (MTBs) are a key decision step within complex oncology disease pathways, where many clinical specialties contribute to defining the most effective personalized treatment for each patient. Unfortunately, MTB reports are usually rich but highly unstructured clinical narratives that can hardly be used as structured data for retrospective analyses or to feed decision-support tools. In this study, we propose a prompt-orchestrated information extraction framework that leverages locally deployed Large Language Models (LLMs) to digitize real-world Italian gynecologic oncology MTB reports. Clinical documents were first segmented into semantically coherent macroareas (including anamnesis, surgery, imaging, therapy, and MTB decisions), and each segment was processed with domain-specific prompts following three tailored strategies: one-shot prompting with post-processing, one-shot prompting with semantic mapping, and chained prompting for multi-step decision extraction. Extraction outputs were evaluated under a large-scale LLM-as-a-judge paradigm, in which three high-capacity LLMs independently compared model predictions with expert-annotated gold standards and correctness was determined by majority voting. Across more than fifty clinical variables, Gemma 3 (12B) and GPT-OSS (20B) achieved the highest extraction accuracy, with the best results in the anamnesis macroarea (0.953 and 0.956). Judge-level analyses demonstrated strong agreement across evaluators, confirming the robustness of the evaluation pipeline. This work illustrates the feasibility of prompt-engineered LLMs for structured reconstruction of MTB reports and identifies the domains where targeted improvements (such as enhanced mapping ontologies, refined prompting, or future instruction tuning) may be required.
The extracted information forms a structured foundation for clinical digital tools and may support the development of future AI-driven decision support systems in oncology.
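The majority-voting step of the LLM-as-a-judge evaluation described in the abstract can be illustrated with a minimal sketch. It assumes each judge returns a boolean verdict (prediction matches the gold standard or not) and that accuracy is the fraction of majority-approved extractions per macroarea. The function names, the record layout, and the sample verdicts are hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_vote(verdicts):
    """Return True when more than half of the judges mark the prediction correct."""
    return sum(verdicts) > len(verdicts) / 2


def accuracy_by_macroarea(records):
    """Aggregate judge verdicts into one accuracy value per macroarea.

    records: iterable of (macroarea, [verdict_judge_1, verdict_judge_2, ...]).
    """
    correct = Counter()
    total = Counter()
    for macroarea, verdicts in records:
        total[macroarea] += 1
        if majority_vote(verdicts):
            correct[macroarea] += 1
    return {area: correct[area] / total[area] for area in total}


# Made-up verdicts from three hypothetical judges, for illustration only.
records = [
    ("anamnesis", [True, True, False]),   # 2 of 3 judges agree: counted as correct
    ("anamnesis", [True, True, True]),
    ("therapy", [False, False, True]),    # 1 of 3 judges agree: counted as incorrect
    ("therapy", [True, True, True]),
]
scores = accuracy_by_macroarea(records)
```

With an odd number of judges, as in the three-judge setup described above, a strict majority always exists, so no tie-breaking rule is needed.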