OncoCITE: Multimodal Multi-Agent Reconstruction of Clinical Oncology Knowledge Bases from Scientific Literature
Abstract
Precision oncology depends on curated variant knowledge bases, yet manual curation introduces latency, coverage gaps, errors, and inconsistent or incomplete annotation. Through systematic analysis of the 11,312 items in the CIViC database, we identified structural bottlenecks, including long-tail literature distribution, resistance underrepresentation, and prolonged update delays. We developed OncoCITE, a multi-agent AI system for source-grounded extraction and harmonization of clinical genomic evidence from full-text publications. At database scale, we enriched all CIViC evidence items with ontology-standardized identifiers, achieving 83.12% item-level resolution. End-to-end extraction was validated in a disease-specific corpus using a three-way framework in which neither human curation nor AI output was treated as ground truth: the system recovered 84% of valid curated evidence, identified additional high-precision findings, and detected discrepancies in 24.2% of ground-truth items. Prospective application to emerging immunotherapy literature demonstrates real-time evidence synthesis. OncoCITE provides an auditable and open-source framework for scalable precision oncology knowledge curation.