Using Large Language Models to Assemble, Audit, and Prioritize the Therapeutic Landscape
Abstract
We present an AI-assisted pipeline for disease-specific drug landscape analysis. Given a disease name, the system assembles a comprehensive, evidence-based view of therapeutic assets by integrating structured sources (such as ClinicalTrials.gov and ChEMBL) with unstructured sources (such as publications, press releases, and patents). Large language models are used in a constrained, auditable mode to normalize drug aliases, resolve drug-target and mechanism-of-action annotations, and harmonize program status across records. The output is a disease-centric map that spans preclinical assets, not-yet-approved assets (both active and discontinued or shelved), and FDA-approved drugs that are candidates for repurposing. Assets are ranked with interpretable, evidence-based scoring heuristics that combine trial volume and clinical phase, endpoint outcomes, biomarker support, recency of activity, and regulatory designations; the scores also penalize safety signals and non-pharmaceutical interventions and apply proportional adjustments that distinguish operational from scientific discontinuations. Case studies in Alzheimer's disease, pancreatic cancer, and cystic fibrosis illustrate the framework's generality, its coverage, and its ability to discriminate among assets across mechanisms and development stages. The framework provides a transparent method for assembling and prioritizing the therapeutic landscape of any disease, unifying disparate data into a coherent, analyzable representation.
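To make the ranking step concrete, the following is a minimal Python sketch of one possible interpretable, additive scorer over the signals listed above. It is not the authors' formula, which the abstract does not specify. The `Asset` record, the `score` and `rank` helpers, every weight, the three-year recency half-life, and the discontinuation factors are hypothetical assumptions chosen only for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from datetime import date

# Illustrative weights only (assumed; not taken from the paper).
PHASE_WEIGHT = {"preclinical": 0.5, "phase1": 1.0, "phase2": 2.0, "phase3": 3.0, "approved": 4.0}
DISCONTINUATION_FACTOR = {None: 1.0, "operational": 0.8, "scientific": 0.4}
RECENCY_HALF_LIFE_YEARS = 3.0


@dataclass
class Asset:
    """Hypothetical per-asset evidence record, assumed to exist after normalization."""
    name: str
    trials_by_phase: dict[str, int]           # phase label -> number of trials
    positive_endpoints: int = 0               # trials that met their primary endpoint
    failed_endpoints: int = 0                 # trials that missed it
    biomarker_support: bool = False
    last_activity: date | None = None
    designations: set[str] = field(default_factory=set)  # e.g. {"orphan", "breakthrough"}
    safety_signals: int = 0
    non_pharmaceutical: bool = False          # e.g. a device or behavioral intervention
    discontinued: str | None = None           # None, "operational", or "scientific"


def score(asset: Asset, today: date) -> float:
    # Trial volume and phase: log-damped trial counts, weighted by phase.
    evidence = sum(PHASE_WEIGHT.get(phase, 0.0) * math.log1p(n)
                   for phase, n in asset.trials_by_phase.items())
    evidence += 1.0 * asset.positive_endpoints
    evidence += 1.0 if asset.biomarker_support else 0.0
    if asset.last_activity is not None:
        # Recency: a 2-point bonus that halves every RECENCY_HALF_LIFE_YEARS.
        years = max(0.0, (today - asset.last_activity).days / 365.25)
        evidence += 2.0 * 0.5 ** (years / RECENCY_HALF_LIFE_YEARS)
    evidence += 0.5 * len(asset.designations)

    # Penalties are subtracted in full, independent of discontinuation status.
    penalties = 1.0 * asset.failed_endpoints + 1.5 * asset.safety_signals
    penalties += 5.0 if asset.non_pharmaceutical else 0.0

    # Proportional adjustment: discontinuation scales down the positive evidence,
    # more for scientific (efficacy or safety) than for operational (business) reasons.
    return evidence * DISCONTINUATION_FACTOR[asset.discontinued] - penalties


def rank(assets: list[Asset], today: date) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(a.name, round(score(a, today), 2)) for a in assets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping each signal as a separately weighted term is what makes a score like this auditable: any asset's rank can be traced back to the individual contributions. Log-damping the trial counts keeps a heavily trialed asset from dominating on volume alone, and applying the discontinuation factor only to the positive evidence keeps it from softening a net-negative score.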