EpiPathAI: Using Large Language Models to Explore Mechanisms of Life Course Exposure-Outcome Associations


Abstract

Large language models (LLMs) enhanced with Graph Retrieval-Augmented Generation (GRAG) are promising for life-course epidemiology, a field that typically depends on costly and incomplete cohort data. Inspired by the epidemiological pathway model, we introduce EpiPathAI, which combines literature-derived causal knowledge graphs with LLMs to mine bridging variables and synthesize potential mechanisms linking gestational diabetes to dementia. We test four GRAG strategies on GPT-4 and evaluate the identified mediators with clinical experts and three other LLM reviewers. The knowledge graph identifies 118 bridging variables, including coronary heart disease and chronic kidney disease, which were previously validated in our data-driven approach using the UK Biobank. EpiPathAI also identifies additional clinically meaningful mediators, including high levels of low-density lipoprotein (9.8% of effect, 95% CI: 3.7%-23.2%) and depression, which is a plausible but statistically non-significant mediator in the UK Biobank. EpiPathAI serves as a knowledge-driven mechanism-mining agent that complements the data-driven approach, providing a foundation for investigating other mediating pathways in future longitudinal cohort studies.
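
The notion of a bridging variable in a causal knowledge graph can be illustrated with a minimal sketch. It assumes one plausible reading, namely that a bridging variable is any node lying on a directed path from the exposure to the outcome; the abstract does not specify the exact definition or algorithm EpiPathAI uses. The function names and the toy edge list are hypothetical, and the edges are illustrative only (they reuse variable names from the abstract but are not taken from the paper's knowledge graph).

```python
from collections import defaultdict, deque


def reachable(adjacency, start):
    """Return every node reachable from `start` via breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def bridging_variables(edges, exposure, outcome):
    """Nodes that lie on some directed path from exposure to outcome.

    Assumption: this is one possible working definition of a bridging
    variable, not necessarily the one used by EpiPathAI.
    """
    forward = defaultdict(set)   # cause -> effects
    backward = defaultdict(set)  # effect -> causes
    for cause, effect in edges:
        forward[cause].add(effect)
        backward[effect].add(cause)
    downstream_of_exposure = reachable(forward, exposure)
    upstream_of_outcome = reachable(backward, outcome)
    return (downstream_of_exposure & upstream_of_outcome) - {exposure, outcome}


# Hypothetical toy graph: (cause, effect) pairs for illustration only.
TOY_EDGES = [
    ("gestational diabetes", "coronary heart disease"),
    ("coronary heart disease", "dementia"),
    ("gestational diabetes", "chronic kidney disease"),
    ("chronic kidney disease", "dementia"),
    ("gestational diabetes", "depression"),
    ("depression", "dementia"),
    ("gestational diabetes", "placeholder dead end"),  # never reaches the outcome
]

if __name__ == "__main__":
    found = bridging_variables(TOY_EDGES, "gestational diabetes", "dementia")
    print(sorted(found))
```

In this sketch the placeholder node is filtered out because it has no path to the outcome. Candidates found this way would still need the kind of expert review and cohort-based mediation analysis the abstract describes.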
