Using Large Language Models to Explore Mechanisms of Life Course Exposure-Outcome Associations

Abstract

Large language models (LLMs) combined with graph retrieval-augmented generation (GRAG) are promising tools for life-course epidemiology: they can synthesize fragmented findings and reason about the chain of risk from an exposure of interest to its outcomes. The field typically depends on cohort data that are costly and incomplete. Inspired by the pathway model in epidemiology, we integrated a literature-derived knowledge graph with LLMs to mine bridging variables and synthesize potential mechanisms linking an early-life exposure, gestational diabetes (GDM), to a later-life outcome, dementia. We built a causal knowledge graph by including empirical findings and excluding hypothetical assertions, and identified 118 bridging variables, such as chronic kidney disease and physical activity. Four GRAG strategies were tested on GPT-4 and evaluated by clinical experts and three other LLM reviewers: GPT-4o, Llama3-70b, and Gemini Adv. The strategy that used a minimal set of literature abstracts for the bridging variables between GDM and dementia performed as well as the strategy that used abstracts for all variables in the GDM-Dementia sub-community. Both significantly outperformed the strategy that incorporated all literature abstracts related to GDM or dementia and the baseline GPT-4 RAG without external knowledge. This approach could offer early signals for developing preventive strategies, guide variable selection in local cohort construction, and supplement studies in life-course epidemiology.
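
As an illustration of what "mining bridging variables" could mean on a causal knowledge graph, the sketch below treats a bridging variable as any node that lies on a directed path from the exposure to the outcome. This definition, the function names, and the toy edges are assumptions made for illustration only; the abstract does not specify the authors' actual procedure, and the edges are placeholders rather than findings from the study.

```python
from collections import defaultdict, deque


def reachable(adjacency, start):
    """Return every node reachable from `start` by following edges (BFS)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bridging_variables(edges, exposure, outcome):
    """Nodes lying on at least one directed path exposure -> ... -> outcome.

    Assumed definition: a node bridges the pair if it is downstream of the
    exposure and upstream of the outcome.
    """
    forward = defaultdict(set)   # cause -> effects
    backward = defaultdict(set)  # effect -> causes
    for cause, effect in edges:
        forward[cause].add(effect)
        backward[effect].add(cause)
    downstream = reachable(forward, exposure)
    upstream = reachable(backward, outcome)
    return (downstream & upstream) - {exposure, outcome}


# Hypothetical toy edges (cause, effect); placeholders, not study results.
toy_edges = [
    ("GDM", "chronic kidney disease"),
    ("chronic kidney disease", "dementia"),
    ("GDM", "physical activity"),
    ("physical activity", "dementia"),
    ("GDM", "variable_X"),       # downstream of GDM only, so not a bridge
    ("variable_Y", "dementia"),  # upstream of dementia only, so not a bridge
]

bridges = bridging_variables(toy_edges, "GDM", "dementia")
print(sorted(bridges))
```

Under this assumed definition, a retrieval step restricted to abstracts about the returned nodes would correspond loosely to the "minimal set" strategy described above, while retrieving abstracts for every node touching either endpoint would correspond to the broader strategies.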
