ALADYNOULLI: A Bayesian approach to disease progression modeling for genomic discovery and clinical prediction
Abstract
Understanding how disease patterns evolve over a lifetime remains a key challenge in medicine. While electronic health records provide rich longitudinal data, existing models typically analyze each disease in isolation, missing the complex interplay between multiple conditions and genetic factors. Here, we combine longitudinal health records with genetic data in a novel dynamic Bayesian framework, ALADYNOULLI, that identifies latent disease signatures while modeling individual-specific trajectories. Applied across three biobanks with up to 50 years of follow-up, our model discovers clinically interpretable disease signatures that are consistent across diverse populations (79.2% cross-cohort correspondence) and show strong genetic correlations, enabling both accurate prediction of patient risk and discovery of novel genetic associations. The model improves disease prediction across 24 conditions, outperforming established clinical risk scores such as PCE, PREVENT and GAIL over both short- and longer-term horizons. Furthermore, our signature-based approach identifies over 150 genetic loci (many missed by single-disease GWAS), with multiple signatures showing strong genetic signals (cardiovascular h² = 0.041, musculoskeletal h² = 0.035). Critically, this unified modeling approach significantly improves predictive performance for multiple diseases while revealing distinct biological subtypes within traditional diagnostic categories. It demonstrates substantial heterogeneity across diverse conditions including cancer, metabolic disorders, and psychiatric conditions, with Cohen’s d effect sizes up to 3.87 for signature differences between patient clusters (p ≤ 1 × 10⁻⁸ for 95% of comparisons).
In conclusion, ALADYNOULLI combines genetics and longitudinal diagnoses to achieve both improved disease prediction and enhanced genetic discovery through a unified framework that captures the complex interplay between genetic predisposition and time-varying disease patterns. An application with a link to simulated results is available at http://aladynoulli.hms.harvard.edu. Code is available on GitHub with permission enabled.