Conditional deep generative modeling of blood-based infrared spectra enables controlled in-silico phenotyping studies
Abstract
Infrared molecular fingerprinting of blood offers a scalable, minimally invasive window into human physiology, but limited follow-up, imbalanced cohorts, and restricted access to diverse phenotypes constrain systematic studies. Here, we introduce a conditional deep generative framework for synthesizing blood-based infrared spectra that preserves individual-level structure while allowing controlled manipulation of demographic and anthropometric covariates. Using 25,308 spectra from 5,863 ostensibly healthy participants in the longitudinal Health4Hungary – Hungary4Health cohort, we train a Conditional Variational Autoencoder and a Conditional Boundary Equilibrium GAN to generate blood-based infrared spectra conditioned on age, sex, and body mass index. We show that the generated spectra closely match held-out real data across multiple levels and faithfully encode demographic and anthropometric information. We further demonstrate two in-silico applications: modeling of individualized healthy aging trajectories that follow cohort-level aging manifolds while retaining subject-specific characteristics, and targeted augmentation of underrepresented body mass index categories. Together, these results establish conditional generative modeling of blood-based infrared spectra as a viable approach for virtual cohort construction, cohort balancing, and controlled in-silico phenotyping, paving the way toward more comprehensive and data-efficient studies in precision health.
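To make the conditioning idea concrete, the following is a minimal sketch of how a conditional variational autoencoder can hold a subject-specific latent code fixed while a covariate such as age is varied. It is illustrative only and is not the authors' implementation: the single linear encoder and decoder layers, the random untrained weights, the spectrum length, the latent size, the covariate scaling, and all function names are assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

N_WAVENUMBERS = 64  # hypothetical spectrum length (assumption)
N_COVARIATES = 3    # age, sex, body mass index
N_LATENT = 8        # hypothetical latent dimension (assumption)

# Random, untrained weights standing in for learned parameters (assumption).
W_enc = rng.normal(scale=0.1, size=(N_WAVENUMBERS + N_COVARIATES, 2 * N_LATENT))
W_dec = rng.normal(scale=0.1, size=(N_LATENT + N_COVARIATES, N_WAVENUMBERS))


def covariates(age, sex, bmi):
    """Pack covariates into a vector; the scaling constants are arbitrary."""
    return np.array([age / 100.0, float(sex), bmi / 50.0])


def encode(x, c):
    """Map a spectrum and its covariates to a latent mean and log-variance."""
    h = np.concatenate([x, c]) @ W_enc
    return h[:N_LATENT], h[N_LATENT:]


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps drawn from a standard normal."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def decode(z, c):
    """Generate a spectrum from a latent code and a covariate vector."""
    return np.concatenate([z, c]) @ W_dec


# Placeholder input: random numbers, not a real infrared spectrum.
x = rng.normal(size=N_WAVENUMBERS)
mu, logvar = encode(x, covariates(age=40, sex=1, bmi=24.0))
z = reparameterize(mu, logvar, rng)

# Keep the latent code z fixed and vary only age to obtain a
# hypothetical in-silico aging trajectory for the same subject.
ages = (40, 50, 60, 70)
trajectory = np.stack([decode(z, covariates(a, 1, 24.0)) for a in ages])
```

In this sketch the latent code plays the role of the subject-specific component and the covariate vector carries the controllable part; the same pattern, with body mass index varied in place of age, would correspond to the targeted augmentation of underrepresented body mass index categories.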