Conditional deep generative modeling of blood-based infrared spectra enables controlled in-silico phenotyping studies

Abstract

Infrared molecular fingerprinting of blood offers a scalable, minimally invasive window into human physiology, but limited follow-up, imbalanced cohorts, and restricted access to diverse phenotypes constrain systematic studies. Here, we introduce a conditional deep generative framework for synthesizing blood-based infrared spectra that preserves individual-level structure while allowing controlled manipulation of demographic and anthropometric covariates. Using 25,308 spectra from 5,863 ostensibly healthy participants in the longitudinal Health4Hungary – Hungary4Health cohort, we train a Conditional Variational Autoencoder and a Conditional Boundary Equilibrium GAN to generate blood-based infrared spectra conditioned on age, sex, and body mass index. We show that the generated spectra closely match held-out real data across multiple levels and faithfully encode demographic and anthropometric information. We further demonstrate two in-silico applications: modeling of individualized healthy aging trajectories that follow cohort-level aging manifolds while retaining subject-specific characteristics, and targeted augmentation of underrepresented body mass index categories. Together, these results establish conditional generative modeling of blood-based infrared spectra as a viable approach for virtual cohort construction, cohort balancing, and controlled in-silico phenotyping, paving the way toward more comprehensive and data-efficient studies in precision health.
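
The targeted augmentation of underrepresented body mass index categories mentioned above can be illustrated with a minimal sketch of the bookkeeping step only: deciding how many synthetic spectra to request per category so that every category reaches the size of the largest one. This is an assumption-laden illustration, not the authors' procedure. The category boundaries are the standard WHO adult cutoffs, which the abstract does not confirm were used; the names `bmi_category` and `augmentation_plan` are hypothetical, and the BMI values are invented toy data.

```python
from collections import Counter

# Assumed category order; the preprint's own binning may differ.
CATEGORIES = ("underweight", "normal", "overweight", "obese")


def bmi_category(bmi: float) -> str:
    """Map a BMI value (kg/m^2) to a WHO adult category (assumed binning)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def augmentation_plan(bmis):
    """Return the number of synthetic samples to request per category.

    Each category is topped up to the size of the largest observed
    category; the largest category itself gets zero.
    """
    counts = Counter(bmi_category(b) for b in bmis)
    target = max(counts.values(), default=0)
    return {c: target - counts.get(c, 0) for c in CATEGORIES}


# Toy BMI values, invented for illustration only.
toy_bmis = [17.9, 22.4, 23.1, 24.8, 21.0, 26.3, 27.7, 31.2]
plan = augmentation_plan(toy_bmis)
print(plan)
```

In a full pipeline, each count in `plan` would be passed to a trained conditional generator together with the desired age, sex, and body mass index values; that generation step is deliberately left out here because the abstract does not specify its interface.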
