Deep latent variable modelling reveals clinically significant subgroups among transfusion recipients

Abstract

Background

Transfusion recipients are a heterogeneous group of patients, yet subgroups have traditionally been identified through human-driven univariate analyses and domain knowledge rather than through analysis of the multivariate characteristics of individual patients. Electronic health records (EHR) combined with unsupervised machine learning enable a robust, data-driven approach to phenotyping patient populations, providing a finer-grained view of subgroup characteristics.

Materials and Methods

We introduce an extension to the Variational Autoencoder (VAE) framework and apply it to EHR data from 19,629 adult transfusion recipients. The latent representation of a VAE approximates a low-dimensional manifold of the input data, in which patients with similar characteristics are embedded close to one another. The model integrates clustering through a Gaussian Mixture Model (GMM) prior to identify clinically relevant patient subgroups from diagnosis codes, laboratory values and demographics, while simultaneously classifying the types of blood products transfused. The final clusters are derived with a modified consensus clustering approach.
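The clustering step can be illustrated with a minimal NumPy sketch. It assumes a common formulation of GMM-prior VAEs in which a patient's cluster is read off from the posterior p(c | z), proportional to π_c · N(z; μ_c, diag σ_c²), evaluated at the encoder's latent code. It also uses a plain co-association matrix as a stand-in for consensus clustering, because the abstract does not describe the authors' "modified" procedure. The function names `gmm_responsibilities` and `co_association` and the toy two-cluster data are hypothetical.

```python
import numpy as np


def gmm_responsibilities(z, log_pi, mu, log_var):
    """Posterior cluster probabilities p(c | z) under a diagonal-covariance GMM prior.

    z:       (N, D) latent codes, e.g. encoder means (assumed input)
    log_pi:  (K,)   log mixture weights
    mu:      (K, D) component means
    log_var: (K, D) component log-variances
    """
    diff = z[:, None, :] - mu[None, :, :]          # (N, K, D)
    var = np.exp(log_var)[None, :, :]              # (1, K, D)
    # Log density of each latent code under each diagonal Gaussian component.
    log_pdf = -0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var[None, :, :] + diff**2 / var, axis=-1
    )                                              # (N, K)
    log_joint = log_pi[None, :] + log_pdf          # log p(c) + log p(z | c)
    # Normalise over components with a numerically stable log-sum-exp.
    m = log_joint.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    return np.exp(log_joint - log_norm)


def co_association(label_runs):
    """Fraction of clustering runs in which each pair of patients shares a cluster.

    label_runs: (R, N) integer labels from R runs (e.g. different seeds or subsamples).
    Comparing labels within each run makes the result invariant to label permutation.
    """
    runs = np.asarray(label_runs)
    return (runs[:, :, None] == runs[:, None, :]).mean(axis=0)  # (N, N)


# Toy latent space (not study data): two well-separated components in 2-D.
rng = np.random.default_rng(0)
mu = np.array([[-3.0, 0.0], [3.0, 0.0]])
log_var = np.zeros((2, 2))
log_pi = np.log(np.array([0.5, 0.5]))
z = np.vstack([mu[0] + 0.3 * rng.standard_normal((5, 2)),
               mu[1] + 0.3 * rng.standard_normal((5, 2))])

resp = gmm_responsibilities(z, log_pi, mu, log_var)
hard = resp.argmax(axis=1)              # hard cluster assignment per patient
C = co_association([hard, 1 - hard])    # second "run": same partition, swapped labels
```

A final partition is typically obtained by clustering 1 − C, for example with hierarchical clustering. Entries of C close to 0 or 1 indicate pairwise assignments that are stable across runs, while intermediate values flag patients whose subgroup membership is ambiguous.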

Results

We identified six patient groups with distinct diagnosis, laboratory, demographic, and transfusion profiles. These novel clusters provide a refined characterization of transfusion-related phenotypes, revealing finer distinctions among patient subgroups. The model achieved moderate classification performance, with AUROC of 0.879, 0.806 and 0.861 and PR-AUC of 0.448, 0.357 and 0.492 for red blood cells (RBC), plasma and platelets, respectively. Clustering performance remained consistent across training and test sets.
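The two reported metrics can be made concrete with a short NumPy sketch that treats each blood product as an independent binary task; this framing of the multi-label evaluation is an assumption. AUROC is computed from ranks (the Mann-Whitney formulation), and PR-AUC is summarised as average precision, one common estimator; the abstract does not state which PR-AUC estimator was used. The function names and the four-patient example are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks of x, with tied values sharing their average rank."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def auroc(y, s):
    """AUROC via the Mann-Whitney U statistic: P(positive scores above negative)."""
    y = np.asarray(y, dtype=bool)
    r = average_ranks(np.asarray(s, dtype=float))
    n_pos, n_neg = y.sum(), (~y).sum()
    return (r[y].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(y, s):
    """PR-AUC summarised as average precision: mean precision at each positive.

    Ties in s are broken by sort order rather than grouped, a simplification.
    """
    y = np.asarray(y, dtype=bool)
    order = np.argsort(-np.asarray(s, dtype=float), kind="mergesort")
    hits = y[order]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision[hits].sum() / hits.sum()


# Toy example (not study data): labels and scores for one blood product.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
print(auroc(y_true, y_score))              # 0.75
print(average_precision(y_true, y_score))  # 0.8333...
```

A random classifier has an AUROC of 0.5 but an expected average precision of roughly the positive-class prevalence, so PR-AUC values such as those reported are best read against each product's transfusion prevalence, which the abstract does not state.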

Conclusions

Data-driven phenotyping of transfusion recipients revealed previously unexplored patient phenotypes that differ in multiple characteristics. The model helps explain the heterogeneous nature of patients requiring transfusion and provides insight into how different blood product profiles shift cluster assignments. These findings underscore the utility of latent variable modelling for population characterization and suggest potential applications in optimizing transfusion strategies and blood supply chain management. The findings have not yet been validated in external cohorts.