Representation learning based on proteomic profiles uncovers key cell types and biological processes contributing to the plasma proteome



Abstract

The plasma proteome is a source of information on health status and physiological condition, and its proteins hold great potential as candidate biomarkers for diagnosis, prognosis, intervention response monitoring, and patient stratification. Because proteins in the plasma can be derived from numerous cellular and tissue sources, and their levels are influenced by diverse mechanisms, a comprehensive assessment of patterns of protein variation could provide insight into the mechanisms driving health and disease. By applying neural network-based representation learning and unsupervised clustering to the plasma proteomic profiles of 51,180 participants in the UK Biobank, we identified 36 protein modules representing major cell types and biological processes present in the plasma proteome. We found that the overall abundances of proteins belonging to certain modules are associated with disease status and genetic variants. These associations reflect the complex and multifaceted mechanisms that regulate protein abundances in circulation. An investigation into the protein modules associated with disease variants uncovered both known disease biology and novel findings that may translate into testable hypotheses. Our approach generates biologically relevant groupings of plasma proteins that can inform the design of more predictive biomarker panels and shed new light on the effects of disease-associated genetic variants.
