Machine learning-guided deconvolution of plasma protein levels



Abstract

Proteomic techniques now measure thousands of circulating proteins at population scale, driving a surge in biomarker studies and biological clocks. However, their potential impact, generalisability, and biological relevance are hard to assess without understanding the origins and roles of the thousands of proteins implicated in these studies. Here, we provide a data-driven identification of the foundations of protein variation that underlie their links to ageing and diseases, differ between sexes and ancestries, and help guide protein biomarker and drug target discovery. We use machine learning to systematically identify and quantify the foundations of plasma levels of ∼3,000 protein targets among 43,240 participants of the UK Biobank. Out of >1,700 participant and sample characteristics, we identify a median of 19 factors (range: 1-36) that jointly explain an average of 23.7% (max. 79.9%) of the variance in plasma levels across protein targets. Proteins segregate into distinct clusters according to their explanatory factors, with modifiable characteristics explaining more variance than genetic variation (13.3% vs 9.8%). We identify proteins for which the factors underlying their variation differ by sex (n=1,414 proteins) or across ancestries (n=86 proteins). We establish a knowledge graph that integrates our findings with genetic studies and drug characteristics to guide identification of potential markers of target engagement. We demonstrate the value of our resource 1) by identifying disease-specific biomarkers, like matrix metalloproteinase 12 for abdominal aortic aneurysm, and 2) by developing a framework for phenotype enrichment of protein signatures from independent studies to identify underlying sources of variation. All results are explorable via an interactive web portal (https://omicscience.org/apps/prot_foundation) and can be readily integrated into ongoing studies using an associated R package.
