Molecular maps of diseases from omics data and network embeddings
Abstract
Identifying disease-relevant proteins and pathways remains a fundamental challenge in understanding disease mechanisms and supporting therapeutic development. While omics analyses can provide valuable insights, they typically consider each gene or protein separately rather than at the level of biological systems. This limitation can be addressed by combining the omics data with protein networks. We integrated disease-specific omics data with a universal functional association network from STRING, which we represent using node2vec embeddings. In this way, we constructed disease maps for seven diseases, spanning inflammatory, oncological, neurological, and vascular conditions, from genetics, transcriptomics, somatic mutation, and proteomics data. Compared to omics analysis alone, a simple linear model on top of the network embedding identified 2–4 times as many known disease-relevant proteins at the same specificity. Clustering of the resulting disease maps revealed both functional modules shared by many diseases, such as inflammatory pathways and cancer hallmarks, and disease-specific modules, such as keratinization in atopic dermatitis and extracellular matrix remodeling in aortic aneurysm. Together, these results highlight the value of protein network embeddings when analyzing omics data to understand diseases.
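The comparison "at the same specificity" can be made concrete with a small sketch. The abstract does not describe how the evaluation was implemented, so everything below is an assumption for illustration: the function name `recall_at_specificity` is hypothetical, the threshold rule is one common choice, and the scores and labels are made-up toy values, not data or results from the study.

```python
import math


def recall_at_specificity(scores, labels, specificity):
    """Return the fraction of positives scoring above a threshold chosen so
    that at least `specificity` of the negatives score at or below it."""
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative")
    # Pick the negative score that leaves the requested share of negatives
    # at or below the threshold (assumes no tied scores among negatives).
    k = math.ceil(specificity * len(negatives)) - 1
    k = min(max(k, 0), len(negatives) - 1)
    threshold = negatives[k]
    hits = sum(1 for s in positives if s > threshold)
    return hits / len(positives)


# Toy example (hypothetical values): 1 = known disease protein, 0 = other.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
omics_scores = [0.9, 0.2, 0.3, 0.1, 0.1, 0.4, 0.5, 0.05]
network_scores = [0.9, 0.7, 0.3, 0.6, 0.1, 0.4, 0.5, 0.05]

recall_omics = recall_at_specificity(omics_scores, labels, 0.75)
recall_network = recall_at_specificity(network_scores, labels, 0.75)
print(recall_omics, recall_network)
```

Fixing the specificity first and then comparing recall keeps the false-positive rate equal for both scoring methods, so any difference reflects how many known disease proteins each method recovers.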