Machine learning uncovers circulating biomarkers and molecular heterogeneity in obesity and type 2 diabetes

Erdenetsetseg Nokhoijav
Miklós Káplár
Sándor Csaba Aranyi
András Berzi
Göran Bergström
Konstantinos Antonopoulos
Fredrik Edfors
Miklós Emri
Éva Csősz

Abstract

Obesity and Type 2 Diabetes (T2D) are heterogeneous metabolic disorders whose molecular diversity is incompletely defined. We analyzed circulating proteomic profiles from 129 individuals in Control, Obesity, and T2D groups and applied complementary machine-learning approaches (random forest, multinomial logistic regression with LASSO regularization, support vector machines, and ensemble voting) to identify proteins distinguishing the clinical groups. Convergent model outputs revealed a partially overlapping panel of discriminative proteins. Model performance was evaluated in an independent dataset from the Human Protein Atlas (n=834) comprising healthy individuals and patients with Obesity, T2D, or other metabolic diseases. Unsupervised clustering further identified multiple proteomic subgroups within each clinical category, indicating substantial intragroup heterogeneity. Bootstrap random forest with null-model benchmarking highlighted stable cluster-discriminative proteins. These findings demonstrate that integrating circulating proteomics with machine learning can resolve molecular heterogeneity in Obesity and T2D and nominate candidate biomarkers for metabolic disease stratification.
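
The idea of benchmarking bootstrap feature selection against a null model can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes two clusters, swaps the random forest importance for a simple scaled mean-difference score, and uses synthetic data with hypothetical protein names (`P_signal`, `P_noise1`, `P_noise2`). The null model here is a label permutation, and the 0.8 stability cutoff is an arbitrary choice for illustration.

```python
import random
import statistics


def effect_score(values, labels):
    """Absolute difference in group means, scaled by the overall spread.

    A simple stand-in for a feature-importance measure (assumption).
    """
    a = [v for v, g in zip(values, labels) if g == 0]
    b = [v for v, g in zip(values, labels) if g == 1]
    if len(a) < 2 or len(b) < 2:
        return 0.0
    spread = statistics.pstdev(values) or 1.0
    return abs(statistics.mean(a) - statistics.mean(b)) / spread


def selection_frequency(data, labels, n_boot=200, top_k=1, seed=0, permute=False):
    """Fraction of bootstrap resamples in which each feature ranks in the top_k."""
    rng = random.Random(seed)
    n = len(labels)
    counts = {name: 0 for name in data}
    for _ in range(n_boot):
        # Resample individuals with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        if permute:
            # Null model: shuffling labels breaks any feature-label link.
            rng.shuffle(boot_labels)
        scores = {
            name: effect_score([col[i] for i in idx], boot_labels)
            for name, col in data.items()
        }
        for name in sorted(scores, key=scores.get, reverse=True)[:top_k]:
            counts[name] += 1
    return {name: c / n_boot for name, c in counts.items()}


# Synthetic example: one protein shifted between clusters, two pure-noise proteins.
rng = random.Random(42)
labels = [0] * 20 + [1] * 20
data = {
    "P_signal": [rng.gauss(3.0 * g, 1.0) for g in labels],
    "P_noise1": [rng.gauss(0.0, 1.0) for _ in labels],
    "P_noise2": [rng.gauss(0.0, 1.0) for _ in labels],
}

observed = selection_frequency(data, labels, seed=1)
null = selection_frequency(data, labels, seed=1, permute=True)

# Keep proteins selected far more often than under the null (0.8 is arbitrary).
stable = [p for p in data if observed[p] > null[p] and observed[p] >= 0.8]
print(observed, null, stable)
```

Comparing `observed` with `null` separates proteins that are selected consistently because of a real group difference from those that would rank highly by chance alone. In a real analysis the score would come from the fitted model, for example random forest importances, and the cutoff would be derived from the null distribution rather than fixed.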

Version published to 10.64898/2026.04.16.718836 on bioRxiv
Apr 20, 2026

Related articles

Integrated Multi-Omics Analysis for the Identification of Disease-Associated Variations and Prognostic Biomarkers in Triple-Negative Breast Cancer (TNBC)

This article has 2 authors:
1. NAGENDRA MANNEKUNTA
2. ELAMATHI NATRAJAN
This article has no evaluations. Latest version: May 6, 2026

Genome-wide analysis and polygenic prediction of clinical obesity and comparison with body mass index

This article has 10 authors:
1. Sohail Zahid
2. Seraj N. Grimes
3. April Kim
4. Zhiqi Yao
5. Allison W. Peng
6. Roger S. Blumenthal
7. Rexford S. Ahima
8. Marios Arvanitis
9. Michael J. Blaha
10. Alexis Battle
This article has no evaluations. Latest version: May 22, 2026

Development and Validation of an Interpretable Machine Learning Model to Identify Coexisting Type 2 Diabetes Mellitus in Patients with Metabolic dysfunction-associated fatty liver disease

This article has 14 authors:
1. Hui Zhu
2. Jia Zhang
3. Xi Xu
4. Yi Lv
5. Chenxia Lu
6. Qi Hao
7. Jingjing Huang
8. Miao Peng
9. Jingzhi Wang
10. Ouyang Kani
11. Zixin Shu
12. Shujie Song
13. Xiaodong Li
14. Mingzhong Xiao
This article has no evaluations. Latest version: Apr 17, 2026
