Biomarker Signal Architecture in Cardiovascular Machine Learning: Stability, Redundancy, and Minimal High-Yield Panels After Myocardial Infarction

Natalia Piórkowska
Agnieszka Olejnik
Alan Ostromęcki
Wiktor Kuliczkowski
Andrzej Mysiak
Iwona Bil-Lula

Abstract

Background

Machine-learning models based on circulating biomarkers are increasingly used in cardiovascular research; however, model performance alone provides limited insight into how the predictive signal is distributed across features. We aimed to characterize the biomarker signal architecture of a machine-learning model distinguishing ST-elevation myocardial infarction (STEMI) from non-ST-elevation myocardial infarction (NSTEMI), with a focus on signal concentration, redundancy, and conditional complementarity.

Methods

We conducted a structured secondary analysis of a previously established, leakage-controlled machine-learning framework (n = 152 patients). The BIOMARKERS feature-set variant (10 biomarkers) was evaluated using outer-fold cross-validation. Model structure was interrogated using (i) leave-one-biomarker-out analysis, (ii) pairwise leave-two-out analysis with pair-excess estimation, (iii) cumulative ablation of top-ranked biomarkers, and (iv) forward reconstruction of minimal biomarker panels. Uncertainty was assessed using bootstrap resampling across folds.
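The leave-one-out and leave-two-out steps can be illustrated with a minimal sketch. It assumes a hypothetical `score_fn` that returns a cross-validated ROC-AUC for any subset of features, and it assumes that pair excess is defined as the pair ΔAUC minus the sum of the two single-feature ΔAUC values; the abstract does not state the exact formula. The feature names and the toy scoring function are placeholders, not study data.

```python
from itertools import combinations

def leave_one_out(score_fn, features):
    """Drop in score when each feature alone is removed from the full set."""
    full_set = frozenset(features)
    full = score_fn(full_set)
    return {f: full - score_fn(full_set - {f}) for f in features}

def pair_excess(score_fn, features):
    """For each pair, return (pair drop, excess over the two single drops).

    Assumed definition: excess = pair_drop - (single_drop_a + single_drop_b).
    """
    full_set = frozenset(features)
    full = score_fn(full_set)
    single = leave_one_out(score_fn, features)
    result = {}
    for a, b in combinations(features, 2):
        pair_drop = full - score_fn(full_set - {a, b})
        result[(a, b)] = (pair_drop, pair_drop - (single[a] + single[b]))
    return result

def toy_score(subset):
    """Hypothetical stand-in for a cross-validated AUC (not real data)."""
    score = 0.5
    if "A" in subset or "B" in subset:  # either of A or B carries this signal
        score += 0.3
    if "C" in subset:                   # C contributes independently
        score += 0.1
    return score

if __name__ == "__main__":
    names = ["A", "B", "C"]
    print(leave_one_out(toy_score, names))
    print(pair_excess(toy_score, names))
```

In a real analysis `score_fn` would retrain and evaluate the model inside each outer fold, so that feature removal does not leak information between training and test data.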

Results

The full biomarker model achieved a mean ROC-AUC approaching 0.94. The predictive signal was highly non-uniform, with MMP-2 showing the largest single-feature contribution (mean ΔAUC ≈ 0.16). Pairwise analysis identified conditional complementarity between selected non-lipid biomarkers, particularly MMP-2 and EMMPRIN (pair ΔAUC ≈ 0.26; positive excess over single-feature effects), whereas lipid-related markers formed a highly correlated and largely redundant sub-cluster. Cumulative ablation demonstrated rapid performance collapse following removal of top-ranked biomarkers, consistent with structural signal concentration. Forward panel analysis showed that a compact subset of biomarkers (three features) achieved performance within ∼0.01 ROC-AUC of the full model, indicating the presence of a minimal high-yield panel. Bootstrap confidence intervals suggested that small performance differences should be interpreted with caution.
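The forward panel reconstruction and the bootstrap check can be sketched in the same spirit. This is an illustrative outline under stated assumptions, not the authors' procedure: `score_fn` is again a hypothetical function returning a cross-validated ROC-AUC for a feature subset, selection is greedy, the stopping tolerance of 0.01 mirrors the figure quoted above, and the bootstrap is a simple percentile interval over per-fold values. All names and numbers in the toy example are placeholders.

```python
import random

def forward_panel(score_fn, features, tolerance=0.01):
    """Greedily add features until the panel is within `tolerance` of the full score."""
    full = score_fn(frozenset(features))
    panel, remaining, trace = [], list(features), []
    while remaining:
        # Pick the feature that gives the best score when added to the panel.
        best = max(remaining, key=lambda f: score_fn(frozenset(panel + [f])))
        panel.append(best)
        remaining.remove(best)
        current = score_fn(frozenset(panel))
        trace.append((tuple(panel), current))
        if full - current <= tolerance:
            break
    return panel, trace

def bootstrap_ci(fold_values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-fold values."""
    rng = random.Random(seed)
    n = len(fold_values)
    means = sorted(sum(rng.choices(fold_values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical additive toy score (not study data).
TOY_WEIGHTS = {"A": 0.30, "B": 0.08, "C": 0.05, "D": 0.005}

def toy_score(subset):
    return 0.5 + sum(TOY_WEIGHTS[f] for f in subset)

if __name__ == "__main__":
    panel, trace = forward_panel(toy_score, list(TOY_WEIGHTS))
    print(panel)
    # Hypothetical per-fold differences between full model and panel.
    print(bootstrap_ci([0.01, 0.02, -0.01, 0.03, 0.00]))
```

If the resulting interval for the full-versus-panel difference includes zero, the panel and the full model cannot be distinguished on that evidence, which is the caution the bootstrap result in the abstract points to.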

Conclusions

Predictive performance in this biomarker-based model arises from a structured and unevenly distributed signal architecture, characterized by a dominant core biomarker, conditionally complementary contributors, and a redundant lipid cluster. These findings highlight the importance of evaluating model structure, not only aggregate performance, and suggest that biomarker-based machine-learning systems may benefit from architecture-aware interpretation and simplification strategies.

Version published to 10.64898/2026.05.19.26353638 on medRxiv
May 22, 2026

Related articles

A Consensus-Driven Stacking Ensemble Framework for Interpretable Cardiovascular Risk Prediction and Clinical Deployment

This article has 11 authors:
1. Shafak Shahriar Sozol
2. Bipul Chandra Dev Nath
3. F. M. Shafiullah Fahim
4. Nusrat Nizam Suzana
5. Jannatul Ferdous Mirza
6. Syed Ahmmed
7. Fatima-Tuz Zohra
8. Abu Hena Abid Zafr
9. Mohammed Nasir Uddin
10. M. Rubaiyat Hossain Mondal
11. Abu Sayed Md. Latiful Hoque
This article has no evaluations. Latest version: May 26, 2026

Artificial Intelligence for Cardiac Biomarkers After Myocardial Infarction: A Systematic Review and a Leakage-Aware Modeling Framework

This article has 8 authors:
1. Natalia Piórkowska
2. Agnieszka Olejnik
3. Lech Madeyski
4. Aleksandra Musz
5. Wiktor Kuliczkowski
6. Andrzej Mysiak
7. Aleksandra Żyłka
8. Iwona Bil-Lula
This article has no evaluations. Latest version: Apr 29, 2026

ECG-derived age deviation predicts cardiovascular diseases across lead configurations and cohorts

This article has 4 authors:
1. Deniz Aydogdu
2. Farieda Gaber
3. Arash Sorooshmehr
4. Altuna Akalin
This article has no evaluations. Latest version: Jun 8, 2026