Mechanistic learning to predict and understand minimal residual disease

Sadegh Marzban
Mark Robertson-Tessi
Jeffrey West


Abstract

Mechanistic modeling has long been used as a tool to describe the dynamics of biological systems, especially cancer in response to treatment. Its key advantage lies in the interpretability of the relationships between input parameters and outcomes of interest. Mechanistic models may also be calibrated to a cohort of patients and scaled up to generate a simulated set of virtual patients whose aggregate behavior reproduces key characteristics of the real patient population. In contrast, machine learning techniques offer strong predictive performance, especially for the high-dimensional datasets that are common in oncology. Here, we employ a Mechanistic Learning framework that combines the advantages of both approaches by training machine learning models on mechanistic parameters inferred from clinical patient data. We assess the utility of virtual clinical cohorts for 1) scaling up small cohort sizes and 2) balancing imbalanced patient subgroups in the setting of BCR::ABL1-positive lymphoblastic leukemia. Our mechanistic model (a Markov chain model) contains sixteen parameters that describe the rates of cell fate transitions that occur in patients with B-cell precursor acute lymphoblastic leukemia. The machine learning model (a ridge logistic regression) is trained on these parameters to predict two clinically relevant features: BCR::ABL1 fusion gene status (positive or negative) and minimal residual disease (MRD) status (positive or negative) after induction chemotherapy. Model training is done iteratively to assess which (and how many) parameters are critical to maintaining high predictive performance. Using machine learning models trained on the clinical flow-cytometry data, we find that the stem-like cell state alone is the most predictive feature for both BCR::ABL1-positive and MRD-positive disease, with composite scores (defined as the average of accuracy, balanced accuracy, and area under the curve) of 0.80 and 0.67, respectively.
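The composite score is the unweighted mean of accuracy, balanced accuracy, and AUC. A minimal stdlib-only sketch follows, assuming binary labels coded 0/1 and a 0.5 probability threshold for the two thresholded metrics (the abstract does not state which threshold is used); all function names are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall (average of sensitivity and specificity for 0/1 labels)."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def composite_score(y_true: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> float:
    """Average of accuracy, balanced accuracy, and AUC (threshold is an assumption)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return (accuracy(y_true, y_pred)
            + balanced_accuracy(y_true, y_pred)
            + roc_auc(y_true, scores)) / 3
```

For example, with labels `[0, 0, 0, 1, 1]` and predicted probabilities `[0.1, 0.4, 0.6, 0.7, 0.3]`, accuracy is 0.6, balanced accuracy is 7/12, and AUC is 2/3, giving a composite of about 0.617. Averaging a threshold-free metric (AUC) with two thresholded ones means the composite rewards both good ranking and a sensible cutoff, while balanced accuracy keeps a majority-class predictor from scoring well on imbalanced subgroups.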
Compared with the models trained directly on flow-cytometry data, mechanistic learning achieves comparable or improved composite scores for BCR::ABL1-positive and MRD-positive disease (0.81 and 0.71, respectively), using only de-differentiation for BCR::ABL1 and stem-state persistence together with differentiation-directed exit for MRD. Virtual patient (VP) expansion is informative for robustness analysis and class balancing, but full cohort expansion introduced additional heterogeneity, reduced predictive performance, and required larger models, whereas VP-based balancing yielded only a modest gain over class weighting at substantially greater computational cost. In summary, a mechanistic learning approach not only preserves predictive performance but also provides a biological hypothesis for why stemness is predictive of these clinically relevant outcomes.
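The abstract compares VP-based balancing against class weighting. The virtual-patient generator is not described here, so the sketch below illustrates only the class-weighting side: a ridge-penalized logistic regression in which each class is reweighted by n / (k · n_c), the same "balanced" heuristic that scikit-learn uses for `class_weight="balanced"`. The function names, the plain gradient-descent solver, and the default penalty, learning rate, and epoch count are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(beta: List[float], x: Sequence[float]) -> float:
    """P(y = 1 | x) for coefficients beta = [bias, w_1, ..., w_d]."""
    return _sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_weighted_ridge_logreg(X: List[List[float]], y: Sequence[int],
                              l2: float = 0.1, lr: float = 0.1,
                              epochs: int = 2000) -> List[float]:
    """Class-weighted logistic regression with an L2 (ridge) penalty.

    Minimizes the weight-normalized log-loss plus (l2 / 2) * ||w||^2 by
    full-batch gradient descent; the bias term is not penalized.
    """
    cw = balanced_class_weights(y)
    sample_w = [cw[t] for t in y]
    total_w = sum(sample_w)
    d = len(X[0])
    beta = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi, wi in zip(X, y, sample_w):
            err = wi * (predict_proba(beta, xi) - yi)  # weighted residual
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        for j in range(d + 1):
            grad[j] /= total_w
            if j > 0:
                grad[j] += l2 * beta[j]  # ridge gradient on coefficients only
            beta[j] -= lr * grad[j]
    return beta
```

In a toy cohort such as `X = [[0.0], [0.2], [0.4], [1.0]]` with labels `[0, 0, 0, 1]`, the single positive case receives weight 2.0 and each negative receives 2/3, so both classes contribute equally to the loss. This reweighting costs nothing beyond the original fit, which is why it is a natural baseline against simulating additional virtual patients for the minority class.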

Version published to 10.64898/2026.04.16.718968 on bioRxiv
Apr 21, 2026

Related articles

Pathway-based machine learning for breast cancer risk stratification: an interpretable framework validated in two independent cohorts
Authors: Suhaan Thayyil, Eshaan Nidee
This article has no evaluations. Latest version Apr 8, 2026.

Benchmark of biomarker identification and prognostic modeling methods on diverse censored data
Authors: Wesley Fletcher, Samiran Sinha
This article has no evaluations. Latest version Apr 1, 2026.

Interpretable Predictive Modeling for Medical Data Using Boolean Rule-aware Regression
Authors: Mohammad Eskandarian, Seyed Amir Malekpour
This article has no evaluations. Latest version May 18, 2026.
