Mechanistic learning to predict and understand minimal residual disease

Abstract

Mechanistic modeling has long been used to describe the dynamics of biological systems, especially cancer in response to treatment. The key advantage of mechanistic models lies in the interpretability of the relationships between input parameters and outcomes of interest. Mechanistic models may also be calibrated to a cohort of patients and scaled up to generate a simulated set of virtual patients whose aggregate behavior reproduces key characteristics of the real patient population. In contrast, machine learning techniques offer strong predictive performance, especially for the high-dimensional datasets that are common in oncology. Here, we employ a Mechanistic Learning framework that combines the advantages of both approaches by training machine learning models on mechanistic parameters inferred from clinical patient data. We assess the utility of virtual clinical cohorts for 1) scaling up small cohort sizes and 2) balancing unbalanced patient subgroups in the setting of BCR::ABL1-positive lymphoblastic leukemia. Our mechanistic model (a Markov chain model) contains sixteen parameters that describe the rates of cell fate transitions that occur in patients with B-cell precursor acute lymphoblastic leukemia. The machine learning model (a ridge logistic regression model) is trained on these parameters to predict two clinically relevant features: BCR::ABL1 fusion gene status (positive or negative) and minimal residual disease (MRD) status (positive or negative) post-induction chemotherapy. Model training is done iteratively to assess which (and how many) parameters are critical to maintaining high predictive performance. Using machine learning models trained on the clinical flow-cytometry data, we find that the stem-like cell state alone is the most predictive feature for both BCR::ABL1-positive and MRD-positive disease, with composite scores (defined as the average of accuracy, balanced accuracy, and area under the curve) of 0.80 and 0.67, respectively.
By comparison, mechanistic learning achieves comparable or improved composite scores for BCR::ABL1-positive and MRD-positive disease, with scores of 0.81 and 0.71, respectively, using only de-differentiation for BCR::ABL1, and stem-state persistence together with differentiation-directed exit for MRD. Virtual Patient (VP) expansion was informative for robustness analysis and class balancing, but full cohort expansion introduced additional heterogeneity, reduced predictive performance, and required larger models. VP-based balancing yielded only a modest gain over class weighting, at substantially greater computational cost. In summary, a mechanistic-learning approach not only preserves predictive performance but also provides a biological hypothesis for why stemness is predictive of these clinically relevant outcomes.
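
The composite score used in the abstract is the average of accuracy, balanced accuracy, and area under the curve. The following is a minimal sketch of how such a score could be computed for a binary classifier; it is not the authors' implementation. The function names, the 0.5 decision threshold, the reading of "area under the curve" as ROC AUC, and the toy labels and scores are all assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def roc_auc(y_true, y_score):
    """ROC AUC via pairwise comparison: the probability that a random
    positive scores higher than a random negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def composite_score(y_true, y_score, threshold=0.5):
    """Average of accuracy, balanced accuracy, and ROC AUC.
    The threshold used to binarize scores is an assumed default."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    return (accuracy(y_true, y_pred)
            + balanced_accuracy(y_true, y_pred)
            + roc_auc(y_true, y_score)) / 3


if __name__ == "__main__":
    # Toy, made-up labels (1 = positive) and predicted probabilities.
    labels = [1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
    print(round(composite_score(labels, scores), 3))
```

Averaging the three metrics rewards a model only when it does well on overall correctness, on each class separately, and on ranking, which matters when one patient subgroup is much smaller than the other.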
