Chemotherapy dose scheduling via Q-learning in a Markov tumor model

Abstract

We describe a Q-learning approach to optimizing chemotherapy dose scheduling in a stochastic, finite-cell Markov process that models the natural-selection dynamics of tumor cells. The three competing subpopulations comprising our virtual tumor are a chemo-sensitive population (S) and two chemo-resistant populations, R1 and R2, each resistant to one of two drugs, C1 and C2. Each drug can be toggled on or off; these toggles constitute the actions (selection pressures) imposed on our state variables (S, R1, R2), measured as proportions in our finite state space of N cancer cells (S + R1 + R2 = N). After converged chemo-dosing policies are obtained for a given reward structure, we focus on three important aspects of chemotherapy dose scheduling. First, we identify the most likely evolutionary paths of the tumor cell populations in response to the optimized (converged) policies. Second, we quantify the robustness of our ability to reach the target of balanced coexistence under incomplete information about both the initial cell populations and the state variables at each step. Third, we evaluate the efficacy of simplified policies that exploit the symmetries uncovered by examining the full policy. Our reward structure is designed to delay the onset of chemo-resistance in the tumor by rewarding a well-balanced mix of coexisting states while punishing unbalanced subpopulations to avoid extinction.
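As a concrete illustration of the setup the abstract describes, the following minimal Python sketch runs tabular epsilon-greedy Q-learning on a toy version of the three-population model. It is not the paper's implementation: the transition dynamics, kill probabilities, reward weights, initial mix, and all hyperparameters (alpha, gamma, eps, horizon) are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    N = 50                                      # finite tumor size: S + R1 + R2 = N
    ACTIONS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (C1 on?, C2 on?)

    def transition(state, action):
        # Hypothetical stochastic update: each active drug kills the
        # subpopulations not resistant to it; survivors regrow to N.
        s, r1, r2 = state
        c1, c2 = action
        kill = 0.4  # assumed per-cell kill probability per active drug
        surv = [
            rng.binomial(s, (1 - kill * c1) * (1 - kill * c2)),  # S: sensitive to both
            rng.binomial(r1, 1 - kill * c2),                     # R1: resists C1 only
            rng.binomial(r2, 1 - kill * c1),                     # R2: resists C2 only
        ]
        total = sum(surv)
        if total == 0:
            return (0, 0, 0)
        grown = rng.multinomial(N, np.array(surv) / total)       # repopulate to N cells
        return tuple(int(x) for x in grown)

    def reward(state):
        # Reward balanced coexistence; punish extinction of any subpopulation,
        # mirroring the reward structure described in the abstract.
        if min(state) == 0:
            return -10.0
        props = np.array(state) / N
        return 1.0 - 3.0 * np.abs(props - 1.0 / 3.0).sum()       # peak at S = R1 = R2

    Q = {}  # tabular Q-values keyed by (state, action index)
    alpha, gamma, eps, horizon = 0.1, 0.95, 0.1, 30

    for episode in range(5000):
        state = (N - 10, 5, 5)  # assumed initial mix: mostly chemo-sensitive
        for _ in range(horizon):
            if rng.random() < eps:  # epsilon-greedy exploration
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q.get((state, i), 0.0))
            nxt = transition(state, ACTIONS[a])
            r = reward(nxt)
            best_next = max(Q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            q = Q.get((state, a), 0.0)
            Q[(state, a)] = q + alpha * (r + gamma * best_next - q)
            state = nxt
            if min(state) == 0:  # a subpopulation went extinct; episode ends
                break

After training, the greedy action at each visited state approximates a converged dosing policy; the reward in this sketch peaks when the three subpopulations are each near N/3, corresponding to the balanced-coexistence target described above.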
