Quantifying chromosomal instability from intratumoral karyotype diversity using agent-based modeling and Bayesian inference

Curation statements for this article:
  • Curated by eLife

    eLife logo

    Evaluation Summary:

    This study seeks to develop a mathematical framework for estimating rates of chromosome missegregation based on known chromosomal properties and observed aneuploidy rates. A derived model is validated using live-cell imaging before being applied to several previously-described datasets from tumors and organoids. The subject matter is of high interest to aneuploidy and genome evolution researchers.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #3 agreed to share their name with the authors.)

This article has been Reviewed by the following groups

Read the full article

Abstract

Chromosomal instability (CIN)—persistent chromosome gain or loss through abnormal mitotic segregation—is a hallmark of cancer that drives aneuploidy. Intrinsic chromosome mis-segregation rate, a measure of CIN, can inform prognosis and is a promising biomarker for response to anti-microtubule agents. However, existing methodologies to measure this rate are labor intensive, indirect, and confounded by selection against aneuploid cells, which reduces observable diversity. We developed a framework to measure CIN, accounting for karyotype selection, using simulations with various levels of CIN and models of selection. To identify the model parameters that best fit karyotype data from single-cell sequencing, we used approximate Bayesian computation to infer mis-segregation rates and karyotype selection. Experimental validation confirmed the extensive chromosome mis-segregation rates caused by the chemotherapy paclitaxel (18.5 ± 0.5/division). Extending this approach to clinical samples revealed that inferred rates fell within direct observations of cancer cell lines. This work provides the necessary framework to quantify CIN in human tumors and develop it as a predictive biomarker.

Article activity feed

  1. Author Response:

    Reviewer #1 (Public Review):

    1. It seems like this model treats chromosome gains and losses equivalently. Is this appropriate? Chromosome loss events are much more toxic than chromosome gain events - as evidenced by the fact that haploinsufficiency is widespread, and all autosomal monosomies are embryonically-lethal while many trisomies are compatible with birth and development. Can the authors consider a model in which losses exert a more significant fitness penalty that chromosome gains?

    While we agree that monosomies are more detrimental than trisomies in non-cancerous tissue, this is not necessarily the case in tumors in which monosomy is often observed (see PMID: 32054838). Nevertheless, to address this critique we have now added a model variant with an additional condition in which cells experience extreme fitness penalties (90% reduction) if any chromosome is haploid. We apply this condition to all selection models and find this attenuates a ploidy increase over time in diploid cells in most selection models (see Figure 3 ‘haploid penalty’).

    1. Chromosomes do not missegregate at the same rate (PMID: 29898405). This point would need to be discussed, and, if feasible, incorporated into the authors' models.

    While this may be true in some contexts, the limited data on this topic (namely Worral et al. Cell Rep. 2018 and Dumont et al. EMBO J. 2020) do not agree on which chromosomes are mis-segregated more often. Worral suggested chromosomes 1-2 are particularly mis-segregated, whereas Dumont finds chromosome 3, 6, X are the highest. These differences may be explained by a context-dependent effects that depend on the model and mechanism of mis-segregation. Worral uses nocodazole washout to generate merotelics whereas Dumont gets mis-segregation through depleting CENP-A. It is unknown which if these mechanisms, if either, is representative of the mechanisms at play in human tumors so we decided to take a general approach assuming equivalent mis-segregation rates. However, we appreciate that this will be a question for other readers and we have now added this to the discussion.

    1. It would be helpful if the authors could clarify their use of live cell imaging (e.g., in Fig 6G). Certain apparent errors that are visible by live-cell imaging (like a lagging chromosome) can be resolved correctly and result in proper segregation. It is not clear whether it is appropriate to directly infer missegregation rates as is done in this paper.

    We did not perform this live cell imaging experiment. We cite these data as being kindly offered by the Kops laboratory and they correspond to the scDNAseq data for normal colon and CRC organoids from Bolhaqueiro et al. Nat Gen. 2019. We agree that chromosome mis-segregation rates cannot be directly inferred by imaging. As you say, lagging chromosomes may resolve and segregate to the correct daughter cell. The fundamental assumption is that, although not all lagging chromosomes mis-segregate, that specimens with higher rate of lagging chromosomes have higher rates of mis-segreation. Because there is no gold-standard measure of CIN in the literature to date, we feel it is necessary to show the correlation between the two and how the data from that study relates to the inferred rates in this study. We have made this clearer in the text.

    1. The authors would need to discuss in greater detail earlier mathematical models of CIN, including PMID: 26212324, 30204765, and 12446840 and explain how their approach improves on this prior work.

    We now provide a more detailed discussion on prior mathematical models, incorporating these and others.

    Reviewer #2 (Public Review):

    Weakness of the framework include: (1) Most notably, the presented framework is lacking expanded characterization and validation of selection models that are biologically relevant.

    We have taken this critique to heart. To address this, we have greatly expanded the models and their characterization. We now explicitly include a neutral model throughout, tested various modifications of the model (Figure 3C-E), and use ABC to enable model selection (see Table 3).

    The current framework simply applies a scalar exponent to already published fitness models for selection. It is unclear what this exponent mirrors biologically, beyond amplifying the selection pressures already explored in existing gene abundance and driver density models.

    We implemented cellular fitness as the sum of normalized chromosome scores such that the fitness of euploid cells is 1 and the probability of division = 0.5. In this framework, within the ‘abundance’ model, a cell with triploidy of chromosome arm 1p would have a fitness of 0.98. With no additional selection, the probability that this cell divides is 0.98 x 0.5 = 0.49. The published fitness models for karyotype selection do not experimentally determine how fitness relates to the probability of division within a given time. For example, there is no clear reason why (or evidence indicating) an extra copy of chromosome arm 1p would reduce the probability of division from 0.5 to precisely 0.49 for a given period. The proposed model of karyotype selection that our ‘abundance’ model is based on only stipulates that aneuploidy of larger chromosomes is more detrimental than small chromosomes. Thus, these fitness values behave as arbitrary units and, therefore, we believe that adjusting and fitting an arbitrary scaling factor to the biological data is appropriate. For example, with an additional selection of S=10, the same cell with trisomy of chromosome arm 1p would divide with a probability of F^S x 0.5 = 0.98^10 x 0.5 = 0.41.

    We could have implemented a multiplicative framework where fitness (F_mult) is defined as the total deviation from euploid fitness (1) multiplied by a scaling factor S (F_mult = S(1 - F)). For the trisomy 1p example, the same fitness value (F^S=0.9810) can be achieved multiplicatively as exponentially via 1 – (9.14 x (1 - 0.98)) ~ 0.98^10. Thus, the same fitness values can be achieved through arbitrary scaling. We regret that this may have been misinterpreted because it was implemented exponentially vs multiplicatively.

    To further address this critique, we have now better fitted the S values with a flat prior probability across all values, shown how it relates to P_misseg in posterior probabilities (e.gs, Figure 6C, Table 3) and performed the separate analysis requested in critique #5 below.

    (2) Towards this, how is the CIN ON-OFF model in which CIN is turned off after so many cell divisions relevant biologically? Typically CIN is a considered a trait that evolves later in cancer progression, that once tolerated, is ongoing and facilitates development of metastasis and drug resistance. A more relevant model to explore would be that of the effect of a whole genome duplication (WGD) event on population evolution, which is thought to facilitate tolerance of ensuing missegregation events (because reduce risk of nullisomy).

    We agree that the CIN ON-OFF model had limited biologic relevance and removed this. To improve on this, we have changed our approach to use constant CIN for a much longer period of time (3000 time steps). We agree that WGD is a relevant phenomenon. However, others have already explicitly modeled this (see PMID: 26212324 and 32139907), so we avoid doing the same. Instead, we show that tetraploid founding cells tolerate high mis-segregation rates better than diploid founding cells.

    (3) The authors utilize two models of karyotype fitness - a gene abundance model and driver density model - to evaluate impact of specific karyotypes on cellular fitness. They also include a hybrid model whose fitness effects are simply the average of these two models, which adds little value as only a weighted average.

    To date we do not have an experimentally-defined human selection model. The gene abundance model is limited in that it considers all genes equally which inadequately considers disease function and essentiality. By contrast the driver density model weights tumor suppressors and oncogenes which may not operate in all context and ignores the essential functions of most other protein-coding genes. We believe the hybrid model can compensate for these mutual defects, but acknowledge the importance an experimentally derived models to adjudicate which is best.

    In silico results shows inferred missegregation rates are extremely disparate across the two primary models. And while a description of these differences is provided, the presented analyses do not make clear the most important question - which of these models is more clinically relevant? Toward this, in Figure 2F, the authors claim the three models approach a triploid state - which is unsupported by the in silico results. Clearly the driver model approaches a triploid state, as previously reported. But the abundance model does not and hybrid only slightly so, given that it is simply a weighted average of these two approaches. Because the authors have developed a Bayesian strategy for inferring which model parameters best fit observed data, it would be very useful to see which model best recapitulates karyotypes observed in cancer cell lines or patient materials.

    We agree that the abundance and hybrid models are unable to approach a triploid state, in earnest, as does the driver and have made that clearer in the text and improved the figure panel in question for clarity. To address your latter point on which model best fits observed data, we have implemented a model selection scheme to do this (see Table 3). This indicates the gene abundance model as the most biologically relevant and provides evidence for stabilizing selection as the primary mode of selection occurring in the organoid and biopsy data we analyzed.

    (4) Topological features of phylogenetic trees, while discriminatory, are largely dependent on accurate phylogenetic tree reconstruction. The latter requires more careful consideration of cell linkages beyond computing pairwise Euclidean distances and performing complete-linkage clustering. For example, a WGD event, would appear very far from its nearest cell ancestor in Euclidean space.

    While more granular cell linkage data would certainly improve phylogenetic reconstruction, low-coverage scDNA- sequencing (0.01-0.05x) is unable to reliably recover SNPs that would enable this approach. Clustering on copy number similarity remains the standard approach at this point (see PMID: 33762732). We have added this to the discussion.

    (5) Finally, experimental validation of the added selection exponential factor is imperative. Works have already shown models of karyotypic evolution without additional selection exponential coefficient can accurately recover rates of missegregation observed in human cell lines and cancers by fluorescent microscopy. Incorporation of this additional weight on selection pressure has not been demonstrated or validated experimentally. This would require experimental sampling of karyotypes longitudinally and is a critical piece of this manuscript's novelty.

    As described in #1 above, the selection values of F are in arbitrary units and so we believe a selection scaling factor is important to include in the model. For example, without additional selection, a hypothetical aneuploid cell with a trisomy resulting in F = 0.95 would be 5% less likely to divide than a euploid cell with F = 1. The exponent scales the selection such that when S = 2, the fitness of the trisomic cell is F ~ 0.9, or 10% less likely to divide. This scaling is necessary to enable both positive and negative selection in a system fitness is decided as the sum of chromosome scores. To further validate the additional weight on selection pressure we did the following:

    1. We constrained the prior distribution of simulated data for our model selection to S=1 giving only the base fitness values without additional scaling. We, again, performed model selection on the data from Bolhaqueiro et al., 2019 and Navin et al., 2011 and found that, with this constrained prior dataset, we inferred mis- segregation rates (see Table 4) that were far below rates seen in cancer cell lines (see Figure 6E).

    2. Given the initial clarification that reviewers were looking for longitudinal analysis, we leveraged data provided by the authors of Bolhaqueiro et al., 2019 where they sequenced single cells from 3 clones from organoid line 16T at 3 weeks and 21 weeks after seeding. We inferred mis-segregation rates and selective pressures in these clones at the 3-week timepoint. We did so under the Abundance model using the same prior distribution of steps given that the diversity of populations under the Abundance model rapidly reach a steady state. When we simulated additional populations using these inferred characteristics we found that the karyotype composition of the simulated populations most closely resembled the biological population than did populations simulated with the unmodified selection values (see Figure 6 — figure supplement 4). This lends credence to the biological relevance of scaled selective pressure vs. unmodified selective pressure.

    Reviewer #3 (Public Review):

    1. Given the importance of the selection paradigm in determining the observed karyotypic heterogeneity, a significant weakness of the work is that there is no attempt to learn the selection paradigm from the observed data. This is important because there is an interrelationship between selection, the chromosomal alteration rate, and the observed data and so the accuracy of the inferred alteration rate is likely to be compromised if an inappropriate model of selection is used.

    We have implemented a model selection strategy to address this critique. Accordingly, we infer mis-segregation rate under each model and take the result of the best-fit model to be the inferred rate. In this case, stabilizing selection under

    1. Somewhat relatedly, how the population of cells grows (e.g. exponential growth vs constant population size) also effects the observed karyotype heterogeneity, but the modelling only allows for exponential growth which may be an inappropriate of the public datasets analysed.

    We have now concurrently modeled chromosomal instability with a constant population size by approximating constant- population Wright Fisher dynamics (see Materials and Methods). We find these models produce similar results at the karyotype level, addressing concerns about the effects of growth patterns on karyotype evolution in this model.

    1. There are some technical concerns about the approximate Bayesian computation analysis (choice of prior distributions, testing for convergence, matching of the growth model to cell growth patterns in the data, and temporal effects) which need to be addressed to ensure this part of the analysis is robust.

    To address these concerns, we improved and more clearly detailed the prior distributions for each inference within the figure legends, we tested for karyotype convergence in each model (see Figure 3), and we demonstrate that inference under the Abundance model is robust to changes in the number of time steps included in the prior data (see Figure 6 — figure supplement 1).

  2. Evaluation Summary:

    This study seeks to develop a mathematical framework for estimating rates of chromosome missegregation based on known chromosomal properties and observed aneuploidy rates. A derived model is validated using live-cell imaging before being applied to several previously-described datasets from tumors and organoids. The subject matter is of high interest to aneuploidy and genome evolution researchers.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #3 agreed to share their name with the authors.)

  3. Reviewer #1 (Public Review):

    In this paper, Lynch and colleagues seek to develop a mathematical framework for estimating rates of chromosome missegregation based on known chromosomal properties and observed aneuploidy rates. They derive a model that they validate using live-cell imaging and apply it to several previously-described datasets from tumors and organoids. Overall, this paper is interesting, and its subject matter is of high interest to aneuploidy and genome evolution researchers. Below, I suggest a few areas where this manuscript could be improved:

    1. It seems like this model treats chromosome gains and losses equivalently. Is this appropriate? Chromosome loss events are much more toxic than chromosome gain events - as evidenced by the fact that haploinsufficiency is widespread, and all autosomal monosomies are embryonically-lethal while many trisomies are compatible with birth and development. Can the authors consider a model in which losses exert a more significant fitness penalty that chromosome gains?

    2. Chromosomes do not missegregate at the same rate (PMID: 29898405). This point would need to be discussed, and, if feasible, incorporated into the authors' models.

    3. It would be helpful if the authors could clarify their use of live cell imaging (e.g., in Fig 6G). Certain apparent errors that are visible by live-cell imaging (like a lagging chromosome) can be resolved correctly and result in proper segregation. It is not clear whether it is appropriate to directly infer missegregation rates as is done in this paper.

    4. The authors would need to discuss in greater detail earlier mathematical models of CIN, including PMID: 26212324, 30204765, and 12446840 and explain how their approach improves on this prior work.

  4. Reviewer #2 (Public Review):

    The authors have developed in silico model of CIN that explores how rates of missegration and selection pressure impact population structure and intra-tumor karyotypic variance in human tumors. This extends an already established framework for exploring impact of CIN on karyotypic evolution (PMID: 26212324) in two major ways: (1) accounting for selection pressure on aneuploid karyotypes and (2) using Bayesian inference to fit observed karyotype distributions from single cell DNA sequencing data to those generated in silico through sweeps along parameter space. The latter permits quantitative inference of CIN as a rate, in contrast to a state (i.e. aneuploidy), which is a biologically-important distinction the authors well articulate. Subsequently, they perform sensitivity analyses for the impact of sample size (number of cells that must be karyotyped at single cell level) on inferred rates of missegregation, which is highly relevant to the use of quantitative CIN as a clinical biomarker. The latter is of particular interest to the research and clinical communities, given the pervasive association of CIN with metastasis, drug resistance and poor survival outcomes.

    Strengths of the framework include:

    1. Careful distinct of CIN as a rate (not a state of aneuploidy) as a biomarker for cancer progression and therapeutic resistance.

    2. Acknowledges surprisingly low karyotypic variation observed in published single cell DNA sequencing datasets, despite obvious mitotic errors and karyotypic abnormalities observed by chromosome painting in tumor samples. These conflicting observations are yet to be resolved, and largely limit confidence in accuracy and robustness of single cell DNA sequencing in recovering tumor karyotypes. The author's hypothesize this discrepancy stems from karyotypic selection, which we suggest they expand upon in modeling and validation.

    3. Sensitivity analysis of inferred CIN rate to sample size (number of single cells sequenced) is extremely useful and important for clinical biomarker development.

    Weakness of the framework include:

    1. Most notably, the presented framework is lacking expanded characterization and validation of selection models that are biologically relevant. The current framework simply applies a scalar exponent to already published fitness models for selection. It is unclear what this exponent mirrors biologically, beyond amplifying the selection pressures already explored in existing gene abundance and driver density models.

    2. Towards this, how is the CIN ON-OFF model in which CIN is turned off after so many cell divisions relevant biologically? Typically CIN is a considered a trait that evolves later in cancer progression, that once tolerated, is ongoing and facilitates development of metastasis and drug resistance. A more relevant model to explore would be that of the effect of a whole genome duplication (WGD) event on population evolution, which is thought to facilitate tolerance of ensuing missegregation events (because reduce risk of nullisomy).

    3. The authors utilize two models of karyotype fitness - a gene abundance model and driver density model - to evaluate impact of specific karyotypes on cellular fitness. They also include a hybrid model whose fitness effects are simply the average of these two models, which adds little value as only a weighted average. In silico results shows inferred missegregation rates are extremely disparate across the two primary models. And while a description of these differences is provided, the presented analyses do not make clear the most important question - which of these models is more clinically relevant? Toward this, in Figure 2F, the authors claim the three models approach a triploid state - which is unsupported by the in silico results. Clearly the driver model approaches a triploid state, as previously reported. But the abundance model does not and hybrid only slightly so, given that it is simply a weighted average of these two approaches. Because the authors have developed a Bayesian strategy for inferring which model parameters best fit observed data, it would be very useful to see which model best recapitulates karyotypes observed in cancer cell lines or patient materials.

    4. Topological features of phylogenetic trees, while discriminatory, are largely dependent on accurate phylogenetic tree reconstruction. The latter requires more careful consideration of cell linkages beyond computing pairwise Euclidean distances and performing complete-linkage clustering. For example, a WGD event, would appear very far from its nearest cell ancestor in Euclidean space.

    5. Finally, experimental validation of the added selection exponential factor is imperative. Works have already shown models of karyotypic evolution without additional selection exponential coefficient can accurately recover rates of missegregation observed in human cell lines and cancers by fluorescent microscopy. Incorporation of this additional weight on selection pressure has not been demonstrated or validated experimentally. This would require experimental sampling of karyotypes longitudinally and is a critical piece of this manuscript's novelty.

  5. Reviewer #3 (Public Review):

    This manuscript develops an agent-based computational model describing the evolution of chromosomal alterations in a growing population of cells. The model is then compared to various datasets - a single cell DNA sequencing dataset of paclitaxel treated cells generated by the authors themselves and two public single cell sequencing datasets - using Bayesian inference to infer the model parameter describing the chromosomal alteration rate.

    The novelty and strength of the paper is the mathematical modelling-led approach to quantitatively measure chromosome missegregation rates from single cell sequencing data, and to make the measurements in a way that attempts to be robust to the potentially confounding influence of clonal selection on karyotypic alterations.

    The modelling includes a number of different paradigms of how selection acts upon the karyotype, and the authors show that each paradigm strongly effects the diversity of chromosomal alterations that persist in the evolving population of cells. As expected, the rate of missegregation and overall selection strength also strongly effect diversity.

    Given the importance of the selection paradigm in determining the observed karyotypic heterogeneity, a significant weakness of the work is that there is no attempt to learn the selection paradigm from the observed data. This is important because there is an interrelationship between selection, the chromosomal alteration rate, and the observed data and so the accuracy of the inferred alteration rate is likely to be compromised if an inappropriate model of selection is used. Somewhat relatedly, how the population of cells grows (e.g. exponential growth vs constant population size) also effects the observed karyotype heterogeneity, but the modelling only allows for exponential growth which may be an inappropriate of the public datasets analysed.

    There are some technical concerns about the approximate Bayesian computation analysis (choice of prior distributions, testing for convergence, matching of the growth model to cell growth patterns in the data, and temporal effects) which need to be addressed to ensure this part of the analysis is robust.