In silico analysis of DNA re-replication across a complete genome reveals cell-to-cell heterogeneity and genome plasticity

Abstract

DNA replication is a complex and remarkably robust process: despite its inherent uncertainty, manifested through stochastic replication timing at a single-cell level, multiple control mechanisms ensure its accurate and timely completion across a population. Disruptions in these mechanisms lead to DNA re-replication, closely connected to genomic instability and oncogenesis. Here, we present a stochastic hybrid model of DNA re-replication that accurately portrays the interplay between discrete dynamics, continuous dynamics and uncertainty. Using experimental data on the fission yeast genome, model simulations show how different regions respond to re-replication and permit insight into the key mechanisms affecting re-replication dynamics. Simulated and experimental population-level profiles exhibit a good correlation along the genome, robust to model parameters, validating our approach. At a single-cell level, copy numbers of individual loci are affected by intrinsic properties of each locus, in cis effects from adjoining loci and in trans effects from distant loci. In silico analysis and single-cell imaging reveal that cell-to-cell heterogeneity is inherent in re-replication and can lead to genome plasticity and a plethora of genotypic variations.

Article activity feed

  1. ##Author Response

    ###Summary:

    A strength of the work was that the mathematical modeling of re-replication captured variability in origin firing and supported a mechanism that might explain copy number variation observed in many eukaryotes. However, concern was expressed regarding the influence of assumptions made in developing the model on the outcomes and the moderate correlations between simulations and experimental data. Further explanation of the questions being investigated, the validity and nature of assumptions that were used to develop the simulations, and details explaining how these assumptions were built into the modeling were considered important. Some attempt to align the modeling outcomes with known re-replication hotspots would also improve the study. Some of the parameters used for modeling were concerning, including the use of a 16C ploidy cutoff without adequate justification. Reviewers also made suggestions for improving the experimental validation tests. Reviewers also noted places in the manuscript that require additional clarification. Overall, some concerns were raised regarding the experimental methods, and the impact of the insights gained.

    We would like to thank eLife for this Preprint Review service.

    In this manuscript, we present for the first time a model of DNA rereplication, which permits us to analyse how the process evolves at the single-cell level, across a complete genome, over time. This analysis revealed a pronounced heterogeneity at the single cell level, resulting in increased copies of different genomic loci in different cells, and highlighted rereplication as a powerful mechanism for genome plasticity within an evolving population. We would like to thank the reviewers for their critical appraisal of our work and the editor for his summary of the reviews. The points raised were overall easy to address, and we have done so in a revised version of the manuscript, where we have also clarified points which were unclear to the reviewers. Importantly, we have clarified that: (i) there are currently no available methods for studying rereplication dynamics experimentally at the single cell level across the genome, and it is exactly this analysis that our manuscript offers; (ii) model assumptions were either standard and previously validated experimentally for DNA replication or subjected to sensitivity analysis, with key findings shown to be robust to model assumptions; (iii) there was no arbitrary cut-off point in the rereplication process, which was analysed over time, an advantage of our approach: data were depicted early in the process (2C) and late in the process (16C), but findings were robust across the process; (iv) fission yeast cells can be experimentally induced to rereplicate to different extents (from 2C to 16C or even 32C) and our model permits us to capture the process as it evolves at any ploidy; (v) correlations between experimental and simulated data were highly significant and robust to model assumptions.

    We would like to thank the reviewers for their comments, which we believe have helped us improve our manuscript and clarify points of possible misunderstanding. A point-by-point response follows.

    ###Reviewer #1:

    The authors develop and analyse a mathematical model of DNA rereplication in situations where re-firing of origins during replication is not suppressed. Using the experimentally measured position and relative strength of origins in yeast, the authors simulate DNA copy number profiles in individual cells. They show that the developed model can mostly recapitulate the experimentally measured DNA copy number profile along the genome, but that the simulated profiles are highly variable. The fact that increasing copy number of an origin will facilitate its preferential amplification essentially constitutes a self-reinforcing feedback loop and might be the mechanism that leads to overamplification of some genomic regions. In addition, different regions compete for a limiting factor, and thereby repress each other's over-amplification. While the model generates some interesting hypotheses, it is unclear in the current version of the manuscript to what extent they arise from specific model assumptions. The authors do not clearly formulate the scientific questions asked, they do not discuss the model assumptions and their validity and they do not adequately describe how model results depend on those assumptions. Taken together, the scientific process is insufficiently documented in this manuscript, making it difficult to judge whether the conclusions are actually supported by the data.

    The manuscript has been modified to further clarify the underlying questions and model assumptions. We would like to point out that the model was presented in detail in the supplementary material of the original manuscript, which included all model assumptions. In addition, model parameters used for the base-case model were systematically varied, the outcome was presented in a separate paragraph (“Sensitivity Analysis” in Results), and findings were shown to be robust to model assumptions. These points are presented in detail below.

    1. It is not clear what questions the authors want to address with their model. Do they want to understand how the experimentally observed copy number differences between regions arise? The introduction should elaborate more on the open questions in the field and explain why they should be addressed with a mathematical model.

    With this work our goal is to elucidate the fundamental mechanisms and properties underlying DNA re-replication. Specifically, we aim to investigate how re-replication evolves over time along the genome, and how it may lead to different numbers of copies of different loci at the single-cell level and result in genetic heterogeneity within a population. Given the large number of origins along the genome and the stochasticity of origin firing (Demczuk et al., 2012; Kaykov and Nurse, 2015; Patel et al., 2006), it is unclear how re-replication would evolve along the genome in each individual cell in a re-replicating population and how local properties and genome-wide effects would shape its progression and the resulting increases in the number of copies of specific loci. As no experimental method exists that can analyze DNA re-replication at the single-cell level over time along the genome, we designed a mathematical model that is able to track the firing and refiring of origins and the evolution of the resulting forks along a complete genome over time, and in this way capture the complex stochastic hybrid dynamics of DNA re-replication. Since existing methods to analyze DNA re-replication in vivo only provide static, population-level snapshots (Kiang et al., 2010; Menzel et al., 2020; Mickle et al., 2007), we believe that our in silico model, which is the first modeling framework of DNA re-replication, is an important contribution to the field.

    In the revised version of our manuscript, we have modified the introduction to explain these points in more detail.

    1. One of the main messages of the paper is that the amplification profiles are highly variable across single cells, because that was found in the described simulations. This behavior does however likely depend on specific choices that were made in the simulations, e.g. that the probabilities of the origin state transitions are exponentially distributed. These assumptions should at least be discussed, or better experimentally validated.

    Modeling choices and assumptions are presented in detail in the Supplementary material of the manuscript, and were made to accurately capture the dynamics of origin firing, which is known to be stochastic, as established by many studies in fission yeast (Bechhoefer and Rhind, 2012; Patel et al., 2006; Rhind et al., 2010) and the continuous movement of forks along the DNA. Specifically, the choice of the exponential distribution used for assigning a firing time to each origin has already been discussed and validated in our previous work on normal DNA replication (Lygeros et al., 2008). Indeed, as shown in Figure 2 of (Lygeros et al., 2008), our model was able to accurately reconstruct experimental data derived by single molecule DNA combing experiments (Patel et al., 2006).

    The use of the exponential distribution for transition firing times is standard in stochastic processes in general, including what are known as Piecewise Deterministic Markov Processes (PDMP), the class where the models considered in the paper belong. There are good mathematical reasons for this, for example the "memoryless" property that makes the resulting stochastic process Markov, a basic requirement for the model to be well-posed [M. H. A. Davis, "Markov models and optimization", Monographs on Statistics and Applied Probability, vol. 49, Chapman & Hall, London, 1993]. Practically, assuming an exponential distribution can be quite general, because the rate (the probability with which a transition "fires" per unit time) is allowed to depend on the state of the system, both the discrete state (in our case, the state of individual origins) and the continuous state (in our case, the progress of individual replication forks). It can be shown that one can exploit this dependence to write seemingly more general processes (that at first sight do not have exponential firing times) as PDMP (with exponential firing times) by appropriately defining a state for the system [M. H. A. Davis, "Piecewise-Deterministic Markov Processes: A General Class of Non-Diffusion Stochastic Models", Journal of the Royal Statistical Society. Series B (Methodological), Vol. 46, No. 3 (1984), pp. 353-388]. In the manuscript this feature is exploited in what we call the LF model, where the rate of the exponential firing time of each origin (probability of firing per unit time) depends on the state of the system (specifically, the number of PreR origins), as discussed in the section on Sensitivity Analysis. We have further clarified these in the revised manuscript.
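The state-dependent exponential firing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the `base_rates` values and the `shared_capacity` rescaling are hypothetical choices made only for illustration, and fork movement (the continuous part of a PDMP) is left out. What it does show is how rates that are recomputed from the current discrete state still give exponential waiting times at every step, which is what the memoryless property permits.

```python
import random


def simulate_firing(base_rates, shared_capacity=None, seed=0):
    """Simulate the times and order in which origins fire.

    base_rates: hypothetical per-origin firing rates (events per minute).
    shared_capacity: None means every waiting origin fires at its own base
        rate; a number means the rates are rescaled so that they always sum
        to this value, so each rate depends on which origins are waiting.
    Returns a list of (time, origin_index) tuples in firing order.
    """
    rng = random.Random(seed)
    waiting = list(range(len(base_rates)))
    t = 0.0
    events = []
    while waiting:
        # Rates are recomputed from the current state after every event.
        rates = [base_rates[i] for i in waiting]
        if shared_capacity is not None:
            scale = shared_capacity / sum(rates)
            rates = [r * scale for r in rates]
        total = sum(rates)
        # The minimum of independent exponential clocks is itself
        # exponential, with rate equal to the sum of the individual rates.
        t += rng.expovariate(total)
        # The clock that rings first is origin i with probability rate_i / total.
        fired = rng.choices(waiting, weights=rates, k=1)[0]
        waiting.remove(fired)
        events.append((t, fired))
    return events


# Three hypothetical origins competing for a fixed total firing capacity.
events = simulate_firing([0.05, 0.2, 0.1], shared_capacity=0.1, seed=1)
```

Resampling every clock after each event is only valid because exponential waiting times are memoryless; with any other distribution the elapsed waiting time would have to be carried in the state, which is the state-augmentation idea referred to above.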

    1. The authors aim at testing their prediction that rereplication is highly variable across cells. To this end they use the LacO/LacI system to estimate locus copy number. The locus intensity is indeed highly variable across cells. However, the DAPI quantification suggests that only a subset of cells actually undergo rereplication under the experimental conditions used (Fig. 4C). Therefore the analysis should at least be limited to those cells. It would be even better if a second locus could be labelled in another color to show that rereplication of two loci is anti-correlated as predicted by the model.

    Under the experimental conditions employed (ectopic expression of a mutant version of the licensing factor Cdc18, stably integrated in the genome under a regulatable promoter), the vast majority of cells undergo rereplication but to relatively low levels, resulting in cells with a DNA content of 2C-8C. Though the DNA content of several cells indeed appears similar to the DNA content of normal G2 phase cells, the vast majority (>90%) of cells undergo rereplication, as manifested by the appearance of DNA damage and, eventually, loss of viability. We have chosen this experimental set-up (medium levels of rereplication) as it allows induction of rereplication in practically all cells in the population, without the abnormal nuclear and cellular morphology which accompanies a pronounced increase in DNA content (i.e. 16C) and which would make single-cell imaging more prone to artifacts. Fission yeast cells can be induced to undergo rereplication to various extents, by regulated expression of different versions of Cdc18 to different levels and/or co-expression of Cdt1. We have now explained this more extensively in the revised manuscript and thank the reviewer for identifying a point which may not have been clear in the first version of the manuscript.

    Concerning the possibility of studying two loci at the same time, we have indeed tried to tag a second region with TetR/TetO; however, the signal-to-noise ratio, and thus the reproducible detection of the TetR focus, was suboptimal under rereplication conditions. We therefore did not proceed further with this approach.

    1. What does "signal ratio" in Fig. 2 mean? And why are the peaks much higher in the simulations? Would the signal ratio between simulation and experiment correspond better, if an earlier time point in the simulation was selected?

    The definition of signal ratios is given in Results: DNA re-replication at the population level: “Specifically, we computed in silico mean amplification profiles across the genome, referred to as signal ratios in (Kiang et al., 2010), by averaging the number of copies for each origin location and normalizing it to the genome mean in 100 simulations. In these profiles, peaks above 1 correspond to highly re-replicated regions, and valleys below 1 correspond to regions that are under-replicated with respect to the mean.”
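The signal-ratio definition quoted above amounts to a two-step average, sketched below. The function name and the input layout (`copy_numbers[s][k]` holding the number of copies of origin location `k` in simulation `s`) are assumptions made for illustration, as is the use of an unweighted mean over origin locations for the genome mean; the manuscript's exact normalization may differ.

```python
def signal_ratios(copy_numbers):
    """Mean amplification profile normalized to the genome mean.

    copy_numbers: list of simulations, each a list with the number of
        copies at every origin location (same length in every simulation).
    Returns one ratio per origin location; values above 1 mark locations
    amplified more than the genome mean, values below 1 mark locations
    amplified less.
    """
    n_sims = len(copy_numbers)
    n_loci = len(copy_numbers[0])
    # Step 1: average each origin location across simulations.
    mean_per_locus = [
        sum(sim[k] for sim in copy_numbers) / n_sims for k in range(n_loci)
    ]
    # Step 2: normalize by the mean over all origin locations
    # (unweighted here, which is an assumption of this sketch).
    genome_mean = sum(mean_per_locus) / n_loci
    return [m / genome_mean for m in mean_per_locus]


# Two toy "simulations" over three origin locations.
ratios = signal_ratios([[2, 4, 2], [2, 8, 2]])
```

In this toy input the middle location averages 6 copies against a genome mean of 10/3, so its ratio is 1.8, while the two flanking locations fall at 0.6.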

    Indeed, as observed by the reviewer, simulated peaks appear overall sharper and higher than experimental peaks. This is expected, since simulated data show the actual number of copies generated, while experimental data are subject to background noise and represent averages of 3 probes and 2 independent experiments. We have clarified this in the Results.

    Last, we chose to compare in silico and experimental profiles at a similar ploidy. Plotting in silico profiles of an earlier timepoint would indeed lead to visually more similar patterns in terms of peak intensity, but we believe this could be misleading for the readers.

    1. From line 248 onwards, the authors compare different assumptions for polymerase speed and conclude that "0.5 kb/min is closer to experimental observations". It is unclear, however, which experimental observations they refer to and what was observed there. The same question arises when they compare the LF and UF models (line 275-277).

    We have now clarified this point. Experimental observations show that under high levels of rereplication, DNA content reaches 16C four to six hours following accumulation of Cdc18 (Nishitani et al., 2000). Estimates for 0.5 kb/min and the LF model are therefore closer to experimental observations.

    1. I find the description of cis- and trans-effects rather confusing. The authors should rather explain what happens in the model. Neighboring strong origins can amplify a weak origin and origins compete for factors. In line 475-476 for example, it should be clarified that the assumption of the LF model could lead to trans-effects, instead of presenting this as a general model prediction.

    In the manuscript, we initially present what we observe in the Results section and then proceed to provide possible explanations in Discussion. We quote from the Discussion: “Such in trans negative regulation of distant origins could be explained by competition for the same limiting factor: high-level amplification of a given locus recruits high levels of the limiting factor, indirectly inhibiting firing of other genomic regions.” and “[…] in cis elements contribute to amplified copy numbers not only directly by passive re-replication, but also implicitly through increasing the firing activity of their neighbors”. To our understanding, these sentences are in complete agreement with the reviewer’s suggestions. Nonetheless, and to make this even more clear, we have modified the Discussion in our revised manuscript.

    1. Throughout the manuscript, a clear distinction should be made between the firing activity of one origin molecule and the cumulative activity of multiple copies of an origin. For example, it should be clarified in line 435 that the cumulative activity of weak origins might increase if they are close to a strong origin, because they get amplified, instead of just writing "increased firing activity of weak origins".

    We have clarified this point in the revised manuscript.

    1. One of the major conclusions of the manuscript is that rereplication is robust on the population level. It is not clear to me what the authors mean by that. The average amplification levels are probably determined by the origin efficiencies that are put into the model. What would robustness mean in this context?

    As the reviewer points out, origin efficiencies are among the important input parameters of the model. Since the model is stochastic, however, origin efficiencies do not directly determine the amplification levels at a single-cell level. For example, in Figure 3A and Supplementary Figure S4, we show the outcome of 4 random simulations with identical underlying parameters, where it is clear that re-replication can lead to markedly different single-cell amplification levels. Indeed, genome-wide analysis across 100 simulations (Supplementary Figure S5) indicated that at the onset of re-replication, amplification levels are highly unpredictable (again, despite the fact that the input parameters are identical).

    In contrast, when analyzing amplification profiles at a population level (averaging across sets of 100 simulations), the most highly amplified regions appear to be highly reproducible. We agree with the reviewer that these population level profiles are strongly affected by the origin efficiencies, but they are not determined solely by them. For example, low efficiency origins can be highly amplified, or highly efficient origins can be suppressed (see discussion on in cis and in trans effects) depending on their neighborhood and system-wide effects, and the extent of these effects depends on the fork speed. Sensitivity analysis with respect to different model assumptions, or model parameters (see Results, section Sensitivity Analysis and Supplementary Figure S3) indicated that amplification profiles might appear sharper or flatter, but overall amplification hotspots were highly robust.

    To summarize, in our conclusions (Discussion, section Emerging properties of re-replication) we highlight these properties (stochasticity vs. robustness) and elaborate further on how they emerge during the course of re-replication (onset vs. high re-replication) or depending on the level of analysis (single-cell vs. population level).

    1. It would be helpful if, in Fig. 2 also the origins and their respective efficiencies could be shown to understand to what extent the signal ratio reflects these efficiencies.

    We thank the reviewer for the useful suggestion, which we have incorporated in the revised manuscript.

    1. The methods section should provide more detail.

    We would like to point out that the Supplementary Material, including a full mathematical description of the model, is available on bioRxiv and was also available at the time of the preprint review (https://www.biorxiv.org/content/10.1101/2020.03.30.016576v1.supplementary-material). It has also been uploaded as a separate document on our GitHub page: https://github.com/rapsoman/DNA_Rereplication

    ###Reviewer #2:

    Here, Rapsomaniki et al have modeled the process of DNA re-replication. The in silico analysis is an extension of their previous work describing normal DNA replication (Lygeros et al 2008). The authors show that there is a large amount of heterogeneity at the single cell level but when these heterogeneous signals are averaged across a population, the signal is robust. The authors support this with simulations and with experimental data, both at the single cell level and at the population level.

    1. It is a bit concerning that simulations were carried out to a ploidy level of 16C. Has it been observed that the DNA content in any given cell can rise to 16 times the initial amount? Figure 3 (simulations) shows that certain chromosomal regions can reach 30x and 160x copies for 2C and 16C. However, Figure 4 (experiment) suggests that copy numbers should only be slightly more in re-replicating conditions, compared to normal replicating conditions. Additionally, in Figure 2, the simulated data seems to be consistently noisier than the experimental data. Taken together, this may suggest that the assumptions in the model do not adequately recapitulate the biological system.

    Fission yeast cells undergo robust rereplication, and reach a ploidy up to 32C - see for example (Kiang et al., 2010; Mickle et al., 2007; Nishitani et al., 2000). 16C is therefore a usual ploidy for rereplicating fission yeast cells, observed under many experimental conditions. In addition, by manipulating the licensing factors over-expressed, different levels of ploidy can be experimentally achieved, ranging from 2C (the normal ploidy of a G2 cell, but with uneven replication) to 32C. In Figure 4, we have employed a truncated form of Cdc18 (d55P6-cdc18 (Baum et al., 1998)), which induces medium-level re-replication, as confirmed by FACS analysis in Supplementary Figure S6A. Under these conditions, the vast majority of the cells (>90%) undergo re-replication, albeit at medium to low levels. We have opted to use this strain to avoid artifacts due to disrupted nuclear morphology under high levels of re-replication. We have now clarified this point in the revised manuscript. We would like to point out that in silico analysis is not carried out at 16C only but across different ploidies – it is actually a strength of our approach that we can follow the rereplication process as it evolves, at any ploidy, and we have shown that our conclusions are robust throughout. We show plots at the beginning of the process (2C) and towards the end (16C), at the single-cell and at the population level, to facilitate comparison.

    Last, as also discussed in our response to reviewer 1, simulated data appear sharper, with higher peak values than experimental data (Figure 2). This is expected, since simulated data show the actual number of copies generated, while experimental data are subject to background noise and represent averages of 3 neighboring microarray probes and 2 independent experiments. We have clarified this in the revised manuscript.

    1. This work currently is agnostic to the genes and sequences within the simulated genomes. The authors suggest that DNA re-replication can result in gene duplications. It might strengthen the manuscript if the authors are able to show that re-replication hotspots coincide with gene duplication events in S pombe. It should be relatively straightforward to overlap the hotspots found in this analysis with known gene duplication events in the literature.

    We agree with the reviewer that comparing our predictions with known gene duplication events in S. pombe would be of interest. Unfortunately, to our knowledge, no such dataset for fission yeast exists in the literature. The most comprehensive datasets are the ones from (Kiang et al., 2010; Mickle et al., 2007), which analyse rereplicating cells, and which we have already exploited in our paper. We would like to point out that this manuscript aims to show how rereplication evolves genome-wide. Whether the additional copies generated can lead to gene duplication events is beyond the scope of the present manuscript.

    1. The authors have nicely demonstrated that cis activation can be driven by the physical proximity of origins. The authors go on to describe trans suppression in which the activation of one origin suppresses the activation of a different origin. I would argue that this observation is simply the result of randomness in the model and stopping the simulations at fixed points.

    One of the two origins will randomly re-replicate first and simply outpace the other. Stopping the simulations at 16C will simply prevent the lagging origin from catching up with the first origin. There does not seem to be an inhibitory mechanism that acts between two origins.

    This can be explained by the following equation: X + Y = constant, where X is the amount of origin 1 and Y is the amount of origin 2.

    It is also possible that the two origins could start re-replicating at the same time. This would result in the data points observed for cluster 2 (Figure 6 BC).

    We thank the reviewer for the positive comments. Indeed, as we elaborate in our Discussion, we believe that the mechanism behind the observed in trans effects is the competition for a factor that exists in a rate-limiting quantity (see also reply to point 6, reviewer 1 above), which is essentially the constant in his/her equation. Though less pronounced, such in-trans effects are also possible in the UF model, and could be due to the total DNA increase being dominated by certain origins, as suggested by the reviewer. We do not suggest anywhere in the manuscript that this inhibition is direct, but rather clearly state that it is an indirect effect.
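The fixed-sum argument in this exchange can be made explicit with a short calculation. As an illustrative assumption (not a statement about how the model is implemented), treat the copy numbers $X_1, \dots, X_n$ of $n$ loci in a single cell as random variables whose sum is fixed to a constant $c$, for instance because all cells are compared at the same total DNA content.

```latex
\begin{align}
\sum_{j=1}^{n} X_j = c
\;&\Longrightarrow\;
\operatorname{Cov}\Bigl(X_i, \sum_{j=1}^{n} X_j\Bigr) = 0 \\
\;&\Longrightarrow\;
\sum_{j \neq i} \operatorname{Cov}(X_i, X_j) = -\operatorname{Var}(X_i) \le 0 .
\end{align}
```

For $n = 2$ this reduces to $\operatorname{Cov}(X, Y) = -\operatorname{Var}(X)$, that is, a correlation of exactly $-1$ whenever $X$ varies at all, which is the reviewer's $X + Y = \text{constant}$ case. With many loci the constraint only forces the covariances of a locus with all other loci to be negative in total; it does not by itself say which pairs are anti-correlated or how strongly, so it is compatible both with a shared limiting factor and with total DNA increase being dominated by a few origins.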

    ###Reviewer #3:

    This manuscript by Rapsomaniki et al uses mathematical modeling to study the properties of DNA re-replication. They develop a model that shows some consistency with experimental data from S. pombe, and use it to conclude that re-replication is heterogeneous at the single-cell level.

    The simulations have only moderate correlations with experimental data (0.5-0.6). Indeed, simulations and actual data (Figure 2) appear quite different. Despite the statistical significance of the overlap, the limited correspondence brings into question the usefulness of the model compared to directly generating new experimental data.

    We would like to point out that the overlap between experimental and simulated data is highly significant. Firstly, the Spearman correlation coefficient between simulated and experimental genome-wide profiles is highly statistically significant (p values ranging from 7.3 × 10⁻¹² to 3.6 × 10⁻⁴¹ for the three fission yeast chromosomes). Furthermore, 100,000 repetitions of random peak assignment resulted in only one case where 10 out of 22 peaks overlapped (median 2 out of 22 peaks overlapping), while comparing simulated and experimental data resulted in 14 out of 22 peaks overlapping. Simulations appear sharper than experimental data. This is expected, however, as simulated data correspond to the actual number of copies generated, while experimental data are subject to background noise, have a signal-to-noise ratio that is limited by the experimental method employed and represent averages of 3 probes and 2 independent experiments (see Kiang et al., 2010 and also above). We have modified the manuscript to clarify this point. The reviewer suggests that the model is of limited use, because one could trivially generate new experimental data. We would like to point out that existing methods to analyze DNA re-replication in vivo only provide static, population-level snapshots (Kiang et al., 2010; Menzel et al., 2020; Mickle et al., 2007). To date no experimental method can generate single-cell, whole-genome, time-course measurements in re-replicating cells. Our model aims to fill this gap, and for this reason we believe in its usefulness.
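The random peak assignment test mentioned above can be sketched as a simple resampling procedure. The details below are assumptions made for illustration rather than the procedure used in the manuscript: the genome is represented as `n_bins` equally likely positions, random peaks are drawn without replacement, two peaks "overlap" when they lie within `tol` bins of each other, and all function names are hypothetical.

```python
import random


def overlap_count(peaks_a, peaks_b, tol=0):
    """Number of peaks in peaks_a lying within tol bins of any peak in peaks_b."""
    return sum(any(abs(a - b) <= tol for b in peaks_b) for a in peaks_a)


def null_overlaps(reference_peaks, n_bins, n_peaks, reps, tol=0, seed=0):
    """Overlap counts obtained when n_peaks positions are assigned at random."""
    rng = random.Random(seed)
    return [
        overlap_count(rng.sample(range(n_bins), n_peaks), reference_peaks, tol)
        for _ in range(reps)
    ]


def empirical_p(observed, null):
    """One-sided empirical p value with the usual +1 correction."""
    return (1 + sum(x >= observed for x in null)) / (1 + len(null))


# Hypothetical set-up: 22 reference peaks on a genome of 500 bins.
reference = list(range(0, 484, 22))
null = null_overlaps(reference, n_bins=500, n_peaks=22, reps=2000, tol=1)
p_value = empirical_p(14, null)
```

The `+1` correction keeps the estimate away from zero, so with `reps` repetitions the smallest reportable p value is `1 / (reps + 1)`; a larger number of repetitions, such as the 100,000 quoted above, is needed to resolve very small probabilities.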

    Heterogeneity among single cells, which appears to be one of the main messages of this paper, is not necessarily a surprising finding, and may even arise from the nature of the simulation being stochastic and defined at the level of single origins. They validate this prediction experimentally at a single locus, providing little novel insight.

    We would like to point out that it is the nature of replication in fission yeast which is stochastic, as experimentally shown (Patel et al., 2006), and defined at the level of single origins, and this is captured by the simulations. Heterogeneity amongst single rereplicating cells has not been previously shown or suggested in any organism, at least to the best of our knowledge. It is in our opinion a highly interesting observation, as it provides a powerful mechanism for generating a plethora of different genotypes within a population, from which phenotypic traits could be selected.

    Overall, the insights here are limited and would need to await experimental validation and further empirical data. Given that experimental measurements of re-replication are now feasible genome-wide, the value of these simulations is limited.

    Again, the reviewer seems unaware that no experimental method currently exists for analysing the dynamics of re-replication at a single-cell level genome-wide. We also feel obliged to point out that modeling and in silico analysis is in our opinion of great value for analysing complex biological processes, even when experimental methods are available. Though we are sure this is not what the reviewer really meant, his/her comment appears derogative to a complete field.

    Fork speed is assumed based on limited data and assumptions regarding re-replication fork speed without empirical data.

    As clearly stated in our manuscript (Results, section Modeling DNA re-replication across a complete genome), many studies have estimated fork speed in yeasts in normal DNA replication, with plausible values ranging from 0.5 kb/min to 3 kb/min (Duzdevich et al., 2015; Heichinger et al., 2006; Raghuraman et al., 2001; Sekedat et al., 2010; Yabuki et al., 2002). In our model, we set the base-case value as the lowest estimate (0.5 kb/min), but also explored the model’s sensitivity to this parameter by simulating the model for higher values (1 and 3 kb/min). This analysis indicated that estimates for 0.5 kb/min were closer to biological reality, an unsurprising finding given that fork speed is expected to be slower in re-replication than in normal replication.

    Overall, the comments of reviewer 3 appear in our eyes more derogative than constructive and provide little specific criticism.

    References

    Baum, B., Nishitani, H., Yanow, S., and Nurse, P. (1998). Cdc18 transcription and proteolysis couple S phase to passage through mitosis. The EMBO Journal 17, 5689–5698.

    Bechhoefer, J., and Rhind, N. (2012). Replication timing and its emergence from stochastic processes. Trends in Genetics 28, 374–381.

    Duzdevich, D., Warner, M.D., Ticau, S., Ivica, N.A., Bell, S.P., and Greene, E.C. (2015). The dynamics of eukaryotic replication initiation: origin specificity, licensing, and firing at the single-molecule level. Mol. Cell 58, 483–494.

    Heichinger, C., Penkett, C.J., Bähler, J., and Nurse, P. (2006). Genome-wide characterization of fission yeast DNA replication origins. The EMBO Journal 25, 5171–5179.

    Kiang, L., Heichinger, C., Watt, S., Bähler, J., and Nurse, P. (2010). Specific replication origins promote DNA amplification in fission yeast. Journal of Cell Science 123, 3047–3051.

    Lygeros, J., Koutroumpas, K., Dimopoulos, S., Legouras, I., Kouretas, P., Heichinger, C., Nurse, P., and Lygerou, Z. (2008). Stochastic hybrid modeling of DNA replication across a complete genome. Proceedings of the National Academy of Sciences 105, 12295–12300.

    Menzel, J., Tatman, P., and Black, J.C. (2020). Isolation and analysis of rereplicated DNA by Rerep-Seq. Nucleic Acids Res 48, e58–e58.

    Mickle, K.L., Oliva, A., Huberman, J.A., and Leatherwood, J. (2007). Checkpoint effects and telomere amplification during DNA re-replication in fission yeast. BMC Molecular Biology 8, 119.

    Nishitani, H., Lygerou, Z., Nishimoto, T., and Nurse, P. (2000). The Cdt1 protein is required to license DNA for replication in fission yeast. Nature 404, 625–628.

    Patel, P.K., Arcangioli, B., Baker, S.P., Bensimon, A., and Rhind, N. (2006). DNA Replication Origins Fire Stochastically in Fission Yeast. Mol. Biol. Cell 17, 308–316.

    Raghuraman, M.K., Winzeler, E.A., Collingwood, D., Hunt, S., Wodicka, L., Conway, A., Lockhart, D.J., Davis, R.W., Brewer, B.J., and Fangman, W.L. (2001). Replication Dynamics of the Yeast Genome. Science 294, 115–121.

    Rhind, N., Yang, S.C.-H., and Bechhoefer, J. (2010). Reconciling stochastic origin firing with defined replication timing. Chromosome Res 18, 35–43.

    Sekedat, M.D., Fenyö, D., Rogers, R.S., Tackett, A.J., Aitchison, J.D., and Chait, B.T. (2010). GINS motion reveals replication fork progression is remarkably uniform throughout the yeast genome. Molecular Systems Biology 6, 353.

    Yabuki, N., Terashima, H., and Kitada, K. (2002). Mapping of early firing origins on a replication profile of budding yeast. Genes to Cells 7, 781–789.

  2. ###Reviewer #3:

    This manuscript by Rapsomaniki et al. uses mathematical modeling to study the properties of DNA re-replication. The authors develop a model that shows some consistency with experimental data from S. pombe, and use it to conclude that re-replication is heterogeneous at the single-cell level.

    The simulations have only moderate correlations with experimental data (0.5-0.6). Indeed, simulations and actual data (Figure 2) appear quite different. Despite the statistical significance of the overlap, the limited correspondence brings into question the usefulness of the model compared to directly generating new experimental data.

    Heterogeneity among single cells, which appears to be one of the main messages of this paper, is not necessarily a surprising finding, and may even arise from the nature of the simulation being stochastic and defined at the level of single origins. They validate this prediction experimentally at a single locus, providing little novel insight.

    Overall, the insights here are limited and would need to await experimental validation and further empirical data. Given that experimental measurements of re-replication are now feasible genome-wide, the value of these simulations is limited.

    Fork speed is assumed based on limited data and assumptions regarding re-replication fork speed without empirical data.

  3. ###Reviewer #2:

    Here, Rapsomaniki et al have modeled the process of DNA re-replication. The in silico analysis is an extension of their previous work describing normal DNA replication (Lygeros et al 2008). The authors show that there is a large amount of heterogeneity at the single cell level but when these heterogeneous signals are averaged across a population, the signal is robust. The authors support this with simulations and with experimental data, both at the single cell level and at the population level.

    1. It is a bit concerning that simulations were carried out to a ploidy level of 16C. Has it been observed that the DNA content in any given cell can rise to 16 times the initial amount? Figure 3 (simulations) shows that certain chromosomal regions can reach 30x and 160x copies for 2C and 16C. However, Figure 4 (experiment) suggests that copy numbers should only be slightly more in re-replicating conditions, compared to normal replicating conditions. Additionally, in Figure 2, the simulated data seems to be consistently noisier than the experimental data. Taken together, this may suggest that the assumptions in the model do not adequately recapitulate the biological system.

    2. This work is currently agnostic to the genes and sequences within the simulated genomes. The authors suggest that DNA re-replication can result in gene duplications. It might strengthen the manuscript if the authors are able to show that re-replication hotspots coincide with gene duplication events in S. pombe. It should be relatively straightforward to overlap the hotspots found in this analysis with known gene duplication events in the literature.

    3. The authors have nicely demonstrated that cis activation can be driven by the physical proximity of origins. The authors go on to describe trans suppression in which the activation of one origin suppresses the activation of a different origin. I would argue that this observation is simply the result of randomness in the model and stopping the simulations at fixed points.

    One of the two origins will randomly re-replicate first and simply outpace the other. Stopping the simulations at 16C will simply prevent the lagging origin from catching up with the first origin. There does not seem to be an inhibitory mechanism that acts between two origins.

    This can be explained by the following equation: X + Y = constant, where X is the amount of origin 1 and Y is the amount of origin 2.

    It is also possible that the two origins could start re-replicating at the same time. This would result in the data points observed for cluster 2 (Figure 6B, C).
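The argument in point 3 can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' stochastic hybrid model: it tracks only two origins, duplicates one randomly chosen existing copy per step, and stops at a fixed total. The function names `amplify_two_origins` and `pearson`, the stopping total of 32 copies, and the 1000 simulated cells are all hypothetical choices made for illustration.

```python
import random

def amplify_two_origins(total_copies, rng):
    """Duplicate copies of two independent origins until a fixed total is reached."""
    x, y = 1, 1  # one copy of each origin at the start
    while x + y < total_copies:
        # Pick one existing copy uniformly at random and duplicate it.
        # Neither origin inhibits the other in this toy model.
        if rng.random() < x / (x + y):
            x += 1
        else:
            y += 1
    return x, y

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is reproducible
cells = [amplify_two_origins(32, rng) for _ in range(1000)]  # hypothetical cutoff
xs, ys = zip(*cells)
r = pearson(xs, ys)
print(f"correlation between origin copy numbers: {r:.3f}")
```

Because every simulated cell stops at the same total, the two copy numbers are anti-correlated by construction, even though the code contains no inhibitory interaction. Whether this also accounts for the trans effects in the full genome-wide model, where many origins compete under a DNA-content cutoff, is a separate question that this sketch does not settle.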

  4. ###Reviewer #1:

    The authors develop and analyse a mathematical model of DNA rereplication in situations where re-firing of origins during replication is not suppressed. Using the experimentally measured position and relative strength of origins in yeast, the authors simulate DNA copy number profiles in individual cells. They show that the developed model can mostly recapitulate the experimentally measured DNA copy number profile along the genome, but that the simulated profiles are highly variable. The fact that increasing copy number of an origin will facilitate its preferential amplification essentially constitutes a self-reinforcing feedback loop and might be the mechanism that leads to overamplification of some genomic regions. In addition, different regions compete for a limiting factor, and thereby repress each other's over-amplification. While the model generates some interesting hypotheses, it is unclear in the current version of the manuscript to what extent they arise from specific model assumptions. The authors do not clearly formulate the scientific questions asked, they do not discuss the model assumptions and their validity, and they do not adequately describe how model results depend on those assumptions. Taken together, the scientific process is insufficiently documented in this manuscript, making it difficult to judge whether the conclusions are actually supported by the data.

    1. It is not clear what questions the authors want to address with their model. Do they want to understand how the experimentally observed copy number differences between regions arise? The introduction should elaborate more on the open questions in the field and explain why they should be addressed with a mathematical model.

    2. One of the main messages of the paper is that the amplification profiles are highly variable across single cells, because that was found in the described simulations. This behavior does however likely depend on specific choices that were made in the simulations, e.g. that the probabilities of the origin state transitions are exponentially distributed. These assumptions should at least be discussed, or better experimentally validated.

    3. The authors aim at testing their prediction that rereplication is highly variable across cells. To this end they use the LacO/LacI system to estimate locus copy number. The locus intensity is indeed highly variable across cells. However, the DAPI quantification suggests that only a subset of cells actually undergoes rereplication under the experimental conditions used (Fig. 4C). Therefore the analysis should at least be limited to those cells. It would be even better if a second locus could be labelled in another color to show that rereplication of two loci is anti-correlated as predicted by the model.

    4. What does "signal ratio" in Fig. 2 mean? And why are the peaks much higher in the simulations? Would the signal ratio between simulation and experiment correspond better, if an earlier time point in the simulation was selected?

    5. From line 248 onwards, the authors compare different assumptions for polymerase speed and conclude that "0.5 kb/min is closer to experimental observations". It is unclear, however, which experimental observations they refer to and what was observed there. The same question arises when they compare the LF and UF models (line 275-277).

    6. I find the description of cis- and trans-effects rather confusing. The authors should rather explain what happens in the model. Neighboring strong origins can amplify a weak origin and origins compete for factors. In line 475-476 for example, it should be clarified that the assumption of the LF model could lead to trans-effects, instead of presenting this as a general model prediction.

    7. Throughout the manuscript, a clear distinction should be made between the firing activity of one origin molecule and the cumulative activity of multiple copies of an origin. For example, it should be clarified in line 435 that the cumulative activity of weak origins might increase if they are close to a strong origin, because they get amplified, instead of just writing "increased firing activity of weak origins".

    8. One of the major conclusions of the manuscript is that rereplication is robust on the population level. It is not clear to me what the authors mean by that. The average amplification levels are probably determined by the origin efficiencies that are put into the model. What would robustness mean in this context?

    9. It would be helpful if, in Fig. 2 also the origins and their respective efficiencies could be shown to understand to what extent the signal ratio reflects these efficiencies.

    10. The methods section should provide more detail.
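Point 2 above refers to exponentially distributed origin state transitions. As a minimal sketch of what such an assumption implies, the snippet below draws an exponential waiting time for each origin and records which one fires first in each simulated cell. The three firing rates, the helper name `first_firing`, and the number of cells are hypothetical and are not taken from the manuscript.

```python
import random

def first_firing(rates, rng):
    """Draw an exponential waiting time per origin; return (index, time) of the first to fire."""
    times = [rng.expovariate(rate) for rate in rates]
    winner = min(range(len(rates)), key=times.__getitem__)
    return winner, times[winner]

rates = [0.5, 0.1, 0.05]  # hypothetical firing rates (per minute), strong to weak
rng = random.Random(1)    # fixed seed so the run is reproducible
n_cells = 10000

wins = [0] * len(rates)
for _ in range(n_cells):
    winner, _time = first_firing(rates, rng)
    wins[winner] += 1

# Fraction of simulated cells in which each origin fired first.
fractions = [w / n_cells for w in wins]
print(fractions)
```

Under this assumption, each origin fires first in a fraction of cells close to its rate divided by the sum of all rates, so even the weakest origin leads in some cells. A different waiting-time distribution would change how much cell-to-cell variability the simulation produces, which is why the reviewer asks for the choice to be discussed or validated.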

  5. ##Preprint Review

    This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Tim Formosa (University of Utah School of Medicine) served as the Reviewing Editor.

    ###Summary:

    A strength of the work was that the mathematical modeling of re-replication captured variability in origin firing and supported a mechanism that might explain copy number variation observed in many eukaryotes. However, concern was expressed regarding the influence of assumptions made in developing the model on the outcomes and the moderate correlations between simulations and experimental data. Further explanation of the questions being investigated, the validity and nature of assumptions that were used to develop the simulations, and details explaining how these assumptions were built into the modeling were considered important. Some attempt to align the modeling outcomes with known re-replication hotspots would also improve the study. Some of the parameters used for modeling were concerning, including the use of a 16C ploidy cutoff without adequate justification. Reviewers also made suggestions for improving the experimental validation tests. Reviewers also noted places in the manuscript that require additional clarification. Overall, some concerns were raised regarding the experimental methods, and the impact of the insights gained.