Long Term Cost-Effectiveness of Resilient Foods for Global Catastrophes Compared to Artificial General Intelligence Safety


Abstract

Global agricultural catastrophes, which include nuclear winter and abrupt climate change, could have long-term consequences for humanity such as the collapse and nonrecovery of civilization. Using Monte Carlo (probabilistic) models, we analyze the long-term cost-effectiveness of resilient foods (alternative foods) - roughly those independent of sunlight, such as mushrooms. One version of the model, populated partly by a survey of global catastrophic risk researchers, finds that the confidence that resilient foods are more cost effective than artificial general intelligence safety is ~86% and ~99% for the 100 millionth dollar spent on resilient foods and for the margin now, respectively. Another version of the model, populated by one of the authors, produced ~95% and ~99% confidence, respectively. Considering uncertainty represented within our models, our result is robust: reverting the conclusion required simultaneously changing the 3-5 most important parameters to the pessimistic ends. However, predicting the long-run trajectory of human civilization is extremely difficult, and model and theory uncertainties are very large; this significantly reduces our overall confidence. Because the agricultural catastrophes could happen immediately and because existing expertise relevant to resilient foods could be co-opted by charitable giving, it is likely optimal to spend most of the money for resilient foods in the next few years. Both cause areas generally save expected current lives inexpensively and should attract greater investment.

Article activity feed

  1. Compiled ratings, author response, and editorial comment


    Ratings and predictions

    Ratings (1-100)

    | Rating category | Evaluator 1: Rating (0-100) | Evaluator 1: 90% CI | Evaluator 2: Rating (0-100) | Evaluator 2: 90% CI | Evaluator 3: Rating (0-100) | Evaluator 3: Confidence |
    | --- | --- | --- | --- | --- | --- | --- |
    | Overall assessment | 40 | 20-60 | 80 | 60-90 | 65 | Medium |
    | Advancing knowledge and practice | 30 | 20-60 | 80 | 70-90 | 70 | Medium |
    | Methods: Justification, reasonableness, validity, robustness | 50 | 40-60 | 70 | 50-90 | Not qualified | - |
    | Logic & communication | 60 | 40-75 | 85 | 65-95 | 80 | Medium-to-high |
    | Open, collaborative, replicable | 70 | 40-75 | 73 | 50-95 | Not qualified | - |
    | Relevance to global priorities | 90 | 60-95 | 85 | 70-90 | 80 | High |

    Journal predictions (1-5)

    | Prediction metric | Evaluator 1: Rating (0-5) | Evaluator 1: 90% CI | Evaluator 2: Rating (0-5) | Evaluator 2: 90% CI | Evaluator 3: Rating (0-5) | Evaluator 3: Confidence |
    | --- | --- | --- | --- | --- | --- | --- |
    | What ‘quality journal’ do you expect this work will be published in? | 2 | 1-2 | 3.5 | 3-5 | 3.5 | Medium |
    | On a ‘scale of journals’, what tier journal should this be published in? | 2 | 1-2 | 4 | 3-5 | 3.5 | High |

    Author response

    To start, we would like to commend the format and the reviewer comments, which were of extremely high quality. The evaluations provided well-thought-out and constructively critical analysis of the work, pointing out several assumptions which could impact the findings of the paper while also recognizing the value of the work in spite of some of these assumptions. Research in this space is difficult due to the highly interdisciplinary nature of the questions being asked and the major uncertainties that need to be addressed. We value good epistemics and understand that it takes many people critically looking at a problem to achieve this, which is what motivated our participation in the Unjournal pilot. A format which allows work to be published and reviewed in an open, nuanced manner can reduce the friction of working on such questions and speed up communal sense-making on important questions. We are excited to have participated and look forward to seeing how Unjournal progresses. We hope that the future work highlighted by the reviewers, addressing the assumptions and issues of the paper, will be undertaken by external parties who are better equipped to critically analyse this area of research, improving epistemics in relation to nuclear risk, resilient foods, AGI safety, and the greater existential risk space.

    To clarify, AGI safety was chosen as the comparison for resilient foods because it is considered the greatest x-risk by many. Consequently, comparing the cost-effectiveness of resilient foods to that of AGI safety was intended to highlight the merit of resilient foods and motivate further investment, as opposed to motivating the redirection of funding from AGI safety to resilient foods.

    We have included responses to aspects of the evaluations below.

    Evaluation 1

    Structure of cost-effectiveness argument

    • The biggest issue with interpretability this causes is that I struggle to understand what features of the analysis are making resilient food appear cost-effective because of some feature of resilient food, and which are making resilient food appear cost-effective because of some feature of AI. The methods used by the authors mean that a mediocre case for resilient food could be made to look highly cost-effective with an exceptionally poor case for AI, since their central result is the multiplier of value on a marginally invested dollar for resilient food vs AI. This is important, because the authors’ argument is that resilient food should be funded because it is more effective than AI Risk management, but this is motivated by AI Risk proponents agreeing AI Risk is important – in scenarios where AI Risk is not worth investing in then this assumption is broken and cost effectiveness analysis against a ’do nothing’ alternative is required. For example, the authors do not investigate scenarios where the benefit of the intervention in the future is negative because “negative impacts would be possible for both resilient foods and AGI safety and there is no obvious reason why either would be more affected”. While this is potentially reasonable on a mathematical level, it does mean that it would be perfectly possible for resilient foods to be net harmful and the paper not correctly identify that funding them is a bad idea – simply because funding AI Risk reduction is an even worse idea, and this is the only given alternative. If the authors want to compare AGI risk mitigation and resilient foods against each other without a ‘do nothing’ common comparator (which I do not think is a good idea), they must at the very least do more to establish that the results of their AI Risk model map closely to the results which cause the AI Risk community to fund AI Risk mitigation so much. As this is not done in the paper, a major issue of interpretability is generated.

    We could have compared to the Open Philanthropy last dollar if that had been available at the time of publishing ($200 trillion per world saved or 0.05 basis points of existential risk per $billion): https://forum.effectivealtruism.org/posts/NbWeRmEsBEknNHqZP/longterm-cost-effectiveness-of-founders-pledge-s-climate. Our median for spending $100 million is ~2x10^-10 far future potential increase per dollar, or 500 basis points per $billion, or ~10,000 times as cost-effective. Ours is about 500 times as cost effective as the upper bound on that page.
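
    For readers who want to check the unit conversions behind this comparison, here is a minimal sketch using only the figures quoted above (the variable names are ours, not the paper's):

    ```python
    # Open Philanthropy "last dollar": $200 trillion per world saved.
    cost_per_world_saved = 200e12
    worlds_saved_per_dollar = 1 / cost_per_world_saved                    # 5e-15 per dollar
    bp_per_billion_last_dollar = worlds_saved_per_dollar * 1e9 * 1e4      # 0.05 basis points per $billion

    # Resilient foods median quoted above, already expressed in basis points per $billion.
    bp_per_billion_resilient_foods = 500

    print(bp_per_billion_last_dollar)                                     # 0.05
    print(bp_per_billion_resilient_foods / bp_per_billion_last_dollar)    # ~10,000x
    ```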

    • More generally, this causes the authors to have to write up their results in a non-natural fashion. As an example of the sort of issues this causes, conclusions are expressed in entirely non-natural units in places (“Ratio of resilient foods mean cost effectiveness to AGI safety mean cost effectiveness” given $100m spend), rather than units which would be more natural (“Cost-effectiveness of funding resilient food development”). I cannot find expressed anywhere in the paper a simple table with the average costs and benefits of the two interventions, although a reference is made to Denkenberger & Pearce (2016) where these values were presented for near-term investment in resilient food. This makes it extremely hard for a reader to draw sensible policy conclusions from the paper unless they are already an expert in AGI risk and so have an intuitive sense of what an intervention which is ‘3-6 times more cost-effective than AGI risk reduction’ looks like. The paper might be improved by the authors communicating summary statistics in a more straightforward fashion.

    Figure 5 shows far future potential increase per dollar, which is an absolute value. That said, we acknowledge that the presentation of findings throughout could have been made more straightforward for non-expert readers, and we will aim to communicate summary statistics in a more accessible way in future work.

    Continuing on from this point, I don’t understand the conceptual framework that has the authors consider the value of invested dollars in resilient food at the margin. The authors’ model of the value of an invested dollar is an assumption that it is distributed logarithmically. Since the entire premise of the paper hinges on the reasonability of this argument, it is very surprising there is no sensitivity analysis considering different distributions of the relationship between intervention funding and value. Nevertheless, I am also confused as to the model even on the terms the authors describe; the authors’ model appears to be that there is some sort of ‘invention’ step where the resilient food is created and discovered (this is mostly consistent with Denkenberger & Pearce (2016), and is the only interpretation consistent with the question asked in the survey). In which case, the marginal value of the first invested dollar is zero because the ’invention’ of the food is almost a discrete and binary step. The marginal value per dollar continues to be zero until the 86 millionth dollar, where the marginal value is the entire value of the resilient food in its entirety. There seems to be no reason to consider the marginal dollar value of investment when a structural assumption made by the authors is that there is a specific level of funding which entirely saturates the field, and this would make presenting results significantly more straightforward – it is highly nonstandard to use marginal dollars as the unit of cost in a cost-effectiveness analysis, and indeed is so nonstandard I’m not certain fundamental assumptions of cost-effectiveness analysis still hold.

    In the survey, we ask about the impact of spending $100 million, but we then refer to the cost per life saved paper, which discusses separate interventions of research, planning, and piloting. Some of these interventions, such as early-stage research, do not cost very much money yet increase the probability of success, which is why we argue that marginal thinking makes sense. For instance, significant progress has been made in the last year in prioritizing the most cost-effective resilient foods that also feed a lot of people; this could lead to the development and deployment of much more effective food production methods for such scenarios.
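
    As a hedged illustration of the marginal-returns assumption under discussion (the functional forms and numbers below are our own simplifications, not the paper's model), the sketch contrasts smoothly diminishing logarithmic returns with the single 'invention step' reading raised by the evaluator:

    ```python
    import numpy as np

    def log_returns(spend, saturation=86e6, max_value=1.0):
        """Illustrative logarithmic returns: value accrues smoothly, so early dollars already buy progress."""
        return max_value * np.log1p(np.asarray(spend, dtype=float)) / np.log1p(saturation)

    def threshold_returns(spend, saturation=86e6, max_value=1.0):
        """The 'invention step' reading: no value until the field is fully funded, then all of it at once."""
        return np.where(np.asarray(spend, dtype=float) >= saturation, max_value, 0.0)

    spend = np.array([1e6, 10e6, 86e6])
    print(log_returns(spend))        # rises with every tranche of spending
    print(threshold_returns(spend))  # zero until the saturation point
    ```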

    Methods

    The presentation of the sensitivity analysis as ‘number of parameters needed to flip’ is nonstandard, but a clever way to intuitively express the level of confidence the authors have in their conclusions. Although clever, I am uncertain if the approach is appropriately implemented; the authors limit themselves to the 95% CI for their definition of an ‘unfavourable’ parameter, and I think this approach hides massive structural uncertainty with the model. For example, in Table 5 the authors suggest their results would only change if the probability of nuclear war per year was 4.8x10^-5 (plus some other variables changing) rather than their estimate of 7x10^-3 (incidentally, I think the values for S model and E model are switched in Table 5 – the value for pr(nuclear war) in the table’s S model column corresponds to the probability given in the E model).

    This appears to be a coincidence: the lowest 5th percentile value of all nuclear war probabilities was used, which is given by the furthest year into the future with no nuclear war. For the S model this is 49 years into the future, with a value of 4.8x10^-5; for the E model it is 149 years into the future, with a value of 1.8x10^-4 (see the inserted screenshots).

    S model 5th percentile of nuclear war probability per year after x years of no nuclear war (lowest probability of nuclear war per year after 49 years of no nuclear war): Figure link


    E model 5th percentile of nuclear war probability per year after x years of no nuclear war (lowest probability of nuclear war per year after 149 years of no nuclear war): Figure link

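    A minimal sketch of how this pessimistic value is obtained, using hypothetical Monte Carlo draws in place of the model's actual samples: take the 5th percentile of the annual nuclear war probability for each number of years without nuclear war, then take the minimum across years:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    years = np.arange(1, 150)

    # Hypothetical Monte Carlo draws of the annual nuclear war probability after y years
    # with no nuclear war (stand-ins for the model's actual samples).
    samples_by_year = {y: rng.lognormal(mean=np.log(7e-3) - 0.02 * y, sigma=1.0, size=32_000) for y in years}

    fifth_percentile_by_year = {y: np.percentile(s, 5) for y, s in samples_by_year.items()}
    worst_year = min(fifth_percentile_by_year, key=fifth_percentile_by_year.get)
    print(worst_year, fifth_percentile_by_year[worst_year])  # the value used as the 'pessimistic end'
    ```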

    Third, the authors could have done more to make it clear that the ‘Expert Model’ was effectively just another survey with an n of 1. Professor Sandberg, who populated the Expert Model, is also an author on this paper and so it is unclear what if any validation of the Expert Model could reasonably have been undertaken – the E model is therefore likely to suffer from the same drawbacks as the S model. It is also unclear if Professor Sandberg knew the results of the S Model before parameterising his E Model – although this seems highly likely given that 25% of the survey’s respondents were Professor Sandberg’s co-authors. This could be a major source of bias, since presumably the authors would prefer the two models to agree and the expert parameterising the model is a co-author.

    Professor Sandberg was not shown the S model parameters, to avoid introducing bias. That said, we acknowledge that the small size of the existential risk field and the influence of several highly cited early works, such as the FHI technical report Global Catastrophic Risks Survey, have the potential to introduce anchoring bias.

    Parameter estimates

    Notwithstanding my concerns about the use of the survey instrument, I have some object level concerns with specific parameters described in the model.

    • The discount rate for both costs and benefits appears to be zero, which is very nonstandard in economic evaluation. Although the authors make reference to “long termism, the view that the future should have a near zero discount rate”, the reference for this position leads to a claim that a zero rate of pure time preference is common, and a footnote observing that “the consensus against discounting future well-being is not universal”. To be clear, pure time preference is only one component of a well-constructed discount rate and therefore a discount rate should still be applied for costs, and probably for future benefits too. Even notwithstanding that I think this is an error of understanding, it is a limitation of the paper that discount rates were not explored, given they seem very likely to have a major impact on conclusions.

    Thank you for highlighting this point; this is an important consideration that would make for valuable future work.
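
    For illustration of the evaluator's point (a generic discounting formula, not part of the paper's model), even a modest positive discount rate sharply compresses benefits realized a century from now:

    ```python
    def present_value(benefit, years_ahead, discount_rate):
        """Standard exponential discounting: PV = benefit / (1 + r) ** t."""
        return benefit / (1 + discount_rate) ** years_ahead

    for r in (0.0, 0.01, 0.03):
        print(r, round(present_value(1.0, years_ahead=100, discount_rate=r), 3))
    # r = 0.0  -> 1.0, r = 0.01 -> ~0.37, r = 0.03 -> ~0.052
    ```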

    • A second concern I have relating to parameterisation is the conceptual model leading to the authors’ proposed costing for the intervention. The authors explain their conceptual model linking nuclear war risk to agricultural decline commendably clearly, and this expands on the already strong argument in Denkenberger & Pearce (2016). However, I am less clear on their conceptual model linking approximately $86m of research to the widescale post-nuclear deployment of resilient foods. The assumption seems to be (and I stress this is my assumption based on Denkenberger & Pearce (2016) – it would help if the authors could make it explicit) that $86m purchases the ‘invention’ of the resilient food, and once the food is ‘invented’ then it can be deployed when needed with only a little bit of ongoing training (covered by the $86m). This seems to me to be an optimistic assumption; there seems to be no cost associated with disseminating the knowledge, or any raw materials necessary to culture the resilient food. Moreover, the model seems to structurally assume that distribution chains survive the nuclear exchange with 100% certainty (or that the materials are disseminated to every household which would increase costs), and that an existing resilient food pipeline exists at the moment of nuclear exchange which can smoothly take over from the non-resilient food pipeline.

    Denkenberger & Pearce (2016) does not include costs post-GCR; it only considers R&D, response and preparedness planning, and related costs pre-disaster. This pre-disaster work would likely result in post-disaster expenditure being significantly lower than the cost of stored food.

    I have extremely serious reservations about these points. I think it is fair to say that an economics paper which projected benefits as far into the future as the authors do here without an exploration of discount rates would be automatically rejected by most editors, and it is not clear why the standard should be so different for existential risk analysis. A cost of $86m to mitigate approximately 40% of the impact of a full-scale nuclear war between the US and a peer country seems prima facie absurd, and the level of exploration of such an important parameter is simply not in line with best practice in a cost-effectiveness analysis (especially since this is the parameter on which we might expect the authors to be least expert). I wouldn’t want my reservations about these two points to detract from the very good and careful scholarship elsewhere in the paper, but neither do I want to give the impression that these are just minor technical details – these issues could potentially reverse the authors’ conclusions, and should have been substantially defended in the text.

    We agree that this estimate from the published work is likely low and have since updated our view on cost upwards. The nuclear war probability utilized does not include other sources of nuclear risk such as accidental detonation of nuclear weapons leading to escalation, intentional attack, or dyads involving China.

    Evaluation 2

    The Methods section is well organised and documented, but once in a while it lacks clarity and it uses terminology that may or may not be appropriate. Here’s a list of things I found a bit confusing:

    • Terminology
      • The submodels for food and AGI are said to be “independent”; is this meant in a probabilistic way? Are there no hidden/not modelled variables that influence both?

    In reality we anticipate that there are a myriad of ways in which nuclear risk and AGI would interact with one another. Are AI systems implemented in nuclear command and control? If so when and how does this change nuclear war probability? What will data sets used to train AI systems post nuclear exchange look like compared to present? Post nuclear exchange will there be greater pressure to utilize autonomous systems? How many/which chip fabs will be destroyed during a nuclear exchange?

    Capturing such interactions in the model in a rigorous way would have required a considerable section within the paper, which was beyond the scope of what could be included. We raised that the submodels are independent to make people aware of this simplifying assumption.

    We believe that investigating the interdependence of x-risks is an important open question that would make valuable future work.

    • The “expert” model was quite confusing for me, maybe because “Sandberg” and the reference number after “Sandberg” don’t match, or maybe because I was expecting a survey vs. expert judgement quantification of uncertainty. As I said (structured) expert judgement is one of my interests: https://link.springer.com/book/10.1007/978-3-030-46474-5

    There is an error in the referencing; this should have linked to the following Guesstimate model: Denkenberger, D., Sandberg, A., Cotton-Barratt, O., Dewey, D., & Li, S. (2019b). Food without the sun and AI X risk cost effectiveness general far future impact publication. Guesstimate. https://www.getguesstimate.com/models/11691

    • In the caption of fig 2, “index nodes” and “variable nodes” are introduced. Index nodes are later described, but I don't think I understood what was meant by “variable” nodes. Aren’t all probabilistic nodes variable?

    This language comes from Analytica's taxonomy of the different types of nodes; it simply describes what the nodes are in the Analytica implementation. See this link for more information: https://docs.analytica.com/index.php/Create_and_edit_nodes

    • Underlying assumptions/definitions
      • The structure of the models is not discussed. How did you decide that this is a robust structure (no sensitivity to structure performed as far as I understood)

    An earlier model only considered the collapse and nonrecovery of civilization as the route to far future impact. The current model develops that structure further and is more inclusive.

    • What is meant by “the data from surveys was used directly instead of constructing continuous distributions”?

    Instead of sampling from a distribution created from the survey data, the model randomly draws a survey response value from the index of values for each of the 32000 model runs.
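
    A minimal sketch of this direct resampling, with hypothetical values standing in for the actual survey responses:

    ```python
    import numpy as np

    # Hypothetical survey answers from the eight respondents (stand-ins for the real data).
    survey_responses = np.array([0.01, 0.03, 0.05, 0.10, 0.12, 0.20, 0.35, 0.50])

    rng = np.random.default_rng(42)
    n_runs = 32_000
    draws = rng.choice(survey_responses, size=n_runs, replace=True)  # each run uses one raw response verbatim
    print(draws[:5])
    ```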

    It is great that the models are available upon request, but it would be even better if they would be public so the computational reproducibility could be evaluated as well.

    The models are available at the following links:

    S-model: https://www.getguesstimate.com/models/13082

    E-model: https://www.getguesstimate.com/models/11691


    Editorial note

    Evaluators were asked to follow the general guidelines available here. They were also provided with this document with additional resources specific to the paper, rationale for its selection, and an ‘editorial’ first pass of aspects of the paper to consider.

    Note that this evaluation was organized during the Unjournal pilot phase and was managed manually using several Google Docs. The format may differ from future evaluations that will be managed with the Kotahi Platform.

    The evaluations were conducted on the version of the article published in the International Journal of Disaster Risk Reduction; however, we have posted the evaluations on the preprint for accessibility.

  2. Evaluation 3


    Ratings and predictions

    Ratings (1-100)

    • Overall assessment: 65 Confidence: Medium
    • Advancing knowledge and practice: 70 Confidence: Medium
    • Methods: Justification, reasonableness, validity, robustness: Not qualified
    • Logic & communication: 80 Confidence: Medium-to-high
    • Open, collaborative, replicable: Not qualified
    • Relevance to global priorities: 80 Confidence: High

    Journal predictions (1-5)

    • What ‘quality journal’ do you expect this work will be published in? 3.5 Confidence: Medium
    • On a ‘scale of journals’, what tier journal should this be published in? 3.5 Confidence: Medium

    Written report

    I am a political scientist specializing in science policy (i.e., how expertise and knowledge production influences the policymaking process and vice-versa), with a focus on “decision making under conditions of uncertainty,” R&D prioritization, and the governance of systemic and catastrophic risk. With respect to the various categories of expertise highlighted by the authors, I can reasonably be considered a “policy analyst.”

    Potential conflict of interest/source of bias: one of the authors (Dr. Anders Sandberg) is a friend and former colleague. He was a member of my PhD dissertation committee.

    A quick further note on the potential conflict of interest/bias of the authors (three of the four are associated with ALLFED, which, as the authors note, could stand to benefit financially from the main implication of their analysis - that significant funding be allocated to resilient food research in the short-term). In my opinion, this type of “self-advocacy” is commonplace and, to some extent, unavoidable. Interest and curiosity (and by extension, expertise) on a particular topic motivates deep analysis of that topic. It’s unlikely that this kind of deep analysis (which may or may not yield these sorts of “self-confirming” conclusions/recommendations) would ever be carried out by individuals who are not experts on - and often financially implicated in - the topic. I think their flagging of the potential conflict of interest at the end of the paper is sufficient - and exercises like this Unjournal review further increase transparency and invite critical examinations of their findings and “positionality.”

    I am unqualified to provide a meaningful evaluation of several of the issues “flagged” by the authors and editorial team, including: the integration of the sub-models, sensitivity analysis, and alternative approaches to the structure of their Monte Carlo analysis. Therefore, I will focus on several other dimensions of the paper.

    Context and contribution

    This paper has two core goals: (1) to explore the value and limitations of relative long-term cost effectiveness analysis as a prioritization tool for disaster risk mitigation measures in order to improve decision making and (2) to use this prioritization tool to determine if resilient foods are more cost effective than AGI safety (which would make resilient food the highest priority area of GCR/X-risk mitigation research). As I am not qualified to directly weigh in on the extent to which the authors achieved either goal, I will reflect on the “worthiness” of this goal within the broader context of work going on in the fields of X-risk/GCR, long-termism, science policy, and public policy - and the extent to which the authors’ findings are effectively communicated to these audiences.

    Within this broader context, I believe that these are indeed worthy (and urgent) objectives. The effective prioritization of scarce resources to the myriad potential R&D projects that could (1) reduce key uncertainties, (2) improve political decision-making, and (3) provide solutions that decrease the impact and/or likelihood of civilization-ending risk events is a massive and urgent research challenge. Governments and granting agencies are desperate for rigorous, evidence-based guidance on how to allocate finite funding across candidate projects. Such prioritization is impeded by uncertainty about the potential benefits of various R&D activities (partially resulting from uncertainty about the likelihood and magnitude of the risk event itself - but also from uncertainty about the potential uncertainty-reducing and harm/likelihood-reducing “power” of the R&D). Therefore, the authors’ cost-effectiveness model, which attempts to decrease uncertainty about the potential uncertainty-reducing and harm/likelihood-reducing “power” of resilient food R&D and compare it to R&D on AGI safety, is an important contribution. It combines and applies a number of existing analytical tools in a novel way and proposes a tool for quantifying the relative value of (deeply uncertain) R&D projects competing for scarce resources.

    Overall, the authors are cautious and vigilant in qualifying their claims - which is essential when conducting analysis that relies on the quasi-quantitative aggregation of the (inter)subjective beliefs of experts and combines several models (each with their own assumptions).

    Theoretical/epistemic uncertainty

    I largely agree with the authors’ dismissal of theoretical/epistemic uncertainty (not that they dismiss its importance or relevance - simply that they believe there is essentially nothing that can be done about it in their analysis). Their suggestion that “results should be interpreted in an epistemically reserved manner” (essentially a plea for intellectual humility) should be a footnote in every scholarly publication - particularly those addressing the far future, X-risk, and value estimations of R&D.

    However, the authors could have bolstered this section of the paper by identifying some potential sources of epistemic uncertainty and suggesting some pathways for further research that might reduce it. I recognize that they are both referring to acknowledged epistemic uncertainties - which may or may not be reducible - as well as unknown epistemic uncertainties (i.e., ignorance - or what they refer to as “cluelessness”). It would have been useful to see a brief discussion of some of these acknowledged epistemic uncertainties (e.g., the impact of resilient foods on public health, immunology, and disease resistance) to emphasize that some epistemic uncertainty could be reduced by exactly the kind of resilient food R&D they are advocating for.

    Presentation of model outputs

    When effectively communicating uncertainties associated with research findings to multiple audiences, there is a fundamental tradeoff between the rigour demanded by other experts and the digestibility/usability demanded by decision makers and lay audiences. For example, this tradeoff has been well-documented in the literature on the IPCC’s uncertainty communication framework (e.g., Janzwood 2020). What fellow-modelers/analysts want/need is usually different from what policymakers want/need. The way that model outputs are communicated in this article (e.g. 84% confidence that the 100 millionth dollar is more cost-effective) leans towards rigour and away from digestibility/usability. A typical policymaker who is unfamiliar with the modeling tools used in this analysis may assume that an 84% probability value was derived from historical frequencies/trials in some sort of experiment - or that it simply reflects an intersubjective assessment of the evidence by the authors of the article. Since the actual story for how this value was calculated is rather complex (it emerges from a model derived from the aggregation of the outputs of two sub-models, which both aggregate various types of expert opinions and other forms of data) - it might be more useful to communicate the final output qualitatively.

    This strategy has been used by the IPCC to varying levels of success. These qualitative uncertainty terms can align with probability intervals. For example, 80-90% confidence could be communicated as “high confidence” or “very confident.” >90% could be communicated as “extremely confident.” There are all sorts of interpretation issues associated with qualitative uncertainty scales - and some scales are certainly more effective than others (again, see Janzwood 2020) but it is often useful to communicate findings in two “parallel tracks” - one for experts and one for a more lay/policy-focused audience.
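
    A hedged sketch of such a mapping (the thresholds and phrases are illustrative, loosely in the spirit of IPCC calibrated language, and not drawn from the paper):

    ```python
    def qualitative_confidence(p):
        """Map a numeric confidence (0-1) to an illustrative qualitative phrase."""
        if p > 0.90:
            return "extremely confident"
        if p >= 0.80:
            return "very confident (high confidence)"
        return "lower confidence: report the number alongside how it was derived"

    print(qualitative_confidence(0.84))  # 'very confident (high confidence)'
    print(qualitative_confidence(0.99))  # 'extremely confident'
    ```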

    Placing the article’s findings within the broader context of global priorities and resource allocation

    Recognizing the hard constraints of word counts - and that a broader discussion of global priorities and resource allocation was likely “out of scope” - this article could be strengthened (or perhaps simply expanded upon in future work) by such a discussion. The critical piece of context is the scarcity of resources and attention within the institutions making funding decisions about civilization-saving R&D (governments, granting organizations, private foundations, etc.). There are two dimensions worth discussing here. First, R&D activities addressing risks that are generally considered low-probability/high-impact with relatively long timelines (although I don’t think the collapse of global agriculture would qualify as low-risk - nor is the likely timeline terribly long - but those are my priors) are competing for scarce funding/attention against R&D activities addressing lower-impact risks believed to be shorter-term and more probable (e.g., climate change, the next pandemic, etc.). I think most risk analysts - even hardcore “long-termists” - would agree that an ideal “R&D funding portfolio” should be somewhat diversified across these categories of risk. It is important to acknowledge the complexity associated with resource allocation - not just between X-risks but between X-risks and other risks.

    Second, there is the issue of resource scarcity itself. On the one hand, there are many “high value” candidate R&D projects addressing various risks that societies can invest in - but only a finite amount of funding and attention to allocate between them. So, these organizations must make triage decisions based on some criteria. On the other hand, there are also a lot of “low” or even “negative value” R&D activities being funded by these organizations - in addition to other poor investments - that are providing little social benefit or are actively increasing the likelihood/magnitude of various risks. I believe that it is important in these sorts of discussions about R&D prioritization and resource scarcity to point out that the resource pool need not be this shallow - and to identify some of the most egregious funding inefficiencies (e.g., around fossil fuel infrastructure expansion). It should go without saying, but ideally we could properly resource both resilient food and AGI safety research.


    Evaluator details

    1. Name: Scott Janzwood
    2. How long have you been in this field? 10 years
    3. How many proposals and papers have you evaluated? ~25 proposals, ~10 papers
  3. Evaluation 2


    Ratings and predictions

    Ratings (1-100)

    • Overall assessment: 80 CI: 60-90
    • Advancing knowledge and practice: 80 CI: 70-90
    • Methods: Justification, reasonableness, validity, robustness: 70 [(60+80+70+70)/4] CI: 50-90
      • Comment: Many components to rate, hence composite uncertainty in my rating :| Many choices were left unexplained (on the other hand, there were so many choices that it would’ve been hard to justify them all), but the majority seemed reasonable. Validity and robustness are hard for me to assess, but I based my numbers on the discussion about uncertainties and the sensitivity analysis.
    • Logic & communication: 85 [(90 +80)/2] CI: 65-95
      • Comment: Both excellent, but because the paper is so dense it’s sometimes hard to follow.
    • Open, collaborative, replicable: 73 [(80+80+60)/3] CI: 50-95
      • Comment: The large uncertainty and the reduced score for replicability are due to the various data sources/references used in the quantification of the models. Many can be recovered from other papers, which may or may not be replicable/straightforward. A table with all data sources and functional relationships would have been useful. Also, a bit more info on model E would help reduce my uncertainty here.
    • Relevance to global priorities: 85 CI: 70-90

    Journal predictions (1-5)

    • What ‘quality journal’ do you expect this work will be published in? 3.5 CI: 3–5
      • Comment: Biased by the knowledge that it has been published and my trust in the reviewing process.
    • On a ‘scale of journals’, what tier journal should this be published in? 4 CI: 3-5

    Written report

    I really enjoyed reading this paper and I did learn a lot as well, so thank you for putting it together. It is a very clearly presented, very dense piece of research. My area of expertise however is risk and decision analysis under uncertainty. I am a probabilistic modeller with loads of experience in structured expert judgement used to quantify uncertainty when data are sparse or lacking. My evaluation will therefore not cover the application area as such, as I have no experience with catastrophic or existential risk. I assume the cited literature is appropriate and not a non-representative sample, but I did not spend time verifying this assumption.

    At a first read, both the title and the abstract left me wondering if the present analysis compares cost-effectiveness (of resilient foods) with safety (of AGI), which would’ve been a strange comparison to make. However, after reading the very clear (and dense) Introduction, things became very clear. The only minor comment I have about the Introduction is that it sounds more ambitious than what the results provide with respect to the second objective.

    The Methods section is well organised and documented, but once in a while it lacks clarity and it uses terminology that may or may not be appropriate. Here’s a list of things I found a bit confusing:

    • Terminology
      • The first sentence mentions “parameters” without context for what these parameters may be (sometimes random variables are called parameters, at other times the parameters of a distribution are referred to as parameters, etc.)
      • The probability distribution of the “expected cost effectiveness”. Is “expected” in this context meant in a probabilistic sense, i.e., the expectation of the random variable “cost effectiveness”?
      • The submodels for food and AGI are said to be “independent”; is this meant in a probabilistic way? Are there no hidden/not modelled variables that influence both?
      • The “expert” model was quite confusing for me, maybe because “Sandberg” and the reference number after “Sandberg” don’t match, or maybe because I was expecting a survey vs. expert judgement quantification of uncertainty. As I said (structured) expert judgement is one of my interests (https://link.springer.com/book/10.1007/978-3-030-46474-5 )
      • In the caption of fig 2, “index nodes” and “variable nodes” are introduced. Index nodes are later described, but I don't think I understood what was meant by “variable” nodes. Aren’t all probabilistic nodes variable?
    • Underlying assumptions/definitions
      • Throughout the methods section I missed a table with a list of all variables: how they were measured, on what sort of scale, using what formula, and where they were quantified from (data, surveys, literature + reference; and if taken from other studies, what the limitations of those studies were)
      • Some of the parameters of the, say, Beta distributions are mentioned but not justified
      • The structure of the models is not discussed. How did you decide that this is a robust structure (no sensitivity to structure performed as far as I understood)
      • What is meant by “the data from surveys was used directly instead of constructing continuous distributions”?
      • The arcs in Fig 1 are unclear: some of them seem misplaced, while others seem to be missing. This could be a misunderstanding on my part, so maybe more text about Fig. 1 would help.
      • It is unclear if the compiled data sets are compatible. I think the quantification of the model should be documented better or in a more compact way.

    The Results section is very clear and neatly presented and I did enjoy the discussion on the several types of uncertainty.

    It is great that the models are available upon request, but it would be even better if they would be public so the computational reproducibility could be evaluated as well.

    Some of the references are missing links in the text, and at least one does not link to the desired bibitem.


    Evaluator details

    1. Name: Anca Hanea
    2. How long have you been in this field? It depends what you mean by “this field”. I have been working in risk and uncertainty analysis and decision making under uncertainty since 2009 (I defended my PhD in December 2008). I have never collaborated on a global catastrophic or existential risk application. I am familiar with food (in)security and collaborated on a project on that once.
    3. How many proposals and papers have you evaluated? I lost count of the papers. I am a reviewer for more than 20 journals, I have co-edited special issues and I am one of the editors for one journal. I have evaluated less than ten research proposals throughout the years.
  4. Evaluation 1


    Ratings and predictions

    Ratings (1-100)

    • Overall assessment: 40 CI: 20-60
      • Comment: See main review
    • Advancing knowledge and practice: 30 CI: 20-60
      • Comment: The paper itself makes an important argument about resilient foods, but I don’t know if the additional element of AGI risk adds much to Denkenberger & Pearce (2016)
    • Methods: Justification, reasonableness, validity, robustness: 50 CI: 40-60
      • Comment: Very major limitations around the survey method and the implementation of certain parts of the parameter sensitivity analysis. However, many elements are of a high standard.
    • Logic & communication: 60 CI: 40-75
      • Comment: Major limitations around the logic and communication of the theoretical model of cost-effectiveness used in the paper. Minor limitations of readability and reporting which could have been addressed before publication (such as reporting 95% CIs without medians, and not reporting overall cost and benefit estimates)
    • Open, collaborative, replicable: 70 CI: 40-75
      • Comment: Provided models are shared with any reader who asks, I couldn’t ask for more here. Limitations of survey replicability (particularly E model) prevent perfect score
    • Relevance to global priorities: 90 CI: 60-95
      • Comment: I’d be surprised if I ever read a paper with more relevance to global priorities, although as mentioned there are a few versions of this argument circulating, such as Denkenberger & Pearce (2016)

    Journal predictions (1-5)

    • What ‘quality journal’ do you expect this work will be published in? 2 CI: 1-2
    • On a ‘scale of journals’, what tier journal should this be published in? 2 CI: 1-2

    Written report

    This is a very interesting paper on an important and neglected topic. I’d be surprised if I ever again read a paper with such potential importance to global priorities. The authors motivate the discussion well, and should be highly commended for their clear presentation of the structural features of their model, and the thoughtful nature in which uncertainty was addressed head-on in the paper.

    Overall, I suspect the biggest contribution this paper will make is contextualising the existing work done by the authors on resilient food into the broader literature of long-termist interventions. This is a significant achievement, and the authors should feel justifiably proud of having accomplished it. However, the paper unfortunately has a number of structural and technical issues which should significantly reduce a reader’s confidence in the quantitative conclusions which aim to go beyond this contextualisation.

    In general, there are three broad areas where I think there are material issues with the paper:

    1. The theoretical motivation for their specific philosophy of cost-effectiveness, and specifically whether this philosophy is consistent throughout the essay
    2. The appropriateness of the survey methods, in the sense of applying the results of a highly uncertain survey to an already uncertain model
    3. Some specific concerns with parameterisation

    None of these concerns touch upon what I see as the main point of the authors, which I take to be that ‘fragile’ food networks should be contextualised alongside other sources of existential risk. I think this point is solidly made, and important. However, they do suggest that significant additional work may be needed to properly prove the headline claim of the paper, which is that, in addition to being a source of existential risk, the cost-effectiveness of investing in resilient food is amongst the highest benefit-per-cost of any existential risk mitigation.

    Structure of cost-effectiveness argument

    One significant highlight of the paper is the great ambition it shows in resolving a largely intractable question. Unfortunately, I feel this ambition is also something of a weakness of the paper, since it ends up difficult to follow the logic of the argument throughout.

    • Structurally, the most challenging element of this paper in terms of argumentative flow is the decision to make the comparator for cost-effectiveness analysis ‘AGI Catastrophe’ rather than ‘do nothing’. My understanding is that the authors make this decision to clearly highlight the importance of resilient food – noting that, “if resilient foods were more cost effective than AGI safety, they could be the highest priority [for the existential risk community]” (since the existential risk community currently spends a lot on AGI Risk mitigation). So roughly, they start with the assumption that AI Risk must be cost-effective, and argue that anything more cost-effective than this must therefore also be cost-effective. The logic is sound, but this decision causes a number of problems with interpretability, since it requires the authors to compare an already highly uncertain model of food resilience against a second highly uncertain model of AGI risk.
    • The biggest issue with interpretability this causes is that I struggle to understand what features of the analysis are making resilient food appear cost-effective because of some feature of resilient food, and which are making resilient food appear cost-effective because of some feature of AI. The methods used by the authors mean that a mediocre case for resilient food could be made to look highly cost-effective with an exceptionally poor case for AI, since their central result is the multiplier of value on a marginally invested dollar for resilient food vs AI. This is important, because the authors’ argument is that resilient food should be funded because it is more effective than AI Risk management, but this is motivated by AI Risk proponents agreeing AI Risk is important – in scenarios where AI Risk is not worth investing in then this assumption is broken and cost effectiveness analysis against a ’do nothing’ alternative is required. For example, the authors do not investigate scenarios where the benefit of the intervention in the future is negative because “negative impacts would be possible for both resilient foods and AGI safety and there is no obvious reason why either would be more affected”. While this is potentially reasonable on a mathematical level, it does mean that it would be perfectly possible for resilient foods to be net harmful and the paper not correctly identify that funding them is a bad idea – simply because funding AI Risk reduction is an even worse idea, and this is the only given alternative. If the authors want to compare AGI risk mitigation and resilient foods against each other without a ‘do nothing’ common comparator (which I do not think is a good idea), they must at the very least do more to establish that the results of their AI Risk model map closely to the results which cause the AI Risk community to fund AI Risk mitigation so much. As this is not done in the paper, a major issue of interpretability is generated.
    • A second issue this causes is that the authors must make an awkward ‘assumption of independence’ between nuclear risk, food security risk and AI risk. Although the authors identify this as a limitation of their modelling approach, the assumption does not need to be made if AI risk is not included as a comparator in the model. I don’t think this is a major limitation of the work, but an example of how the choice of comparator has an impact on structural features of the model beyond just the comparator.
    • More generally, this causes the authors to have to write up their results in a non-natural fashion. As an example of the sort of issues this causes, conclusions are expressed in entirely non-natural units in places (“Ratio of resilient foods mean cost effectiveness to AGI safety mean cost effectiveness” given $100m spend), rather than units which would be more natural (“Cost-effectiveness of funding resilient food development”). I cannot find expressed anywhere in the paper a simple table with the average costs and benefits of the two interventions, although a reference is made to Denkenberger & Pearce (2016) where these values were presented for near-term investment in resilient food. This makes it extremely hard for a reader to draw sensible policy conclusions from the paper unless they are already an expert in AGI risk and so have an intuitive sense of what an intervention which is ‘3-6 times more cost-effective than AGI risk reduction’ looks like. The paper might be improved by the authors communicating summary statistics in a more straightforward fashion. For example, I have spent some time looking for the probability the model assigns to no nuclear war before the time horizon (and hence the probability that the money spent on resilient food is ‘wasted’ with respect to the 100% shortfall scenario) but can’t find this – that seems to be quite an important summary statistic but it has to be derived indirectly from the model.

    Fundamentally, I don’t understand why both approaches were not compared to a common scenario of ‘do nothing’ (relative to what we are already doing). The authors’ decision to compare AGI Risk mitigation to resilient foods directly would only be appropriate if the authors expect that increasing funding for resilient food decreased funding for AI safety (that is to say, the authors are claiming that there is a fixed budget for AI-safety-and-food-resilience, and so funding for one must come at the expense of the other). This might be what the authors have in mind as a practical consequence of their argument, as there is an implication that funding for resilient foods might come from existing funding deployed to AGI Risk. But it is not logically necessary that this is the case, and so it creates great conceptual confusion to include it in a cost-effectiveness framework that requires AI funding and resilient food funding to be strictly alternatives. To be clear, the ‘AI subunit’ is interesting and publishable in its own right, but in my opinion simply adds complexity and uncertainty to an already complex paper.

    Continuing on from this point, I don’t understand the conceptual framework that has the authors consider the value of invested dollars in resilient food at the margin. The authors’ model of the value of an invested dollar is an assumption that it is distributed logarithmically. Since the entire premise of the paper hinges on the reasonability of this argument, it is very surprising there is no sensitivity analysis considering different distributions of the relationship between intervention funding and value. Nevertheless, I am also confused as to the model even on the terms the authors describe; the authors’ model appears to be that there is some sort of ‘invention’ step where the resilient food is created and discovered (this is mostly consistent with Denkenberger & Pearce (2016), and is the only interpretation consistent with the question asked in the survey). In which case, the marginal value of the first invested dollar is zero because the ’invention’ of the food is almost a discrete and binary step. The marginal value per dollar continues to be zero until the 86 millionth dollar, where the marginal value is the entire value of the resilient food in its entirety. There seems to be no reason to consider the marginal dollar value of investment when a structural assumption made by the authors is that there is a specific level of funding which entirely saturates the field, and this would make presenting results significantly more straightforward – it is highly nonstandard to use marginal dollars as the unit of cost in a cost-effectiveness analysis, and indeed is so nonstandard I’m not certain fundamental assumptions of cost-effectiveness analysis still hold. I can see why the authors have chosen to bite this bullet for AI risk given the existing literature on the cost of preventing AI Catastrophe, but there seems to be no reason for it when modelling resilient food and it departs sharply from the norm in cost-effectiveness analysis.

    Finally, I don’t understand the structural assumptions motivating the cost-effectiveness of the 10% decline analysis. The authors claim that the mechanism by which resilient foods save lives in the 10% decline analysis is that “the prices [of non-resilient food] would go so high that those in poverty may not be able to afford food” with the implication that resilient foods would be affordable to those in poverty and hence prevent starvation. However, the economic logic of this statement is unclear. It necessitates that the production costs of resilient food are less than the production costs of substitute non-resilient food at the margin, which further implies that producers of resilient food can command supernormal profits during the crisis, which is to say the authors are arguing that resilient foods represent potentially billions of dollars of value to their inventor within the inventor’s lifetime. It is not clear to me why a market-based solution would not emerge for the ‘do nothing’ scenario, which would be a critical issue with the authors’ case since it would remove the assumption that ‘resilient food’ and ‘AGI risk’ are alternative uses of the same money in the 10% scenario, which is necessary for their analysis to function. The authors make the further assumption that preparation for the 100% decline scenario is highly correlated with preparation for the 10% decline scenario, which would mean that a market-based solution emerging prior to nuclear exchange would remove the assumption that ‘resilient food’ and ‘AGI risk’ are alternative uses of the same money in the 100% decline scenario. A supply and demand model might have been a more appropriate model for investigating this effect. Once again, I note that the supply and demand model alone would have been an interesting and publishable piece of work in its own right.

    Overall, I think the paper would have benefitted from more attention being paid to the underlying theory of cost-effectiveness motivating the investigation. Decisions made in places seem to have multiplied uncertainty which could have been resolved with a more consistent approach to analysis. As I highlighted earlier, the issues only stem from the incredible ambition of the paper and the authors should be commended for managing to find a route to connect two separate microsimulations, an analysis of funding at the margin and a supply-and-demand model. Nevertheless, the combination of these three approaches weakens the ability to draw strong conclusions from each of these approaches individually.

    Methods

    With respect to methods, the authors use a Monte Carlo simulation with distributions drawn from a survey of field experts. The use of a Monte Carlo technique here is an appropriate choice given the significant level of uncertainty over parameters. The model appears appropriately described in the paper, and functions well (I have only checked the models in Guesstimate, as I could not make the secondary models in Analytica function). A particular highlight of the paper is the figures clearly laying out the logical interrelationship of elements of the model, which made it significantly easier to follow the flow of the argument. I note the authors use ‘probability more effective than’ as a key result, which I think is a natural unit when working in Guesstimate. This is entirely appropriate, but a known weakness of the approach is that it can bias in favour of poor interventions with high uncertainty. The authors could also have presented a SUCRA analysis which does not have this issue, but they may have considered and rejected this approach as unnecessary given the entirely one-sided nature of the results which a SUCRA would not have reversed.
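
    As a minimal sketch of the ‘probability more effective than’ metric and of the bias described above, using hypothetical cost-effectiveness distributions rather than the paper's:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n = 32_000

    # Hypothetical cost-effectiveness samples (arbitrary units), not the paper's actual distributions.
    certain_option = rng.lognormal(mean=0.0, sigma=0.1, size=n)   # modest value, low uncertainty
    noisy_option = rng.lognormal(mean=-0.5, sigma=2.0, size=n)    # lower median, very high uncertainty

    print(np.mean(noisy_option > certain_option))                 # stays substantial despite the lower median
    print(np.median(noisy_option), np.median(certain_option))
    ```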

    The presentation of the sensitivity analysis as ‘number of parameters needed to flip’ is nonstandard, but a clever way to intuitively express the level of confidence the authors have in their conclusions. Although clever, I am uncertain if the approach is appropriately implemented; the authors limit themselves to the 95% CI for their definition of an ‘unfavourable’ parameter, and I think this approach hides massive structural uncertainty with the model. For example, in Table 5 the authors suggest their results would only change if the probability of nuclear war per year was 4.8x10^-5 (plus some other variables changing) rather than their estimate of 7x10^-3 (incidentally, I think the values for S model and E model are switched in Table 5 – the value for pr(nuclear war) in the table’s S model column corresponds to the probability given in the E model). But it is significantly overconfident to say that risk of nuclear war per year could not possibly be below 4.8x10^-5, so I think the authors overstate their certainty when they say “reverting [reversing?] the conclusion required simultaneously changing the 3-5 most important parameters to the pessimistic ends”; in fact it merely requires that the authors have not correctly identified the ‘pessimistic end’ of any one of the five parameters, which seems likely given the limitations in their data which I will discuss momentarily. I personally would have found one- and two-dimensional threshold analysis a more intuitive way to present the results, but I think the authors have a reasonable argument for their approach. As described earlier, I have some concerns about whether an appropriate amount of structural sensitivity analysis was undertaken, but the presentation of uncertainty analysis is appropriate in its own terms (if somewhat nonstandard).

    Overall, I have no major concerns about the theory or application of the modelling approach. However, I have a number of concerns with the use of the survey instrument:

    First, the authors could have done more to explain the level of uncertainty their survey instrument contains. They received eight responses, which is a very low number for a quantitative survey. In addition, two of the eight responses were from authors of the paper. The authors discuss ‘response bias’ and ‘demand characteristic bias’, which would not typically be applied to data generated by an approximately autoethnographic process – it is obvious that the authors of a survey instrument know what purpose the instrument is to be used for, and have incentives to make the survey generate novel and interesting findings. It might have been a good sensitivity analysis to exclude responses from the authors and other researchers associated with ALLFED, since there is a clear conflict of interest that could bias results here.

    Second, issues with survey data collection are compounded by the fact that some estimates given in the S Model were not actually elicited with the survey technique – they are instead cited to Denkenberger & Pearce (2016) and Denkenberger & Pearce (2017). This is described appropriately in the text, but it is not clearly marked in the summary Table 1, where I would expect to see it, and the limitation this presents is not described clearly. To be explicit, the limitation is that at least two key parameters in the model are based on the opinions of two of the eight survey respondents, rather than the full set of eight. As an aside on presentation, the decision to present lower and upper credible intervals in Table 1 rather than the median is nonstandard for an economics paper, although perhaps this is a discipline-specific convention I am unaware of. Regardless, I’m not sure it is appropriate to present the lowest of eight survey responses as the ‘5th percentile’, as it is actually the 13th percentile, and giving 95% confidence intervals implies a level of accuracy the survey instrument cannot reach. While I appreciate the 13th percentile of 8 responses will be the same as the 5th percentile of 100 samples drawn from those responses, this is not going to be clear to a casual reader of the paper. ‘Median (range)’ might be a better presentation of the survey data in this table, with better clarity on where each estimate comes from. Alternatively, the authors could fit a lognormal distribution to the survey results using e.g. the method of moments, and then resample from the fitted distribution to create a genuine 95% CI (see the sketch below). Regardless, given the low number of responses, it might have been appropriate simply to present all eight estimates for each relevant parameter in a table.
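
    To illustrate the lognormal suggestion, here is a minimal sketch assuming eight hypothetical responses (these are not the values elicited in the paper): a method-of-moments fit followed by resampling yields a genuine 95% interval rather than the sample minimum and maximum:

    ```python
    import numpy as np

    # Eight hypothetical survey responses for one parameter (illustrative only;
    # these are not the values elicited in the paper).
    responses = np.array([0.001, 0.003, 0.005, 0.01, 0.02, 0.05, 0.1, 0.3])

    # Method-of-moments fit of a lognormal: match the sample mean and variance.
    m, v = responses.mean(), responses.var(ddof=1)
    sigma2 = np.log(1.0 + v / m**2)
    mu = np.log(m) - sigma2 / 2.0
    sigma = np.sqrt(sigma2)

    # Resample from the fitted distribution to obtain a genuine 95% interval.
    rng = np.random.default_rng(0)
    samples = rng.lognormal(mean=mu, sigma=sigma, size=100_000)
    lo, hi = np.percentile(samples, [2.5, 97.5])
    print(f"Fitted lognormal 95% interval: [{lo:.3g}, {hi:.3g}]")
    ```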

    Third, the authors could have done more to make it clear that the ‘Expert Model’ was effectively just another survey with an n of 1. Professor Sandberg, who populated the Expert Model, is also an author on this paper, and so it is unclear what validation, if any, of the Expert Model could reasonably have been undertaken – the E model is therefore likely to suffer from the same drawbacks as the S model. It is also unclear whether Professor Sandberg knew the results of the S Model before parameterising his E Model – although this seems highly likely given that 25% of the survey’s respondents were Professor Sandberg’s co-authors. This could be a major source of bias, since presumably the authors would prefer the two models to agree and the expert parameterising the model is a co-author. I also think more work was needed to establish the Expert’s credentials in the field of agricultural R&D (necessary for at least some of the parameter estimates); although I happily accept Professor Sandberg is a world expert on existential risk and a clear choice to act as the parameterising ‘expert’ for most parameters, I think there may have been alternative choices (such as agricultural economists) more obviously suited to giving some estimates. There is no methodological reason why one expert had to be selected to populate the whole table, and no defence is given in the text for why one expert was selected – the paper is highly multidisciplinary and it would be surprising if any one individual had expert knowledge of every relevant element. Overall, this limitation makes me extremely hesitant to accept the authors’ argument that because the S model and E model are both robust, the conclusion is equally robust.

    Generally, I am sympathetic to the authors’ claim that there is unavoidable uncertainty in the investigation of the far future. However, the survey is a very major source of avoidable uncertainty, and it is not reasonable to present uncertainty arising from the application of survey methods as the same kind of thing as uncertainty about the future potential of humanity. There are a number of steps the authors could have taken to improve the validity and reliability of their survey results, some of which would not even have required rerunning the survey (to be clear, however, I think there is a good case for rerunning the survey to ensure a broader panel of responses). With the exception of the survey, methods were generally appropriate and valid.

    Parameter estimates

    Notwithstanding my concerns about the use of the survey instrument, I have some object-level concerns with specific parameters described in the model.

    • The discount rate for both costs and benefits appears to be zero, which is very nonstandard in economic evaluation. Although the authors make reference to “long termism, the view that the future should have a near zero discount rate”, the reference for this position leads to a claim that a zero rate of pure time preference is common, and a footnote observing that “the consensus against discounting future well-being is not universal”. To be clear, pure time preference is only one component of a well-constructed discount rate, and therefore a discount rate should still be applied to costs, and probably to future benefits too. Even setting aside what I think is an error of understanding, it is a limitation of the paper that discount rates were not explored, given they seem very likely to have a major impact on the conclusions (see the sketch after this list).
    • A second concern I have relating to parameterisation is the conceptual model leading to the authors’ proposed costing for the intervention. The authors explain their conceptual model linking nuclear war risk to agricultural decline commendably clearly, and this expands on the already strong argument in Denkenberger & Pearce (2016). However, I am less clear on their conceptual model linking approximately $86m of research to the wide-scale post-nuclear deployment of resilient foods. The assumption seems to be (and I stress this is my assumption, based on Denkenberger & Pearce (2016) – it would help if the authors could make it explicit) that $86m purchases the ‘invention’ of the resilient food, and once the food is ‘invented’ then it can be deployed when needed with only a little ongoing training (covered by the $86m). This seems to me an optimistic assumption; there appears to be no cost associated with disseminating the knowledge, or with any raw materials necessary to culture the resilient food. Moreover, the model seems to structurally assume that distribution chains survive the nuclear exchange with 100% certainty (or that the materials are disseminated to every household, which would increase costs), and that an existing resilient food pipeline exists at the moment of nuclear exchange which can smoothly take over from the non-resilient food pipeline.
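
    On the discounting point flagged in the first bullet, a minimal sketch with a purely hypothetical benefit stream shows why the choice of discount rate is likely to matter so much over long horizons:

    ```python
    import numpy as np

    # Hypothetical benefit stream: 1 unit per year for 1,000 years, standing in
    # for very long-run benefits. Values are illustrative only.
    years = np.arange(1, 1001)
    annual_benefit = np.ones_like(years, dtype=float)

    for rate in (0.0, 0.015, 0.03):
        present_value = np.sum(annual_benefit / (1.0 + rate) ** years)
        print(f"Discount rate {rate:.1%}: present value = {present_value:,.0f}")
    ```

    Even a modest positive rate shrinks the present value of very long-run benefits by an order of magnitude or more, which is why the zero-discounting assumption deserves explicit sensitivity analysis.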

    I have extremely serious reservations about these points. I think it is fair to say that an economics paper which projected benefits as far into the future as the authors do here without an exploration of discount rates would be automatically rejected by most editors, and it is not clear why the standard should be so different for existential risk analysis. A cost of $86m to mitigate approximately 40% of the impact of a full-scale nuclear war between the US and a peer country seems prima facie absurd, and the level of exploration of such an important parameter is simply not in line with best practice in a cost-effectiveness analysis (especially since this is the parameter on which we might expect the authors to be least expert). I wouldn’t want my reservations about these two points to detract from the very good and careful scholarship elsewhere in the paper, but neither do I want to give the impression that these are just minor technical details – these issues could potentially reverse the authors’ conclusions, and should have been substantially defended in the text.

    Conclusions

    Overall, this is a novel and insightful paper which is unfortunately burdened with some fairly serious conceptual issues. The authors should be commended for their clear-sighted contextualisation of resilient foods as an issue for discussion in existential risk, and for the scope of their ambition in modelling. Academia would be in a significantly better place if more authors tried to answer far-reaching questions with robust approaches, rather than making incremental contributions to unimportant topics.

    The issues of the paper lie in structural weaknesses with the cost-effectiveness philosophy deployed, methodological weaknesses with the survey instrument, and two potentially conclusion-reversing issues with parameterisation which should have been given substantially more discussion in the text. I am not convinced that the elements of the paper which are robust are sufficiently robust to overcome these weaknesses – my view is that it would be premature to reallocate funding from AI risk reduction to resilient foods on the basis of this paper alone. The most serious conceptual issue which I think needs to be resolved before this can happen is to demonstrate that ‘do nothing’ would be less cost-effective than investing $86m in resilient foods, given that the ‘do nothing’ approach would potentially include strong market dynamics leaning towards resilient foods. I agree with the authors that an agent-based model might be appropriate for this, although a conventional supply-and-demand model might be simpler.

    I really hope the authors are interested in publishing follow-on work, looking at elements which I have highlighted in this review as being potentially misaligned with the paper that was actually published, but which are nevertheless potentially important contributions to knowledge. In particular, the AI subunit is novel and important enough for its own publication.


    Evaluator details

    1. Name: Alex Bates
    2. How long have you been in this field? In the field of cost-effectiveness analysis, 10 years. I wouldn’t consider myself to be in the field of x-risk
    3. How many proposals and papers have you evaluated? I’ve lost count, but probably mid double figures - perhaps 50?