A call to ensure reproducibility of machine learning applications in industrial ecology
Abstract
Machine learning (ML) usage in industrial ecology (IE) has grown nearly tenfold in the last decade. In other fields, similar increases in ML adoption have led to the widespread publication of results that cannot be reproduced. This rise in irreproducibility, driven by a failure to follow best practices when creating and reporting models, undermines the conclusions and credibility of science. Industrial ecologists have not yet determined whether reproducibility is becoming a concern in their applications of ML. To assess this risk, we audited 50 recent IE studies against an ML reproducibility ontology. We find that 84% of the surveyed studies suffer from computational reproducibility issues, and 28% exhibit methodological flaws that could introduce data leakage and invalidate their findings. Yet bibliometric analysis shows that these potentially irreproducible studies are cited as often as, or more frequently than, their non-ML counterparts, which could embed flawed results in the scientific literature. Our findings serve as a call to action for the IE community. We suggest multi-level interventions, including that journals adopt reproducibility checklists and that reviewers prioritize key reproducibility errors over performance metrics, to safeguard the field and maximize the reproducibility of future ML-driven IE research.