Parsimonious machine learning models to estimate environmental footprints of crop production
Abstract
Crop production is a major driver of anthropogenic impacts on the environment. Quantifying these impacts requires primary farm-level life cycle inventory data, which are sparse and difficult to collect. Parsimonious models are therefore needed to predict the environmental footprints of crop production. Here, we test the ability of five machine learning modelling techniques (Random Forest (RF), K-Nearest Neighbours (KNN), Artificial Neural Network (ANN), Generalised Boosting Method (GBM), and Linear Modelling (LM)) to predict the biodiversity, water and climate footprints of crop production from limited, aggregated farming information, using life cycle data for 121 crops across 57 countries. The most parsimonious models were RF with four predictors for the biodiversity footprint (R² = 0.88), GBM with five predictors for the water footprint (R² = 0.89), and ANN with four predictors for the climate footprint (R² = 0.58). Uncertainty in model predictions is +/- a factor of 2.1, 5.1, and 4.0 (95% confidence interval) for the three footprints, respectively. The key farming information for predicting biodiversity and climate footprints is yield, fertiliser, electricity and climatic region. Irrigation, fertiliser and pesticide are important predictors of the water footprint. Our study offers predictive models that highlight the key predictors of environmental footprints of crop production, which should be prioritised for data gap filling.
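To illustrate what an uncertainty of "+/- a factor k" means in practice, the following is a minimal sketch of one common way to obtain such a factor. It is not necessarily the procedure used in the study. It assumes that the log-ratios of predicted to observed footprints are roughly normal and unbiased, so that an approximate 95% interval for a prediction y runs from y / k to y * k. The function names `uncertainty_factor` and `prediction_interval` and all numeric values are hypothetical.

```python
import math
import statistics


def uncertainty_factor(observed, predicted, z=1.96):
    """Multiplicative half-width k of an approximate 95% interval.

    Assumption (not taken from the study): log10(predicted / observed)
    is roughly normal with zero mean, so k = 10 ** (z * sd).
    """
    log_ratios = [math.log10(p / o) for o, p in zip(observed, predicted)]
    sd = statistics.stdev(log_ratios)  # sample standard deviation
    return 10 ** (z * sd)


def prediction_interval(prediction, k):
    """Lower and upper bounds for one prediction, given factor k."""
    return prediction / k, prediction * k


# Made-up illustrative values, not data from the study.
observed = [1.0, 2.0, 4.0, 8.0]
predicted = [1.5, 1.6, 5.0, 6.0]

k = uncertainty_factor(observed, predicted)
low, high = prediction_interval(10.0, k)
print(f"factor k = {k:.2f}, interval for 10.0 = [{low:.2f}, {high:.2f}]")
```

Under this reading, a factor of 2.1 for the biodiversity footprint would mean that the true value is expected to lie between roughly half and double the predicted value in 95% of cases.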