Global distribution of cattle, goats, sheep and horses at 1-km resolution for 2000–2022 based on sub-national census data and spatiotemporal Machine Learning
Abstract
The paper describes the production and evaluation of annual livestock density maps for cattle, horses, sheep and goats (including per-pixel 95% probability prediction intervals) at 1 km spatial resolution for the 2000–2022 period using spatiotemporal Machine Learning. A compilation of subnational livestock census data was imported, harmonized and used as reference data (52,883 census polygons and 678,266 individual data points, covering 86% of the potential land for livestock production) to build predictive models using correlation with a large stack of multi-source harmonized gridded/raster spatial layers (307 individual raster layers harmonized at 1 km spatial resolution). Models were fitted using the scikit-learn library with Recursive Feature Elimination and the Poisson criterion to represent the distribution of the target variable. Intermediate layers estimating potential land for livestock production based on grassland and cropland extent, along with biophysical and socioeconomic predictors, were used to estimate the spatial domain of livestock. The final predictions at 1 km were further adjusted to annual headcounts based on FAOSTAT national statistics to ensure consistency. Model benchmarking based on 10% hold-out samples and cross-validation with refitting shows that Random Forest outperforms Gradient Boosting Tree for predicting livestock densities, with hold-out validation yielding R² values of 0.437, 0.53, 0.574 and 0.552, and RMSE values of 124.38, 4.01, 42.89 and 23.20 (head per km²) for cattle, horses, sheep and goats, respectively. Variable importance analysis shows that the key predictors include socioeconomic layers, such as travel time to the nearest ports and cities, annual subnational Gross Domestic Product (GDP) and religious population distribution.
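The model-fitting and hold-out evaluation steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, and the sample size, number of covariates, number of trees and number of retained features are arbitrary assumptions chosen only to keep the example small. It uses scikit-learn's `RandomForestRegressor` with `criterion="poisson"` wrapped in `RFE`, and reports R² and RMSE on a 10% hold-out split.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for census-derived densities and raster covariates
# (assumption: the real inputs are 307 harmonized layers, not 12 random ones).
rng = np.random.default_rng(0)
n_samples, n_features = 600, 12
X = rng.normal(size=(n_samples, n_features))
rate = np.exp(1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2])
y = rng.poisson(rate).astype(float)  # non-negative, count-like target

# Hold out 10% of the samples for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0
)

# Random Forest with the Poisson split criterion, wrapped in
# Recursive Feature Elimination (keeps 5 of the 12 covariates here).
base = RandomForestRegressor(
    n_estimators=50, criterion="poisson", random_state=0
)
selector = RFE(base, n_features_to_select=5, step=1)
selector.fit(X_train, y_train)

# Evaluate on the hold-out samples.
y_pred = selector.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"selected features: {np.flatnonzero(selector.support_)}")
print(f"hold-out R2={r2:.3f}, RMSE={rmse:.3f}")
```

The Poisson criterion requires a non-negative target, which suits densities and headcounts; the per-pixel prediction intervals mentioned in the abstract are not covered by this sketch.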
Further evaluation of the maps shows that predictions suffer from large gaps in training data in parts of Africa and Asia. The spatial domain of livestock (active grazing/forage areas) is often difficult to validate, and many countries have very specific management cultures that cannot be seamlessly represented using existing global raster layers; modeling the distribution of livestock per country could therefore help increase accuracy. The modeling pipeline is open source and available on GitHub (https://github.com/wri/global-pasture-watch) with output maps (ML predictions and FAOSTAT-adjusted values) publicly available under a CC-BY license on Zenodo (https://doi.org/10.5281/zenodo.14933636).
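The adjustment of predictions to FAOSTAT national headcounts can be sketched as a simple proportional rescaling. This is an assumption made for illustration: the abstract does not specify the adjustment procedure. The function name `adjust_to_national_totals`, the toy arrays and the national totals below are hypothetical and not taken from the published pipeline.

```python
import numpy as np

def adjust_to_national_totals(density, country_id, pixel_area_km2, national_totals):
    """Rescale predicted densities (head per km2) so that the headcount
    implied for each country matches its reported national total.

    Assumes proportional scaling; countries with zero predicted head
    are left unchanged.
    """
    adjusted = density.astype(float).copy()
    for code, total in national_totals.items():
        mask = country_id == code
        predicted_heads = float(np.sum(density[mask] * pixel_area_km2[mask]))
        if predicted_heads > 0:
            adjusted[mask] = density[mask] * (total / predicted_heads)
    return adjusted

# Toy 2x2 grid: top row belongs to country 1, bottom row to country 2.
density = np.array([[10.0, 20.0], [5.0, 0.0]])
country_id = np.array([[1, 1], [2, 2]])
pixel_area_km2 = np.ones_like(density)   # assumption: every pixel is exactly 1 km2
national_totals = {1: 60.0, 2: 4.0}      # made-up national headcounts

adjusted = adjust_to_national_totals(
    density, country_id, pixel_area_km2, national_totals
)
print(adjusted)
```

Proportional scaling preserves the within-country spatial pattern predicted by the model and changes only its overall level, which is why the ML predictions and the FAOSTAT-adjusted values are useful to keep as separate outputs.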