Global distribution of cattle, horses, goats, sheep and buffaloes at 1 km resolution for 2000–2022 based on subnational census data and spatiotemporal Machine Learning

Abstract

The paper describes the production and evaluation of annual livestock densities and headcounts for cattle, horses, sheep, goats and buffaloes (including 95% probability prediction intervals) at 1 km spatial resolution for the 2000–2022 period using spatiotemporal machine learning. A compilation of subnational livestock census data was imported, harmonized and used as reference data (55,336 census polygons and 939,257 individual data entries, covering 147 countries) to build predictive models. A large stack of multi-source harmonized raster data sets (128 individual layers) was used as features. Models were fitted using the scikit-map and scikit-learn libraries with recursive feature elimination and a Poisson criterion to represent the distribution of the target variable. Intermediate rasters estimating potential land for livestock production, based on grassland and cropland extent along with biophysical features, were used to estimate the spatial domain of livestock. The final predictions at 1 km were further adjusted to annual headcounts based on FAOSTAT national statistics to ensure consistency. Model benchmarking based on 10% test samples (with spatial blocking) shows that Random Forest outperforms Gradient Boosting Tree for predicting livestock densities, with concordance correlation coefficient (CCC) values of 0.603, 0.547, 0.598, 0.622 and 0.689, and RMSE values of 104.59, 6.06, 64.09, 67.57 and 30.37 heads per km² for cattle, horses, sheep, goats and buffaloes, respectively. Feature importance analysis shows that the key variables include climate and socio-economic layers, such as water vapour, aridity index, land surface temperature, travel time to the nearest cities, and religious population distribution. Further evaluation of the output layers shows similar distributions to existing global livestock products: FAO Gridded Livestock of the World (GLW) and Annual Gridded Livestock of the World (AGLW).
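
The CCC values reported above can be made concrete with a short sketch. It assumes that CCC denotes Lin's concordance correlation coefficient, which penalizes both weak correlation and systematic bias between observed and predicted densities. The function name `concordance_correlation` is hypothetical and is not taken from the authors' pipeline.

```python
import numpy as np


def concordance_correlation(y_true, y_pred):
    """Lin's concordance correlation coefficient (hypothetical helper).

    Assumption: this is the CCC meant in the abstract. Population
    (biased) variances and covariance are used, as in Lin's definition.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    mean_true = y_true.mean()
    mean_pred = y_pred.mean()
    covariance = np.mean((y_true - mean_true) * (y_pred - mean_pred))

    # The denominator grows with any shift between the two means,
    # so a biased prediction scores below 1 even if perfectly correlated.
    denominator = y_true.var() + y_pred.var() + (mean_true - mean_pred) ** 2
    return 2.0 * covariance / denominator
```

A prediction that is perfectly correlated with the observations but offset by a constant scores below 1, which is why CCC is a stricter agreement measure than Pearson correlation.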
The spatial domain of livestock (active grazing/forage areas) is often difficult to validate, because many countries have very specific management cultures that cannot be seamlessly represented using existing global raster layers. Modeling the distribution of livestock per country using local, country-specific features (instead of global models) could therefore help increase accuracy, especially for regional and local applications. The modeling pipeline is open source and available on GitHub (https://github.com/wri/global-pasture-watch), and the output layers (both original ML predictions and FAOSTAT-adjusted values) are publicly available under a CC-BY license on Zenodo (https://doi.org/10.5281/zenodo.17491242).
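
As an illustration of what a FAOSTAT adjustment could look like, the sketch below rescales per-pixel predicted headcounts so that each country's sum matches a national total. Simple proportional scaling is an assumption made here for clarity; the abstract does not state the exact adjustment method, and the function name, arguments and sample values are all hypothetical.

```python
import numpy as np


def adjust_to_national_totals(predicted_heads, country_ids, national_totals):
    """Proportionally rescale pixel headcounts to match national totals.

    Assumption: plain proportional scaling per country; the published
    pipeline may use a different adjustment.
    """
    predicted_heads = np.asarray(predicted_heads, dtype=float)
    country_ids = np.asarray(country_ids)
    adjusted = predicted_heads.copy()

    for country, total in national_totals.items():
        mask = country_ids == country
        predicted_sum = predicted_heads[mask].sum()
        # Leave pixels unchanged when the model predicts no animals at all,
        # since there is nothing to scale.
        if predicted_sum > 0:
            adjusted[mask] = predicted_heads[mask] * (total / predicted_sum)

    return adjusted


# Made-up example: four pixels in two countries.
pixels = [10.0, 30.0, 5.0, 5.0]
countries = ["A", "A", "B", "B"]
totals = {"A": 80.0, "B": 5.0}
adjusted_pixels = adjust_to_national_totals(pixels, countries, totals)
```

Scaling this way preserves the relative spatial pattern predicted within each country and changes only the overall magnitude.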
