Integrating Presence-only and Abundance Data to Predict Baobab (Adansonia digitata L.) Distribution: A Bayesian Data Fusion Framework


Abstract

Species distribution models (SDMs) are vital tools in ecology and conservation. Integrating increasingly available citizen science data with planned survey data offers a significant opportunity to improve species distribution estimates. Although integrated SDMs often combine presence-only and abundance data, the interdependence between the conditional distributions of these outcomes remains poorly understood. This study proposes a Bayesian spatial fusion modelling framework to jointly analyse presence-only and abundance data for the African baobab (Adansonia digitata L.) in Benin, with the aim of understanding and mapping the spatial variation in the species' distribution. We briefly reviewed process-based models for count and point process data and explored several data fusion strategies, using Integrated Nested Laplace Approximations (INLA) with the Stochastic Partial Differential Equations (SPDE) approach for inference. The results revealed a heterogeneous baobab distribution across Benin, characterised by a spatial autocorrelation range of 34.4 km (95% Bayesian credible interval, BCI: 27.59-42.52 km). Key drivers of this distribution include annual temperature, rainfall of the driest month, soil texture (silt and clay fractions), and slope. A spatial fusion model with shared latent components and common covariate effects performed best among the fusion approaches compared, achieving the highest mean composite scores for the Area Under the ROC Curve (AUC, 0.85 ± 0.02), accuracy (0.77 ± 0.02), and the True Skill Statistic (TSS, 0.66 ± 0.05). The shared component model can account for the interdependence between datasets, estimate covariate effects missed by separate models, and improve prediction precision. Although it relies on the assumption that an identical spatial signal is shared across the target responses, this research underscores the potential of spatial fusion modelling for integrating disparate data sources. The findings contribute to advancing SDM inference, particularly in data-limited contexts, and have wider applicability to spatial regression problems involving multisource outcomes.
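
The three evaluation metrics reported above can be illustrated with a minimal sketch. The observations, predicted probabilities, and the 0.5 threshold below are hypothetical values chosen for illustration only; they are not data from the study, and the abstract does not describe the authors' actual evaluation procedure. The sketch uses the standard definitions: accuracy as the proportion of correct classifications, TSS as sensitivity plus specificity minus one, and AUC as the probability that a randomly chosen presence receives a higher score than a randomly chosen absence.

```python
# Hypothetical toy data (NOT from the study): 1 = presence, 0 = absence.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]  # predicted probabilities
THRESHOLD = 0.5  # assumed cut-off for turning probabilities into classes


def confusion_counts(obs, pred):
    """Return (tp, fp, tn, fn) for binary observations and predictions."""
    tp = sum(1 for o, p in zip(obs, pred) if o == 1 and p == 1)
    fp = sum(1 for o, p in zip(obs, pred) if o == 0 and p == 1)
    tn = sum(1 for o, p in zip(obs, pred) if o == 0 and p == 0)
    fn = sum(1 for o, p in zip(obs, pred) if o == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Proportion of sites classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def true_skill_statistic(tp, fp, tn, fn):
    """TSS = sensitivity + specificity - 1, ranging from -1 to 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def auc(obs, sc):
    """AUC via pairwise comparison of presence and absence scores.

    Each presence/absence pair counts 1 if the presence scores higher,
    0.5 on a tie, and 0 otherwise (the Mann-Whitney formulation).
    """
    pos = [s for o, s in zip(obs, sc) if o == 1]
    neg = [s for o, s in zip(obs, sc) if o == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


predicted = [1 if s >= THRESHOLD else 0 for s in scores]
counts = confusion_counts(observed, predicted)

print("accuracy:", accuracy(*counts))          # 0.75
print("TSS:", true_skill_statistic(*counts))   # 0.5
print("AUC:", auc(observed, scores))           # 0.8125
```

Accuracy and TSS depend on the chosen threshold, whereas AUC is threshold-free, which is one reason the three are commonly reported together.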