Addressing Unobserved Covariates in Species Distribution Models: Impacts on Inferential Quality and Mitigation via Joint Species Distribution Models
Abstract
Species distribution models (SDMs) are widely used in ecology to assess the distribution of species populations across space and time. Correlative SDMs, in particular, are used to infer relationships between species records and environmental variables. A classical way to implement this type of SDM is to use generalized linear mixed models (GLMMs) as a parametric regression method. However, because species-environment relationships are complex, species distributions may depend on unobserved or unmeasurable covariates. In this article, we first recall certain mathematical results showing that such “omitted covariates” typically introduce statistical issues that can bias the inference of observed covariate effects or yield improper confidence intervals. So far, these results have received little attention in ecology. We then present a comprehensive simulation-based investigation of the statistical impact of unobserved covariates on the inferential performance of GL(M)Ms for continuous, count, and binary data. We assess various regression methods, including both frequentist and Bayesian SDMs, as well as so-called joint species distribution models (JSDMs), which are used to account for interspecific covariations in presence–absence data. Our work demonstrates that JSDMs provide a robust statistical approach that mitigates the inferential issues arising in SDMs due to missing covariates and enables reliable estimates of environmental effects. We complement these simulation results by applying JSDMs and SDMs to several ecological datasets, which reveals discrepancies between SDM and JSDM estimates of environmental effects and a better predictive capacity for JSDMs than for SDMs. As a general recommendation, we encourage ecologists and practitioners to consider fitting JSDMs when dealing with community data, so that they can evaluate whether any information can be extracted from between-species residuals.
More broadly, our results apply to any GL(M)M in which important variables are suspected of being omitted. In such cases, generalized linear latent variable models (GLLVMs) could correct inference when different entities may share the same omitted covariate.
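To make the omitted-covariate issue concrete for binary data, the following is a minimal simulation sketch and not the authors' simulation design. It assumes a logistic model with one observed covariate `x` and one unobserved covariate `z` drawn independently of `x`; the sample size, effect sizes, and the helper `fit_logistic` are all hypothetical choices made for illustration. Under these assumptions, dropping `z` would be expected to shrink the estimated effect of `x` toward zero even though `x` and `z` are uncorrelated.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column. Hypothetical helper,
    written here only to keep the sketch dependency-free.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        w = p * (1.0 - p)                        # Bernoulli variances
        grad = X.T @ (y - p)                     # score vector
        hess = X.T @ (X * w[:, None])            # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Assumed simulation settings (illustrative only).
rng = np.random.default_rng(0)
n = 20000
beta_x, beta_z = 1.0, 2.0

x = rng.normal(size=n)   # observed covariate
z = rng.normal(size=n)   # unobserved covariate, independent of x
prob = 1.0 / (1.0 + np.exp(-(beta_x * x + beta_z * z)))
y = rng.binomial(1, prob)

# Model that sees both covariates versus model that omits z.
X_full = np.column_stack([np.ones(n), x, z])
X_reduced = np.column_stack([np.ones(n), x])

b_full = fit_logistic(X_full, y)
b_reduced = fit_logistic(X_reduced, y)

print("effect of x, z included:", round(float(b_full[1]), 3))
print("effect of x, z omitted: ", round(float(b_reduced[1]), 3))
```

In a community setting, several species responding to the same `z` would share this unexplained variation, which is the kind of between-species residual structure that a latent-variable model such as a JSDM is designed to capture.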