Flexible Methods for Species Distribution Modeling with Small Samples



Abstract

  • Species distribution models (SDMs) are used to understand where species live or could potentially live and are a key resource for ecological research and conservation decision-making. However, current SDM methods often perform poorly for rare or inadequately sampled species, a group that includes most species on Earth as well as most of those of greatest conservation concern.

  • Here, we evaluate the performance of three recently developed modeling approaches specifically designed for data-deficient situations: 1) plug-and-play modeling, 2) density-ratio modeling, and 3) environmental-range modeling. We compare these methods with Maxent, a widely used method, both across sample sizes and in comparisons limited to data-poor species. We also ask to what extent model cross-validation performance on training data is correlated with model performance on independent presence-absence data.

  • We show that, across all species, one or more of the plug-and-play, density-ratio, or environmental-range algorithms outperformed Maxent in 72% of cases, with three of the algorithms having AUC distributions not significantly different from Maxent’s. For data-poor species (those with 20 or fewer occurrences), 24 of the algorithms considered had AUC distributions that were not significantly different from Maxent’s. However, despite these comparable AUC scores, we found that the algorithm outputs (when thresholded to predict presence vs. absence) spanned a wide gradient of sensitivity vs. specificity. Specificity and prediction accuracy assessed on training data were strongly correlated with specificity and prediction accuracy assessed on independent presence-absence data; however, AUC and sensitivity were only weakly correlated. For only 16% of species was the model that performed best on the training data also the best-performing model when evaluated on independent presence-absence data. Finally, we show how ensembles of models that span the sensitivity-specificity gradient can represent model disagreement in poorly sampled species and improve model predictions.

  • This work supports plug-and-play, density-ratio, and environmental-range modeling as useful alternatives to Maxent, particularly for data-deficient species. While our work suggests that identifying the best model for a given species is challenging, we argue that incorporating the predictions of multiple models provides a useful way forward.

  • Data/Code for peer review

    Anonymized data and code underlying this work are publicly available at: https://anonymous.4open.science/r/Flexible_Methods_for_Small_Sample_Size_Distribution_Modeling
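
    To make the sensitivity-specificity gradient and the ensemble idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' code. It assumes each model's output has already been thresholded to 0/1 presence predictions at a shared set of sites. The function names, model labels, and toy data are hypothetical and do not come from the study.

```python
from typing import Dict, List, Sequence, Tuple


def sensitivity_specificity(
    predicted: Sequence[int], observed: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary presence/absence data."""
    tp = sum(1 for p, o in zip(predicted, observed) if p == 1 and o == 1)
    fn = sum(1 for p, o in zip(predicted, observed) if p == 0 and o == 1)
    tn = sum(1 for p, o in zip(predicted, observed) if p == 0 and o == 0)
    fp = sum(1 for p, o in zip(predicted, observed) if p == 1 and o == 0)
    # Sensitivity: share of true presences predicted as presences.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of true absences predicted as absences.
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def vote_fraction(predictions: Dict[str, Sequence[int]]) -> List[float]:
    """Per-site fraction of models predicting presence.

    Values strictly between 0 and 1 mark sites where the models disagree.
    """
    per_site = zip(*predictions.values())
    return [sum(votes) / len(votes) for votes in per_site]


# Hypothetical toy data: six sites, three thresholded models.
observed = [1, 1, 1, 0, 0, 0]
models = {
    "liberal": [1, 1, 1, 1, 1, 0],       # favors sensitivity
    "moderate": [1, 1, 0, 1, 0, 0],
    "conservative": [1, 0, 0, 0, 0, 0],  # favors specificity
}

for name, predicted in models.items():
    sens, spec = sensitivity_specificity(predicted, observed)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")

agreement = vote_fraction(models)
print("fraction of models predicting presence per site:", agreement)
```

    In this toy setup, the per-site vote fraction is one simple way to display where models at different points on the gradient agree or disagree. The ensembles evaluated in the article may be constructed differently.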
