Estimating tropical woody species diversity using Sentinel-2A and Random Forest: A case study in Dak Nong, Vietnam
Abstract
Mapping woody plant species (WPS) diversity in tropical forests is vital for biodiversity conservation and management, yet field surveys are labor-intensive and difficult to update. This study evaluates the capability of freely available Sentinel-2A multispectral imagery, combined with Random Forest (RF) regression, to estimate three alpha-diversity indices (Margalef, Shannon, and Simpson) in evergreen forests of Dak Nong Province, Central Highlands of Vietnam. Field data from 202 plots were integrated with spectral bands and vegetation indices for model development. RF models achieved moderate predictive capacity, with relative RMSE (rRMSE) below 30% for all indices: Simpson (5.9%), Shannon (12.2%), and Margalef (24.6%). Although R² values on the independent test set were low (0.045–0.234), reflecting the challenges of capturing biodiversity in structurally complex tropical forests, the estimates remained ecologically meaningful and revealed spatial diversity gradients. Variable importance analysis identified NDVI8a, NDWI, and MSI as consistently influential, emphasizing the relevance of red-edge and water-sensitive spectral features. The moving-window NDVI approach supported the Spectral Variability Hypothesis but exhibited reduced predictive reliability compared to RF, indicating a trade-off between efficiency and accuracy. Overall, multispectral data and machine learning provide cost-effective and ecologically meaningful estimates that support conservation planning, hotspot identification, and long-term forest monitoring. Future improvements are expected through the integration of higher-resolution optical, radar, or LiDAR data with advanced machine-learning algorithms.
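To make the quantities named in the abstract concrete, the following is a minimal Python sketch of how the three alpha-diversity indices and the relative RMSE are commonly computed. The abstract does not give the formulas used in the study, so the definitions here are assumptions: Margalef as (S - 1) / ln N, Shannon with the natural logarithm, Simpson in its Gini-Simpson form (1 - Σp²), and rRMSE as RMSE divided by the mean of the observed values. The function names `diversity_indices` and `relative_rmse` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def diversity_indices(species):
    """Compute three alpha-diversity indices for one plot.

    `species` is a list with one species label per recorded stem.
    The formulas are common textbook forms and are assumed here,
    not taken from the study.
    """
    counts = Counter(species)
    n = sum(counts.values())   # total number of individuals (N)
    s = len(counts)            # number of species (S)
    if n == 0:
        raise ValueError("plot contains no individuals")

    props = [c / n for c in counts.values()]  # relative abundances p_i

    # Margalef richness: (S - 1) / ln(N); undefined for N = 1, so return 0.
    margalef = (s - 1) / math.log(n) if n > 1 else 0.0
    # Shannon index: -sum(p_i * ln(p_i)), written so that it is never -0.0.
    shannon = sum(-p * math.log(p) for p in props)
    # Simpson index in the Gini-Simpson form: 1 - sum(p_i^2).
    simpson = 1.0 - sum(p * p for p in props)

    return {"margalef": margalef, "shannon": shannon, "simpson": simpson}


def relative_rmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    mean_obs = sum(observed) / n
    return 100.0 * math.sqrt(mse) / mean_obs


if __name__ == "__main__":
    plot = ["A", "A", "B", "B"]  # hypothetical plot: two species, two stems each
    print(diversity_indices(plot))
    print(relative_rmse([2.0, 2.0], [1.0, 3.0]))
```

For the hypothetical plot above, with two equally abundant species, the Shannon index equals ln 2 (about 0.693) and the Gini-Simpson index equals 0.5. Under these definitions, an index with a narrow range and a high mean, such as Simpson, can yield a small rRMSE even when R² is low, which is consistent with the pattern reported in the abstract.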