Tree Species Detection and Enhancing Semantic Segmentation Using Machine Learning Models with Integrated Multispectral Channels from PlanetScope and Digital Aerial Photogrammetry in Young Boreal Forest
Abstract
The precise identification and classification of tree species in young forests during their early development stages are vital for forest management and for silvicultural efforts that support their growth and renewal. Yet accurate geolocation and species classification through field-based surveys is often labor-intensive and complex. Remote sensing combined with machine learning offers a promising, more efficient alternative to conventional field-based methods. This study aimed to detect and classify young forest tree species using remote sensing imagery and machine learning. It had two main objectives: first, tree species detection using the latest version of You Only Look Once (YOLOv12); and second, semantic segmentation (classification) using random forest, Categorical Boosting (CatBoost), and a Convolutional Neural Network (CNN). To the best of our knowledge, this is the first study to use YOLOv12 for tree species identification and the first to integrate digital aerial photogrammetry with PlanetScope imagery for semantic segmentation in young forests. The study used two remote sensing datasets: RGB imagery from UAV orthophotography and RGB-NIR imagery from PlanetScope. YOLOv12-based tree species detection used only the orthophoto RGB, whereas semantic segmentation was performed on three input sets: (1) ortho RGB (3 bands), (2) ortho RGB + canopy height model (CHM) + Planet RGB-NIR (8 bands), and (3) ortho RGB + CHM + Planet RGB-NIR + 12 vegetation indices (20 bands). Applying the three models to these three input sets yielded nine machine learning models, which were trained and tested on a total of 57 images (1024×1024 pixels) and their corresponding mask tiles.
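The 8- and 20-band inputs amount to stacking co-registered rasters along a channel axis. The sketch below illustrates that idea under stated assumptions: all layers are already resampled to the orthophoto grid, the PlanetScope bands are surface reflectance in blue, green, red, NIR order, NDGI is taken to be the green-red normalized difference, and only four of the twelve indices are shown. The function names `normalized_difference`, `vegetation_indices`, and `build_stack` are hypothetical and are not the authors' code.

```python
import numpy as np

def normalized_difference(a, b, eps=1e-6):
    """Generic (a - b) / (a + b); eps guards against division by zero."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return (a - b) / (a + b + eps)

def vegetation_indices(blue, green, red, nir):
    """A few common indices from PlanetScope surface-reflectance bands (0-1)."""
    blue, green, red, nir = (np.asarray(x, dtype=np.float32)
                             for x in (blue, green, red, nir))
    return {
        "NDVI": normalized_difference(nir, red),
        "GNDVI": normalized_difference(nir, green),
        # Assumption: NDGI taken as the green-red normalized difference.
        "NDGI": normalized_difference(green, red),
        # Standard MODIS-style EVI coefficients (G=2.5, C1=6, C2=7.5, L=1).
        "EVI": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
    }

def build_stack(ortho_rgb, chm, planet_bgrn, indices=None):
    """Concatenate co-registered layers into an (H, W, C) float32 array.

    ortho_rgb:   (H, W, 3) UAV orthophoto
    chm:         (H, W)    canopy height model
    planet_bgrn: (H, W, 4) PlanetScope blue, green, red, NIR, assumed
                 already resampled to the orthophoto grid
    indices:     optional dict of (H, W) index rasters
    """
    layers = [np.asarray(ortho_rgb, np.float32),
              np.asarray(chm, np.float32)[..., None],
              np.asarray(planet_bgrn, np.float32)]
    if indices:
        layers += [np.asarray(v, np.float32)[..., None] for v in indices.values()]
    return np.concatenate(layers, axis=-1)

# Toy example on random data (not real imagery).
rng = np.random.default_rng(0)
ortho = rng.random((64, 64, 3))
chm = rng.random((64, 64)) * 5.0
planet = rng.random((64, 64, 4)) * 0.5
vis = vegetation_indices(*np.moveaxis(planet, -1, 0))
stack8 = build_stack(ortho, chm, planet)          # shape (64, 64, 8)
stack12 = build_stack(ortho, chm, planet, vis)    # shape (64, 64, 12)
```

With all twelve indices the same call would produce a 20-band input. Per-pixel classifiers such as random forest or CatBoost could then treat each pixel as one feature vector, for example via `stack.reshape(-1, stack.shape[-1])`.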
The YOLOv12 model achieved 79% overall accuracy, with Scots pine performing best (precision: 97%, recall: 92%, mAP50: 97%, mAP75: 80%) and Norway spruce showing slightly lower accuracy (precision: 94%, recall: 82%, mAP50: 90%, mAP75: 71%). For semantic segmentation, the CatBoost model with 20 bands outperformed the other segmentation models, achieving 85% accuracy, 80% Kappa, and 81% MCC, with CHM, EVI, NIRPlanet, GreenPlanet, NDGI, GNDVI, and NDVI being the most influential variables. These results indicate that a comparatively simple gradient-boosting model such as CatBoost can outperform a more complex CNN for semantic segmentation in young forests.
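Overall accuracy, Cohen's Kappa, and MCC can all be derived from one pixel-level confusion matrix. The pure-Python sketch below uses the standard multiclass definitions (the MCC form is the multiclass generalization that scikit-learn's `matthews_corrcoef` also implements); it is illustrative only and not the authors' evaluation code. The hypothetical helper `iou` shows the box-overlap test behind mAP50 and mAP75: a detection can count as a match for a ground-truth box only if their intersection over union reaches 0.5 or 0.75, respectively.

```python
import math

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def segmentation_scores(cm):
    """Overall accuracy, Cohen's Kappa, and multiclass MCC from a confusion matrix."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                           # total pixels
    c = sum(cm[k][k] for k in range(n))                       # correctly classified
    t = [sum(cm[k]) for k in range(n)]                        # true counts (row sums)
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]   # predicted counts (column sums)
    pt = sum(pk * tk for pk, tk in zip(p, t))
    oa = c / s
    pe = pt / s ** 2                                          # chance agreement
    kappa = (oa - pe) / (1 - pe) if pe != 1 else 0.0
    denom = math.sqrt((s ** 2 - sum(x * x for x in p)) * (s ** 2 - sum(x * x for x in t)))
    mcc = (c * s - pt) / denom if denom else 0.0
    return {"OA": oa, "Kappa": kappa, "MCC": mcc}

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# Toy labels for three hypothetical classes (not the study's data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)
scores = segmentation_scores(cm)   # OA ~0.833, Kappa 0.75, MCC ~0.783

# IoU of 2/3: a match at the mAP50 threshold, but not at mAP75.
overlap = iou((0, 0, 10, 10), (2, 0, 12, 10))
```

Computing all three scores from the same matrix keeps them consistent with each other. Kappa and MCC both discount agreement expected by chance, which is why they sit below overall accuracy in the reported results.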