Agriculture surrounding monitoring and object identification based on optimized You Only Look Once and Single Shot Multibox Detector setups using combined vision and thermal images

Krzysztof Tarasiuk
Arkadiusz Mystkowski
Michał Ostaszewski
Konrad K. Kwaśniewski
Andrzej Majka
Jacek Czarnigowski

Abstract

This paper presents a method for monitoring and object identification in agricultural environments using both vision and thermal images. We evaluate two distinct approaches: a dual-network architecture, in which a separate model is trained for each image type, and a unified network that integrates both data types into a single processing stream. Multiple prototypes based on the You Only Look Once version 8 (YOLOv8) and Single Shot Multibox Detector (SSD) architectures were developed. YOLOv8 abandons the use of Cross Stage Partial (CSP) layers in favor of a simplified architecture based on C2f modules. We show that this modification reduces architectural complexity and improves both computational efficiency and inference speed during object class identification. In the SSD design, the conv5_x, avgpool, fc, and softmax layers are removed from the original model and all strides in conv4_x are set to 1×1. The backbone is followed by five additional convolutional layers, to which five detection heads are attached; a sixth head is attached to the conv4_x layer. Experimental results show a difference between the dual and single networks, with the mean Average Precision (mAP@0.5) changing from 0.88 to 0.90. The unified model improves overall performance because it fuses information from the vision and thermal data streams during object identification. The most significant variation was observed between the YOLOv8 and SSD architectures: YOLOv8 achieved mAP@0.5 scores of 0.98 for the Harvester class and 0.94 for the Tractor class, compared with 0.91 and 0.88, respectively, for SSD.
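
For readers unfamiliar with the metric, mAP@0.5 counts a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, and averages the per-class Average Precision (AP) over all classes. The abstract does not say which evaluation tooling or matching convention the authors used, so the following is only a minimal sketch under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, each detection is greedily matched to the best still-unmatched ground-truth box, and AP is the area under the monotone precision-recall envelope. The function names `iou`, `average_precision` and `mean_average_precision` are hypothetical and chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for a single class (simplified, assumed matching convention).

    detections: list of (score, box); ground_truth: list of boxes.
    """
    if not ground_truth:
        return 0.0
    # Process detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    matched = set()
    tp_flags = []
    for _, box in ranked:
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box may be matched once
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Precision and recall after each ranked detection.
    precisions, recalls, true_positives = [], [], 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        true_positives += is_tp
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truth))

    # Make precision non-increasing in recall (the usual envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the precision-recall curve.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class, iou_threshold=0.5):
    """Mean of per-class AP; per_class maps name -> (detections, ground_truth)."""
    scores = [average_precision(d, g, iou_threshold) for d, g in per_class.values()]
    return sum(scores) / len(scores) if scores else 0.0
```

Established evaluators differ in details such as how duplicate detections are penalised and how the curve is interpolated, so values from this sketch are not directly comparable to the scores reported in the abstract.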

Version published to 10.21203/rs.3.rs-7930774/v1 on Research Square
Nov 13, 2025

Related articles

TAF-YOLO: A Small-Object Detection Network for UAV Aerial Imagery via Visible and Infrared Adaptive Fusion

This article has 7 authors:
1. Zhanhong Zhuo
2. Ruitao Lu
3. Yongxiang Yao
4. Siyu Wang
5. Zhi Zheng
6. Jing Zhang
7. Xiaogang Yang
This article has no evaluations. Latest version: Nov 6, 2025

Enhanced YOLOv11n for Small Object Detection in UAV Imagery: Higher Accuracy with Fewer Parameters

This article has 2 authors:
1. Hongkai Zhu
2. Xianghua Xie
This article has no evaluations. Latest version: Oct 15, 2025

An Enhanced YOLO Framework for Small Object Detection in Complex Agricultural Environments

This article has 2 authors:
1. Zihan Yang
2. Huiyu Xiang
This article has no evaluations. Latest version: Sep 25, 2025
