SF-YOLO11: A Real-Time Winter Jujube Detection Model Based on Lightweight Multi-Scale Fusion
Abstract
To address the challenges of detecting winter jujubes in orchard environments (dense small targets, variable illumination, complex backgrounds, and limited edge deployment capabilities), this paper proposes SF-YOLO11, a lightweight real-time detection model. First, a ternary channel importance metric that integrates gradient, variance, and task relevance is proposed and used to construct an adaptive pruning strategy. This strategy optimizes the C3K2 module into PrunedC3K2, achieving 31% model compression. Second, the LightSPPF module is designed through computational graph reconstruction. It employs a three-stage architecture comprising depthwise separable convolution, adaptive pooling, and dynamic fusion, reducing computational cost by 40%. Finally, multi-scale feature processing is recast as sequence modeling, and the ISSFF module is designed to capture scale dependencies through bidirectional LSTM and multi-head attention mechanisms, thereby improving small-target detection accuracy. Experimental results demonstrate that SF-YOLO11 achieves an mAP@0.5 of 92.89%, a 3.68% improvement over the baseline. The model contains 1.6M parameters, requires 4.3 GFLOPs of computation, and runs at 128 FPS, corresponding to a 38.5% reduction in parameters, a 33.8% reduction in computation, and a 14% increase in inference speed, respectively. After INT8 quantization, the model size is reduced to only 1.7 MB and the speed increases to 215 FPS. In cross-fruit transfer experiments, the model achieves zero-shot mAP@0.5 values of 68.34% and 71.52% on grape and cherry tomato datasets, respectively; after fine-tuning, these values improve to 85.23% and 87.12%. Robustness tests validate the model's performance under extreme conditions, including strong illumination, low illumination, and occlusion.
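The abstract names the three signals behind the channel importance metric (gradient, variance, and task relevance) but does not give the formula. The following is a minimal sketch of how such a score could drive channel pruning, not the authors' implementation. It assumes min-max normalization of each signal, an equal-weight sum, and removal of the lowest-scoring channels. The function names, the weights, the `prune_ratio` parameter, and the toy per-channel values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def channel_importance(
    grad: Sequence[float],
    var: Sequence[float],
    task: Sequence[float],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[float]:
    """Combine three per-channel signals into one score.

    ASSUMPTION: a weighted sum of min-max normalized signals.
    The abstract does not specify how the three terms are combined.
    """
    g, v, t = min_max(grad), min_max(var), min_max(task)
    wg, wv, wt = weights
    return [wg * a + wv * b + wt * c for a, b, c in zip(g, v, t)]


def channels_to_keep(scores: Sequence[float], prune_ratio: float) -> List[int]:
    """Return indices of channels that survive pruning, in original order.

    The floor(len * prune_ratio) lowest-scoring channels are dropped.
    """
    n_prune = int(len(scores) * prune_ratio)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [i for i in range(len(scores)) if i not in pruned]


# Toy statistics for a 4-channel layer (illustrative values only).
grad = [0.9, 0.1, 0.5, 0.05]   # e.g. mean absolute gradient per channel
var = [0.8, 0.2, 0.4, 0.1]     # e.g. activation variance per channel
task = [0.7, 0.3, 0.6, 0.0]    # e.g. a task-relevance score per channel

scores = channel_importance(grad, var, task)
keep = channels_to_keep(scores, prune_ratio=0.5)
print(keep)  # [0, 2]
```

In this toy layer, channels 1 and 3 score lowest on the combined metric and are removed. The 0.5 ratio is chosen only to make the example readable and is unrelated to the 31% compression figure reported above.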