Performance Evaluation of YOLOv11 and YOLOv12 Deep Learning Architectures for Automated Detection and Classification of Immature Macauba (Acrocomia aculeata) Fruits

Abstract

The automated detection and classification of immature macauba (Acrocomia aculeata) fruits is critical for improving post-harvest processing and quality control. In this study, we present a comparative evaluation of two state-of-the-art YOLO architectures, YOLOv11x and YOLOv12x, trained on the newly constructed VIC01 dataset, which comprises 1600 annotated images captured under both background-free and natural-background conditions. Both models were implemented in PyTorch and trained until the box regression, classification, and distribution focal losses converged. At an IoU (intersection over union) threshold of 0.50, YOLOv11x and YOLOv12x achieved an identical mean average precision (mAP50) of 0.995, with perfect precision and recall (true positive rate, TPR). Averaged over IoU thresholds from 0.50 to 0.95, YOLOv11x demonstrated superior spatial localization performance (mAP50–95 = 0.973), while YOLOv12x exhibited robust performance in complex background scenarios, achieving a competitive mAP50–95. Inference latency averaged 3.9 ms per image for YOLOv11x and 6.7 ms for YOLOv12x, highlighting a trade-off between speed and architectural complexity. Fused model representations revealed optimized layer fusion and reduced computational overhead (GFLOPs), facilitating efficient deployment. Confusion-matrix analyses confirmed that YOLOv11x rejects background clutter more effectively than YOLOv12x, whereas precision–recall and F1-score curves indicated that both models maintain a near-perfect detection balance across confidence thresholds. The public release of the VIC01 dataset and trained weights ensures reproducibility and supports future research. Our results underscore the importance of selecting architectures based on application-specific requirements, balancing detection accuracy, background discrimination, and computational constraints. Future work will extend this framework to additional maturation stages, sensor fusion modalities, and lightweight edge-deployment variants. By facilitating precise identification of immature fruits, this work contributes to sustainable production and value addition in macauba processing.
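
To make the reported metrics concrete, the following is a minimal Python sketch of the quantities the abstract refers to; it is illustrative only and is not the authors' evaluation code. It computes the IoU of two axis-aligned boxes, lists the IoU thresholds a prediction would satisfy, and converts per-image latency into images per second. The function names, the `(x1, y1, x2, y2)` box format, and the 0.05 threshold step (the usual COCO-style convention for mAP50–95) are assumptions, not details stated in the abstract.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed threshold grid: 0.50, 0.55, ..., 0.95 (step of 0.05 is an assumption).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(box_pred, box_true):
    """Thresholds at which the prediction would count as a true positive."""
    value = iou(box_pred, box_true)
    return [t for t in IOU_THRESHOLDS if value >= t]


def images_per_second(latency_ms):
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    truth = (0, 0, 10, 10)
    pred = (1, 0, 11, 10)  # hypothetical prediction shifted by one pixel
    print(f"IoU = {iou(pred, truth):.3f}")
    print("thresholds passed:", thresholds_passed(pred, truth))
    # Latencies reported in the abstract for YOLOv11x and YOLOv12x.
    for name, ms in (("YOLOv11x", 3.9), ("YOLOv12x", 6.7)):
        print(f"{name}: {images_per_second(ms):.0f} images/s")
```

A detection that matches at IoU 0.50 but fails at higher thresholds still counts toward mAP50 while lowering mAP50–95, which is why two models with identical mAP50 can differ in localization quality.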
