Machine Vision and Deep Learning for Robotic Harvesting of Shiitake Mushrooms
Abstract
Automation and computer vision are increasingly vital in modern agriculture, yet mushroom harvesting remains largely manual due to complex morphology and occluded growing environments. This study investigates the application of deep learning–based instance segmentation and keypoint detection to enable robotic harvesting of Lentinula edodes (shiitake) mushrooms. A dedicated RGB-D image dataset, the first open-access RGB-D dataset for mushroom harvesting, was created using a Microsoft Azure Kinect DK 3D camera under varied lighting and backgrounds. Two state-of-the-art segmentation models, YOLOv8-seg and Detectron2 Mask R-CNN, were trained and evaluated under identical conditions to compare accuracy, inference speed, and robustness. YOLOv8 achieved higher mean average precision (mAP = 67.9) and significantly faster inference, while Detectron2 offered comparable qualitative performance and greater flexibility for integration into downstream robotic systems. Experiments comparing RGB and RG-D inputs (depth substituted for the blue channel) revealed minimal accuracy differences, suggesting that colour cues alone provide sufficient information for reliable segmentation. A proof-of-concept keypoint-detection model demonstrated the feasibility of identifying stem cut-points for robotic manipulation. These findings confirm that deep learning–based vision systems can accurately detect and localise mushrooms in complex environments, forming a foundation for fully automated harvesting. Future work will focus on expanding datasets, incorporating true four-channel RGB-D networks, and integrating perception with robotic actuation for intelligent agricultural automation.
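The RG-D input mentioned above is a common trick for reusing three-channel pretrained backbones with depth data: two colour channels are kept and the third is replaced by a normalised depth map. A minimal sketch of this preprocessing step, assuming a metric depth map and an arbitrary working-distance range (the clipping values are illustrative, not taken from the paper):

```python
import numpy as np

def make_rgd(rgb: np.ndarray, depth: np.ndarray,
             d_min: float = 0.3, d_max: float = 1.5) -> np.ndarray:
    """Build a 3-channel RG-D image: keep red and green, replace the
    blue channel with depth rescaled to [0, 255].

    rgb   : (H, W, 3) uint8 colour image
    depth : (H, W) float32 depth map in metres
    d_min, d_max : assumed near/far clipping range for normalisation
    """
    d = np.clip(depth, d_min, d_max)
    d = ((d - d_min) / (d_max - d_min) * 255.0).astype(np.uint8)
    rgd = rgb.copy()
    rgd[..., 2] = d  # depth replaces the blue channel
    return rgd

# toy example: a 4x4 black image with a uniform 0.9 m depth plane
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
depth = np.full((4, 4), 0.9, dtype=np.float32)
out = make_rgd(rgb, depth)
```

The resulting three-channel array can be fed to a standard YOLOv8-seg or Mask R-CNN pipeline without architectural changes, which is what makes the RG-D comparison in the abstract possible; a "true four-channel RGB-D network", by contrast, would require modifying the first convolutional layer.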