Accurately localizing occluded fruits in complex environments is a key challenge for orchard robots, especially when a single fruit appears split into multiple isolated regions within an image. Traditional single-task network models have limited ability to recognize fragmented targets that belong to the same fruit but are segmented into spatially isolated regions. In addition, fruit localization typically relies on high-cost sensors or additional 3-D localization algorithms. To address these issues, we propose a fruit detection and centroid localization method based on a Multi-Task Wavelet-Enhanced YOLO (MT-WavYOLO) to improve the success rate of robotic operations on occluded fruit targets. First, a lightweight semantic segmentation branch was integrated into the YOLOv8 backbone network to precisely segment exposed fruit regions, while the original object detection branch was retained to fully identify occluded fruits. To address the diminished sensitivity of conventional models to the geometric profiles of heavily occluded fruits, a novel feature fusion module, C2f_WTConv, was designed by incorporating wavelet transform convolution, leveraging the multi-frequency robustness of wavelet representations to enhance feature extraction under complex orchard occlusions. Subsequently, a 3D frustum-based point cloud processing method was proposed that combines the detection results of MT-WavYOLO with the semantic segmentation masks to accurately localize occluded fruits. On our custom-built dataset, MT-WavYOLO improved Precision, Recall, and mAP50 by 2%, 1.5%, and 2.2%, respectively, over the latest YOLOv10s model. Semantic segmentation performance, measured by Intersection over Union (IoU) and Accuracy, improved by 5.2% and 3.8%, respectively, over the state-of-the-art DeepLabv3+ network.
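The abstract does not include implementation details, so the following is only a minimal NumPy sketch of the general idea behind wavelet-transform convolution, not the authors' C2f_WTConv module: a feature map is decomposed into low- and high-frequency subbands with a single-level Haar transform, the subbands are filtered independently (here, by assumed per-subband gains as a stand-in for learned convolutions), and the result is reconstructed.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar decomposition of an (H, W) map (H, W even)."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2 (perfect reconstruction)."""
    H, W = ll.shape
    x = np.zeros((2 * H, 2 * W))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def wt_conv(x, gains=(1.0, 1.5, 1.5, 1.5)):
    """Toy frequency-selective filtering: scale each subband, reconstruct.
    The gains are illustrative placeholders; a real WTConv layer would apply
    small learned convolutions per subband instead of scalar gains."""
    subbands = haar_dwt2(x)
    filtered = [g * s for g, s in zip(gains, subbands)]
    return haar_idwt2(*filtered)
```

Emphasizing the detail subbands in this way boosts response to edges and contours, which is one plausible reading of why a wavelet-domain operation helps recover the geometric profiles of heavily occluded fruits.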
Compared to the adapted multi-task network YOLOP, MT-WavYOLO achieved a 3.4% increase in mAP50 and a 2.7% improvement in IoU. In addition, MT-WavYOLO has a compact footprint of 10.2 M parameters and runs at approximately 27 FPS, meeting the real-time requirements of robotic harvesting operations. The proposed localization method was evaluated through 600 fruit localization tests using six different RGB-D cameras in an orchard environment. On average, the centroid localization and radius estimation errors were reduced by 42.5%, 73.7%, 16.17%, and 11.25%, respectively, compared to the traditional 3D bounding-box method and our previous approaches. These results indicate that MT-WavYOLO combined with the frustum-based method significantly improves the accuracy of apple localization under complex orchard conditions using consumer-grade sensors, providing a strong practical foundation for non-destructive robotic harvesting.
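As a rough sketch of the frustum-based localization step described above (the exact pipeline is not given in the abstract, so the function below is an assumption, not the authors' method): pixels inside a detection box that are also covered by the segmentation mask are back-projected through the camera intrinsics into a 3-D point cloud, from which a centroid and a coarse radius are estimated.

```python
import numpy as np

def frustum_localize(depth, mask, bbox, fx, fy, cx, cy):
    """Back-project masked depth pixels inside a detection box into 3-D,
    then estimate a fruit centroid and rough radius from the point cloud.

    depth : (H, W) depth map in metres (0 = invalid)
    mask  : (H, W) boolean segmentation mask of exposed fruit pixels
    bbox  : (x0, y0, x1, y1) detection box in pixel coordinates
    fx, fy, cx, cy : pinhole camera intrinsics
    """
    x0, y0, x1, y1 = bbox
    # restrict the mask to the detection frustum (the box's pixel footprint)
    ys, xs = np.nonzero(mask[y0:y1, x0:x1])
    ys = ys + y0
    xs = xs + x0
    z = depth[ys, xs]
    valid = z > 0                      # drop holes in the depth map
    xs, ys, z = xs[valid], ys[valid], z[valid]
    # pinhole back-projection: pixel (u, v) with depth z -> camera frame
    X = (xs - cx) * z / fx
    Y = (ys - cy) * z / fy
    pts = np.stack([X, Y, z], axis=1)
    centroid = pts.mean(axis=0)
    # crude radius: median spread of visible surface points; a sphere fit
    # would be needed for a true fruit radius
    radius = np.median(np.linalg.norm(pts - centroid, axis=1))
    return centroid, radius
```

Because only masked (exposed-fruit) pixels are lifted into 3-D, background and occluder points inside the box are excluded, which is the intuition behind combining the detection frustum with the segmentation mask.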
