Accurate and efficient fruit detection and localization are essential for the development of automated harvesting systems. Existing mango detection approaches struggle in complex orchard conditions, including variable lighting, foliage occlusion, small targets, and overlapping fruit. To address these challenges, this study developed a robust and lightweight detection model that maintains both high accuracy and computational efficiency, making it suitable for real-time applications. To support model training and validation, a dataset comprising 353 sets of synchronized RGB, depth, and thermal images was collected from a mango orchard under diverse illumination and occlusion scenarios. To leverage these multimodal data, a fusion strategy was proposed that integrates RGB (textural features), depth (spatial structure), and thermal (temperature) images to exploit their complementary strengths. Experimental results with YOLOv8n as the baseline demonstrated that trimodal fusion significantly outperformed single-modality inputs, achieving 97.2 % average precision (AP), 2.4 % higher than the best single modality. Building on this result, GLS-YOLOv8n was proposed, incorporating GhostHGNetv2 as a lightweight backbone, a lightweight shared convolutional detection head (Detect-LSCD) for efficient small-object detection, and a C2f-Star module for optimized multimodal feature fusion. Running at 65.7 fps, GLS-YOLOv8n achieved an AP of 98.5 % while reducing the parameter count from 3.0 M to 1.4 M (53 % reduction), floating-point operations (FLOPs) from 8.2 G to 5.0 G (39 % reduction), and model size from 5.98 MB to 3.06 MB (49 % reduction). These findings demonstrate that GLS-YOLOv8n strikes a good balance between accuracy and efficiency, making it suitable for real-time mango detection in natural orchard environments.
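The abstract does not specify how the three modalities are combined before entering the detector. As a minimal sketch, and purely as an assumption, the snippet below illustrates one plausible early-fusion scheme: channel-level concatenation of aligned RGB, depth, and thermal frames into a 5-channel tensor. The function name `fuse_rgbdt` and all shapes are illustrative, not from the paper.

```python
# Hypothetical early fusion of RGB + depth + thermal (not the paper's
# confirmed method): stack the aligned modalities along the channel axis.
import numpy as np
import torch

def fuse_rgbdt(rgb: np.ndarray, depth: np.ndarray, thermal: np.ndarray) -> torch.Tensor:
    """Stack RGB (H, W, 3), depth (H, W), and thermal (H, W) frames
    into a normalized (5, H, W) float tensor for a detector input."""
    rgb_n = rgb.astype(np.float32) / 255.0                            # textural cues
    depth_n = (depth - depth.min()) / (np.ptp(depth) + 1e-6)          # spatial structure
    thermal_n = (thermal - thermal.min()) / (np.ptp(thermal) + 1e-6)  # temperature cues
    fused = np.concatenate(
        [rgb_n, depth_n[..., None], thermal_n[..., None]], axis=-1
    )
    return torch.from_numpy(fused).permute(2, 0, 1)  # channels-first (C, H, W)

# Usage with synthetic frames of matching resolution:
rgb = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
depth = np.random.rand(480, 640).astype(np.float32)
thermal = np.random.rand(480, 640).astype(np.float32)
x = fuse_rgbdt(rgb, depth, thermal)
print(x.shape)  # torch.Size([5, 480, 640])
```

Note that feeding a 5-channel tensor to a stock YOLOv8n would require widening its first convolution from 3 to 5 input channels; whether the paper uses input-level fusion like this or fuses features at a later stage (e.g., via the C2f-Star module) is not stated in the abstract.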
