Intrusion detection models based on object detection have been widely applied in railway scenes. However, because railway environments are semi-open, the variety of foreign objects cannot be enumerated exhaustively, which poses a severe challenge for railway intrusion detection. Among existing methods, multimodal open-set detection methods suffer from modal imbalance and over-generalization, whereas few-shot detection methods show limited generalization capacity and degraded detection performance in complex railway environments. To address these problems, this paper proposes ROSD. To correct the unreasonable generalization of existing methods in railway scenes, we construct a CLIP-oriented fine-tuning network, CLIP-MLP, to obtain strong target-aware generalization capability, and we control the generalization to the level actually required through a threshold-based multi-granularity classification mechanism. To tackle the modal imbalance of multimodal models in the absence of text input, we construct a pseudo-text generation mechanism for model training and establish an image-to-text modal mapping mechanism for model inference. To address the deterioration of detection capability in complex railway scenes, we develop an aspect ratio feature enhancement module and a multimodal aspect ratio prediction head to optimize geometric feature extraction and category classification for railway scene targets. Experimental results demonstrate that ROSD achieves a comprehensive detection accuracy of 88.2% mAP on the railway dataset RSDS, which is 4.3% higher than the state-of-the-art (SOTA) model. On the RSDS and COCO datasets, the detection accuracy for novel categories reaches 78.2% mAP and 39.2% mAP, respectively, which are 2.2% and 4% higher than the SOTA model.
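The abstract does not specify how the threshold-based multi-granularity classification operates, so the following is only a minimal sketch of one plausible reading: a region feature is compared against fine-grained category prototypes first, falls back to coarse-grained prototypes if no fine match is confident enough, and is otherwise reported as unknown. The function names, the use of cosine similarity, the two-level hierarchy, and the threshold values are all assumptions made for illustration, not the ROSD implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(feature, prototypes):
    """Return (label, score) of the prototype most similar to the feature."""
    scored = ((label, cosine(feature, proto)) for label, proto in prototypes.items())
    return max(scored, key=lambda pair: pair[1])


def classify_multi_granularity(feature, fine_protos, coarse_protos,
                               fine_thr=0.9, coarse_thr=0.7):
    """Hypothetical two-level decision rule (thresholds are illustrative).

    1. Accept the best fine-grained label if its similarity >= fine_thr.
    2. Otherwise accept the best coarse-grained label if its similarity >= coarse_thr.
    3. Otherwise report the region as unknown.
    """
    fine_label, fine_score = best_match(feature, fine_protos)
    if fine_score >= fine_thr:
        return ("fine", fine_label)

    coarse_label, coarse_score = best_match(feature, coarse_protos)
    if coarse_score >= coarse_thr:
        return ("coarse", coarse_label)

    return ("unknown", None)
```

In this sketch the two thresholds are what limit generalization: raising `fine_thr` pushes uncertain regions toward the coarse label, and raising `coarse_thr` pushes them toward the unknown outcome.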
