Feature extraction and feature fusion are both crucial for target detection in sonar images. For feature extraction, device limitations and the complexity of the underwater environment mean that sonar images are often heavily contaminated by noise, which makes targets closely resemble the background and degrades the extracted features. For feature fusion, transformer-based models rely on self-attention, which lacks local prior information. Noise and target-background similarity disrupt the computation of global relationships, so noisy features are confused with useful ones, geometric information is insufficient, and detection accuracy ultimately suffers. To address these issues, we propose an advanced detection framework that combines effective feature extraction with multi-scale feature fusion. We introduce a cross-scale channel attention module that dynamically adjusts channel weights by integrating the advantages of the squeeze-and-excitation (SE) module and the efficient multi-scale attention (EMA) module, thereby capturing multi-scale dependencies, suppressing background noise, and enhancing global feature representation. To further improve feature fusion and make better use of geometric information, we design a CNN-based feature fusion perception aggregation network. Through skip connections, this network promotes interaction between low-level geometric details and high-level semantic information, which strengthens feature representation and improves detection accuracy. Experimental results show that our method outperforms several advanced detection models in detection performance.
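
To make the idea of dynamically adjusting channel weights concrete, the following is a minimal sketch of generic squeeze-and-excitation (SE) style channel reweighting in plain Python. It is not the cross-scale channel attention module proposed here: the EMA branch and the cross-scale interaction are omitted, and the function names, the reduction ratio, and the random untrained weights are all assumptions made for illustration.

```python
import math
import random


def squeeze(feature_map):
    """Global average pooling: one scalar per channel of a C x H x W map."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]


def dense(vec, weights, bias):
    """Fully connected layer; `weights` has shape (out, in)."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def se_channel_weights(feature_map, w1, b1, w2, b2):
    """Excitation: bottleneck layer with ReLU, then sigmoid gates per channel."""
    z = squeeze(feature_map)
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]


def se_reweight(feature_map, w1, b1, w2, b2):
    """Scale every channel by its gate, leaving the spatial layout unchanged."""
    gates = se_channel_weights(feature_map, w1, b1, w2, b2)
    return [[[g * x for x in row] for row in ch]
            for g, ch in zip(gates, feature_map)]


def random_se_params(channels, reduction=2, seed=0):
    """Hypothetical helper: random (untrained) weights, zero biases."""
    rng = random.Random(seed)
    hidden = max(1, channels // reduction)
    w1 = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(channels)]
    return w1, [0.0] * hidden, w2, [0.0] * channels
```

In a trained network the two layers would be learned, so that channels dominated by background noise receive gates close to 0 and informative channels receive gates close to 1. Here the weights are random and the sketch only shows the data flow.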
