{"title":"MSSF-DCNet: multi-scale selective fusion with dense connectivity network for sonar image object detection","authors":"Yu Dong, Jianlei Zhang, Chun-yan Zhang","doi":"10.1117/12.3032084","DOIUrl":null,"url":null,"abstract":"In the field of underwater target recognition, forward-looking sonar images are widely applied in underwater rescue operations. The emergence of object detection technologies powered by deep learning has significantly enhanced the ability to recognize underwater targets. In object detection, the neck network, serving as a critical intermediary component, plays a vital role. However, traditional Feature Pyramid Networks (FPN) have two main problems: 1) During the feature fusion process, FPN does not modify the importance of features across various levels, resulting in imbalanced features at different scales and loss of scale information. 2) Lack of effective information transmission between features of different scales. In this article, we propose a novel neck network architecture, Multi Scale Selective Fusion with Dense Connectivity Network (MSSF-DCNet), which encompasses two components to tackle the previously mentioned challenges. The first one is the Multi Scale Selection Module, which effectively balances the weights of features at different levels during the feature fusion process by calculating and weighting weights for different scales, better preserving scale information. The second one is the Cross Scale Dense Connection module, which exchanges information between different feature layer levels. The model is capable of capturing global context information at every layer. thereby improving the detection capability of the neck network. By replacing the FPN with MSSF-DCNet in the Faster R-CNN framework, our model achieves an increase in Average Precision (AP) by 1.2, 4.0, and 2.6 points using MobileNet-v2, ResNet50, and SwinTransformer backbones, respectively. Furthermore, when employing ResNet50 as the backbone, MSSF-DCNet enhances the RetinaNet by 3.4 AP and ATSS by 4.1 AP. At the same time, we compared different neck networks with MSSF-DCNet on the Faster R-CNN baseline network, and MSSF-DCNet achieved the best performance in all metrics.","PeriodicalId":342847,"journal":{"name":"International Conference on Algorithms, Microchips and Network Applications","volume":" 16","pages":"131711U - 131711U-10"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Algorithms, Microchips and Network Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.3032084","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In the field of underwater target recognition, forward-looking sonar images are widely used in underwater rescue operations. The emergence of deep-learning-based object detection has significantly enhanced the ability to recognize underwater targets. In object detection, the neck network serves as a critical intermediary component and plays a vital role. However, traditional Feature Pyramid Networks (FPN) have two main problems: 1) during feature fusion, FPN does not adjust the importance of features across levels, resulting in imbalanced features at different scales and loss of scale information; 2) there is no effective information exchange between features of different scales. In this article, we propose a novel neck network architecture, the Multi-Scale Selective Fusion with Dense Connectivity Network (MSSF-DCNet), which comprises two components to tackle these challenges. The first is the Multi-Scale Selection Module, which balances the contributions of features at different levels during fusion by computing and applying per-scale weights, thereby better preserving scale information. The second is the Cross-Scale Dense Connection module, which exchanges information between feature levels so that the model can capture global context at every layer, improving the detection capability of the neck network. By replacing the FPN with MSSF-DCNet in the Faster R-CNN framework, our model increases Average Precision (AP) by 1.2, 4.0, and 2.6 points with MobileNet-v2, ResNet50, and Swin Transformer backbones, respectively. Furthermore, with ResNet50 as the backbone, MSSF-DCNet improves RetinaNet by 3.4 AP and ATSS by 4.1 AP. We also compared MSSF-DCNet with other neck networks on the Faster R-CNN baseline, and MSSF-DCNet achieved the best performance on all metrics.
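To make the two ideas in the abstract concrete, the sketch below illustrates (in PyTorch) how a scale-selective fusion step and a cross-scale dense redistribution step could be wired into a neck. It is only a minimal illustration based on the high-level description above: the class names, channel handling, and the softmax-based weighting scheme are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two concepts described in the abstract; the exact
# MSSF-DCNet design is not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleSelection(nn.Module):
    """Fuses pyramid levels with learned, softmax-normalized per-level weights,
    so no single scale dominates the fused representation (assumed formulation)."""

    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        # One scalar weight per level, predicted from globally pooled features.
        self.weight_fc = nn.ModuleList(
            nn.Linear(channels, 1) for _ in range(num_levels)
        )

    def forward(self, feats):  # feats: list of (N, C, Hi, Wi), highest resolution first
        target_size = feats[0].shape[-2:]
        resized = [F.interpolate(f, size=target_size, mode="nearest") for f in feats]
        # Per-level logits from global average pooling, normalized across levels.
        logits = torch.cat(
            [fc(f.mean(dim=(2, 3))) for fc, f in zip(self.weight_fc, resized)], dim=1
        )                                 # (N, num_levels)
        w = torch.softmax(logits, dim=1)  # balances the scales during fusion
        fused = sum(
            w[:, i, None, None, None] * resized[i] for i in range(len(resized))
        )
        return fused


class CrossScaleDenseConnection(nn.Module):
    """Redistributes the fused map back to every level, so each output level
    receives global context in addition to its own features."""

    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, fused, feats):
        outs = []
        for f in feats:
            # Every level sees the globally fused context, resized to its scale.
            ctx = F.interpolate(fused, size=f.shape[-2:], mode="nearest")
            outs.append(f + self.refine(ctx))
        return outs
```

In use, the selection module would replace FPN's unweighted top-down addition and the dense-connection module would feed the resulting context back to all levels before the detection heads; in the paper this neck is plugged into Faster R-CNN, RetinaNet, and ATSS in place of the standard FPN.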