MCCANet: A Precision and Efficient Bottom Tracking Method Based on Cross-Cue Fusion of Single and Multiple Ping Inputs
Authors: Yanxian Zhang; Guanying Huo
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-15
Publication date: 2025-03-21
Publication type: Journal Article
DOI: 10.1109/TGRS.2025.3553565
URL: https://ieeexplore.ieee.org/document/10937078/
Citation count: 0
Abstract
The primary purpose of bottom tracking is to identify the boundary between the water column area and the image area in a side scan sonar (SSS) waterfall map. However, noise in the water column, often caused by complex measurement environments, poses significant challenges for automatic bottom tracking. We therefore propose the multihead cross-cue attention network (MCCANet), a lightweight network designed for precise and efficient bottom tracking. MCCANet consists of four modules: an input module, an encoder, a feature fusion module, and a decoder. The input module extracts features from one ping and from five consecutive pings while keeping the output dimensions consistent. The encoder employs simple 1-D convolutional layers to extract features from 1-D sequences. The feature fusion module fuses and enhances the single-ping and multiple-ping features using a multihead cross-cue attention (MCCA) mechanism. Finally, the decoder reconstructs the dimensionality and maps the inputs to semantic labels. To train and evaluate the model, we annotate the open-source NY_HudsonRiver_sss-xtf dataset. Compared with the best-performing single-ping bottom tracking method, MCCANet achieves significant improvements in both intersection over union (IoU) and Dice metrics. It reduces the mean offset error (MOE) and total root-mean-square error (TRMSE) by 45.49% and 30.71%, respectively, while achieving a prediction speed of 1985 p/s. MCCANet also performs robustly on SSS datasets collected in Yunnan and Taiwan, further validating its generalization capability. Crucially, experimental results confirm that MCCANet is highly robust to noise.
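To make the IoU and Dice metrics named in the abstract concrete, the following is a minimal sketch that scores one ping's predicted labels against a reference annotation. It is not the authors' evaluation code. The label convention (1 for water column, 0 for seabed image, samples ordered outward from the transducer) and the function names `iou_and_dice` and `bottom_index` are assumptions made for illustration; the abstract does not define MOE or TRMSE, so only a simple per-ping offset in samples is shown.

```python
from typing import Sequence, Tuple


def iou_and_dice(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (IoU, Dice) for two binary label sequences of equal length.

    Assumed convention (not from the paper): 1 = water column, 0 = seabed image.
    """
    if len(pred) != len(truth):
        raise ValueError("label sequences must have the same length")
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    pred_pos = sum(1 for p in pred if p == 1)
    true_pos = sum(1 for t in truth if t == 1)
    union = pred_pos + true_pos - inter
    if union == 0:
        # Neither sequence contains the positive class; treat as a perfect match.
        return 1.0, 1.0
    iou = inter / union
    dice = 2 * inter / (pred_pos + true_pos)
    return iou, dice


def bottom_index(labels: Sequence[int]) -> int:
    """Index of the first sample not labeled water column (hypothetical helper).

    Returns len(labels) when every sample is labeled water column.
    """
    for i, value in enumerate(labels):
        if value != 1:
            return i
    return len(labels)


# One ping with 8 samples: the reference bottom is at sample 4,
# and the prediction places it one sample too early.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0]

iou, dice = iou_and_dice(pred, truth)
offset = abs(bottom_index(pred) - bottom_index(truth))
print(f"IoU={iou:.3f} Dice={dice:.3f} offset={offset} sample(s)")
```

In this toy ping the intersection is 3 samples and the union is 4, so IoU is 3/4 and Dice is 6/7. Dice is never lower than IoU for the same pair of masks, which is why the two are usually reported together.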
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.