Marine surveys by robotic underwater and surface vehicles result in substantial quantities of coral reef imagery; however, labeling these images is expensive and time-consuming for domain experts. Point label propagation is a technique that uses existing images labeled with sparse points to create augmented ground truth data, which can be used to train a semantic segmentation model. In this work, we show that recent advances in large foundation models facilitate the creation of augmented ground truth masks using only features extracted by the denoised version of the DIstillation of knowledge with NO labels version 2 (DINOv2) foundation model and K-nearest neighbors (KNN), without any pretraining. For images with extremely sparse labels, we use human-in-the-loop principles to enhance annotation efficiency: if there are five point labels per image, our method outperforms the prior state-of-the-art by 19.7% for mean intersection over union (mIoU). When human-in-the-loop labeling is not available, using the denoised DINOv2 features with a KNN still improves on the prior state-of-the-art by 5.8% for mIoU (five grid points). On the semantic segmentation task, we outperform the prior state-of-the-art by 13.5% for mIoU when only five point labels are used for point label propagation. In addition, we perform a comprehensive study of the number and placement of point labels, and make several recommendations for improving the efficiency of labeling images with points.
{"title":"Human-in-the-Loop Segmentation of Multispecies Coral Imagery","authors":"Scarlett Raine;Ross Marchant;Brano Kusy;Frederic Maire;Niko Sünderhauf;Tobias Fischer","doi":"10.1109/JOE.2025.3625691","DOIUrl":"https://doi.org/10.1109/JOE.2025.3625691","url":null,"abstract":"Marine surveys by robotic underwater and surface vehicles result in substantial quantities of coral reef imagery, however labeling these images is expensive and time-consuming for domain experts. Point label propagation is a technique that uses existing images labeled with sparse points to create augmented ground truth data, which can be used to train a semantic segmentation model. In this work, we show that recent advances in large foundation models facilitate the creation of augmented ground truth masks using only features extracted by the denoised version of the DIstillation of knowledge with NO labels version 2 (DINOv2) foundation model and K-nearest neighbors (KNN), without any pretraining. For images with extremely sparse labels, we use human-in-the-loop principles to enhance annotation efficiency: if there are five point labels per image, our method outperforms the prior state-of-the-art by 19.7% for mean intersection over union (mIoU). When human-in-the-loop labeling is not available, using the denoised DINOv2 features with a KNN still improves on the prior state-of-the-art by 5.8% for mIoU (five grid points). On the semantic segmentation task, we outperform the prior state-of-the-art by 13.5% for mIoU when only five point labels are used for point label propagation. In addition, we perform a comprehensive study into the number and placement of point labels, and make several recommendations for improving the efficiency of labeling images with points.","PeriodicalId":13191,"journal":{"name":"IEEE Journal of Oceanic Engineering","volume":"51 1","pages":"762-779"},"PeriodicalIF":5.3,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146015943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-22. DOI: 10.1109/JOE.2024.3450532
Miao Yang;Jinyang Zhong;Hansen Zhang;Can Pan;Xinmiao Gao;Chenglong Gong
Underwater object detection (UOD) is more difficult than object detection in air due to noise caused by irrelevant objects and textures and due to scale variation. These difficulties place greater demands on the feature extraction capability of detectors. The feature pyramid network (FPN) enhances the scale detection capability of detectors, while attention mechanisms effectively suppress irrelevant features. We present a cross-scale attention feature pyramid network (CSAFPN) for UOD. A feature fusion guided (FFG) module is incorporated into the CSAFPN, which constructs cross-scale context information and simultaneously guides the enhancement of all feature maps. Compared to existing FPN-like architectures, CSAFPN excels not only in capturing cross-scale long-range dependencies but also in acquiring compact multiscale feature maps that specifically emphasize target regions. Extensive experiments on the Brackish2019 data set show that CSAFPN achieves consistent improvements across various backbones and detectors. Moreover, FFG can be seamlessly integrated into any FPN-like architecture, offering a cost-effective improvement in UOD: a 1.4% average precision (AP) increase for FPN, a 1.3% AP increase for PANet, and a 1.4% AP increase for neural architecture search FPN (NAS-FPN).
{"title":"Cross-Scale Attention Feature Pyramid Network for Challenged Underwater Object Detection","authors":"Miao Yang;Jinyang Zhong;Hansen Zhang;Can Pan;Xinmiao Gao;Chenglong Gong","doi":"10.1109/JOE.2024.3450532","DOIUrl":"https://doi.org/10.1109/JOE.2024.3450532","url":null,"abstract":"Underwater object detection (UOD) is more difficult than object detection in air due to the noise caused by irrelevant objects and textures, and the scale variation. These difficulties pose a higher challenge to the feature extraction capability of detectors. Feature pyramid network (FPN) enhances the scale detection capability of detectors, while attention mechanisms effectively suppress irrelevant features. We present a cross-scale attention feature pyramid network (CSAFPN) for UOD. A feature fusion guided (FFG) module is incorporated in the CSAFPN, which constructs cross-scale context information and simultaneously guides the enhancement of all feature maps. Compared to existing FPN-like architectures, CSAFPN excels not only in capturing cross-scale long-range dependencies but also in acquiring compact multi-scale feature maps that specifically emphasize target regions. Extensive experiments on the Brackish2019 data set show that CSAFPN can achieve consistent improvements on various backbones and detectors. Moreover, FFG can be seamlessly integrated into any FPN-like architecture, offering a cost-effective improvement in UOD, resulting in a 1.4% average precision (AP) increase for FPN, a 1.3% AP increase for PANet, and a 1.4% AP increase for neural architecture search-FPN.","PeriodicalId":13191,"journal":{"name":"IEEE Journal of Oceanic Engineering","volume":"51 1","pages":"826-835"},"PeriodicalIF":5.3,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146015944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-09. DOI: 10.1109/JOE.2025.3617906
Yuyun Chen;Wenguang He;Gangqiang Xiong;Junwu Li;Yaomin Wang
Underwater images often suffer from color distortion and detail loss due to light absorption and scattering, which degrade visual quality and limit practical applications. To address these issues, a novel adaptive algorithm for underwater image color correction and detail enhancement is proposed. The algorithm first applies threshold stretching to adjust the grayscale range, enhancing contrast while mitigating the risk of localized overcompensation. Based on the color distribution, images are categorized into bluish and greenish tones, providing the foundation for the adaptive color compensation method (ACCM). The ACCM is designed to separately compensate different color channels, using the green channel as a reference to restore the most degraded channels while maintaining overall color balance. The compensation process is further constrained by the minimum color loss criterion to ensure consistent color fidelity. Furthermore, an edge detail enhancement method is formulated to recover fine details by amplifying intensity differences between the original image and its smoothed version. Extensive experiments on multiple underwater image data sets demonstrate that the proposed algorithm consistently outperforms state-of-the-art methods, achieving average improvements of 0.0021, 0.0646, 0.3677, and 0.0800 in the underwater color image quality evaluation, underwater image quality metric, fog aware density evaluator, and colorfulness contrast fog density index metrics, respectively, underscoring its effectiveness and robustness across diverse underwater environments.
{"title":"NA-UICDE: A Novel Adaptive Algorithm for Underwater Image Color Correction and Detail Enhancement","authors":"Yuyun Chen;Wenguang He;Gangqiang Xiong;Junwu Li;Yaomin Wang","doi":"10.1109/JOE.2025.3617906","DOIUrl":"https://doi.org/10.1109/JOE.2025.3617906","url":null,"abstract":"Underwater images often suffer from color distortion and detail loss due to light absorption and scattering, which degrades visual quality and limits practical applications. To address these issues, a novel adaptive algorithm for underwater image color correction and detail enhancement is proposed. The algorithm first applies threshold stretching to adjust the grayscale range, enhancing contrast while mitigating the risk of localized overcompensation. Based on the color distribution, images are categorized into bluish and greenish tones, providing the foundation for the adaptive color compensation method (ACCM). The ACCM is designed to separately compensate different color channels, using the green channel as a reference to restore the most degraded channels while maintaining overall color balance. The compensation process is further constrained by the minimum color loss criterion to ensure consistent color fidelity. Furthermore, an edge detail enhancement method is formulated to recover fine details by amplifying intensity differences between the original image and its smoothed version. Extensive experiments on multiple underwater image data sets demonstrate that the proposed algorithm consistently outperforms state-of-the-art methods, achieving average improvements of 0.0021, 0.0646, 0.3677, and 0.0800 in underwater color image quality evaluation, underwater image quality metric, fog aware density evaluator, and colorfulness contrast fog density index metrics, respectively, underscoring its effectiveness and robustness across diverse underwater environments.","PeriodicalId":13191,"journal":{"name":"IEEE Journal of Oceanic Engineering","volume":"51 1","pages":"794-806"},"PeriodicalIF":5.3,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146015918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-10. DOI: 10.1109/JOE.2025.3606045
Xupeng Wu;Jian Wang;Jing Wang;Shenghui Rong;Bo He
Underwater monocular depth estimation serves as the foundation for tasks such as 3-D reconstruction of underwater scenes. However, due to the water medium and the absorption and scattering of light in water, the underwater environment undergoes a distinctive imaging process, which presents challenges in accurately estimating depth from a single image. Existing methods fail to consider the unique characteristics of underwater environments, leading to inadequate estimation results and limited generalization performance. Furthermore, underwater depth estimation requires extracting and fusing both local and global features, which is not fully explored in existing methods. In this article, an end-to-end learning framework for underwater monocular depth estimation called UMono is presented, which incorporates underwater image formation model characteristics into the network architecture and effectively utilizes both local and global features of an underwater image. Specifically, UMono consists of an encoder with a hybrid architecture of a convolutional neural network (CNN) and Transformer and a decoder guided by a medium transmission map. First, we develop an underwater deep feature extraction (UDFE) block, which leverages the CNN and Transformer in parallel to achieve comprehensive extraction of both local and global features. These features are effectively integrated via the proposed local–global feature fusion (LGFF) module. By stacking the UDFE block as the basic unit, we construct a hybrid encoder that generates four-stage hierarchical features. Subsequently, the medium transmission map is incorporated into the network as underwater domain knowledge and, together with the encoded hierarchical features, is fed into the underwater depth information aggregation (UDIA) module, which aggregates depth information from the physical model and the neural network via a proposed cross-attention mechanism. The aggregated features then serve as the guiding information for each decoding stage, facilitating comprehensive scene understanding and precise depth estimation. The final estimated depth map is obtained through consecutive upsampling. Experimental results demonstrate that the proposed method is effective for underwater monocular depth estimation and outperforms existing methods in both quantitative and qualitative analyses.
{"title":"UMono: Physical-Model-Informed Hybrid CNN–Transformer Framework for Underwater Monocular Depth Estimation","authors":"Xupeng Wu;Jian Wang;Jing Wang;Shenghui Rong;Bo He","doi":"10.1109/JOE.2025.3606045","DOIUrl":"https://doi.org/10.1109/JOE.2025.3606045","url":null,"abstract":"Underwater monocular depth estimation serves as the foundation for tasks such as 3-D reconstruction of underwater scenes. However, due to the water medium and the absorption and scattering of light in water, the underwater environment undergoes a distinctive imaging process, which presents challenges in accurately estimating depth from a single image. The existing methods fail to consider the unique characteristics of underwater environments, leading to inadequate estimation results and limited generalization performance. Furthermore, underwater depth estimation requires extracting and fusing both local and global features, which is not fully explored in existing methods. In this article, an end-to-end learning framework for underwater monocular depth estimation called UMono is presented, which incorporates underwater image formation model characteristics into the network architecture, and effectively utilizes both local and global features of an underwater image. Specifically, UMono consists of an encoder with a hybrid architecture of a convolutional neural network (CNN) and Transformer and a decoder guided by a medium transmission map. First, we develop an underwater deep feature extraction (UDFE) block, which leverages the CNN and Transformer in parallel to achieve comprehensive extraction of both local and global features. These features are effectively integrated via the proposed local–global feature fusion (LGFF) module. By stacking the UDFE block as the basic unit, we constructed a hybrid encoder that generates four-stage hierarchical features. Subsequently, the medium transmission map is incorporated into the network as underwater domain knowledge, together with the encoded hierarchical features, is fed into the underwater depth information aggregation (UDIA) module, which aggregates depth information from the physical model and the neural network by a proposed cross attention mechanism. Then, the aggregated features serve as the guiding information for each decoding stage, facilitating the model in achieving comprehensive scene understanding and precise depth estimation. The final estimated depth map is obtained through consecutive upsampling processing. Experimental results demonstrate that the proposed method is effective for underwater monocular depth estimation and outperforms the existing methods in both quantitative and qualitative analyses.","PeriodicalId":13191,"journal":{"name":"IEEE Journal of Oceanic Engineering","volume":"51 1","pages":"780-793"},"PeriodicalIF":5.3,"publicationDate":"2025-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146015930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-05. DOI: 10.1109/JOE.2025.3628413
{"title":"2025 Index IEEE Journal of Oceanic Engineering","authors":"","doi":"10.1109/JOE.2025.3628413","DOIUrl":"https://doi.org/10.1109/JOE.2025.3628413","url":null,"abstract":"","PeriodicalId":13191,"journal":{"name":"IEEE Journal of Oceanic Engineering","volume":"50 4","pages":"1-57"},"PeriodicalIF":5.3,"publicationDate":"2025-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11230041","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-30. DOI: 10.1109/JOE.2025.3604170
Junjie Wen;Guidong Yang;Benyun Zhao;Lei Lei;Zhi Gao;Xi Chen;Ben M. Chen
Underwater environments present significant challenges, such as image degradation and domain discrepancies, that severely impact object detection performance. Traditional approaches often use image enhancement as a preprocessing step, but this adds computational overhead and latency and can even degrade detection accuracy. To address these issues, we propose a novel underwater object detection framework that jointly trains image enhancement within a multitask architecture. This framework employs a progressive training strategy to iteratively improve detection performance through enhancement and introduces a domain-adaptation mechanism to align features across domains at both the image and object levels. Experimental results demonstrate that our method achieves state-of-the-art performance across diverse data sets, with real-time detection at 105.93 frames per second and an absolute improvement of more than 15% in mean average precision in unseen environments, underscoring its potential for real-world underwater applications.
{"title":"Joint Image Enhancement for Underwater Object Detection in Various Domains","authors":"Junjie Wen;Guidong Yang;Benyun Zhao;Lei Lei;Zhi Gao;Xi Chen;Ben M. Chen","doi":"10.1109/JOE.2025.3604170","DOIUrl":"https://doi.org/10.1109/JOE.2025.3604170","url":null,"abstract":"Underwater environments present significant challenges, such as image degradation and domain discrepancies, that severely impact object detection performance. Traditional approaches often use image enhancement as a preprocessing step, but this adds computational overhead, latency, and can even degrade detection accuracy. To address these issues, we propose a novel underwater object detection framework that jointly trains image enhancement within a multitask architecture. This framework employs a progressive training strategy to iteratively improve detection performance through enhancement and introduces a domain-adaptation mechanism to align features across domains at both image and object levels. Experimental results demonstrate that our method achieves state-of-the-art performance across diverse data sets, with real-time detection at 105.93 frames per second and over +15<inline-formula><tex-math>$%$</tex-math></inline-formula> mean average precision absolute improvement in unseen environments, underscoring its potential for real-world underwater applications.","PeriodicalId":13191,"journal":{"name":"IEEE Journal of Oceanic Engineering","volume":"51 1","pages":"807-825"},"PeriodicalIF":5.3,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146015913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-16. DOI: 10.1109/JOE.2025.3590072
Zhi Xia;Ziang Zhang;Jie Shi;Zihao Zhao
Strong narrowband signals mixed with weak broadband signals can interfere with the direction estimation of broadband targets using the cross-correlation method of a two-hydrophone array. To tackle this problem, this article proposes a method for separating narrowband signals from sonar-received signals. The proposed method integrates neural network autoencoders with time–frequency masking techniques. We have designed a lightweight neural network autoencoder that can be trained using purely simulated data. This autoencoder extracts narrowband line spectrum features from the time–frequency distribution of the sonar-received signals and generates time–frequency masks. Subsequently, the time–frequency masking method is employed to isolate the narrowband components from the sonar-received signals. The proposed method was validated using data from the SWellEx-96 experiment. In the cross-correlation results of the original data from two hydrophones, strong narrowband signals dominated and obscured the weaker broadband signals. By removing the narrowband components from the two-hydrophone signals using the method proposed in this article, the cross-correlation of the processed data clearly revealed the time history of the broadband signal characteristics. This result confirms the effectiveness of the proposed method.
{"title":"Research on Ship Radiated Noise Separation Method Based on Neural Network Autoencoder Combined With Time–Frequency Masking","authors":"Zhi Xia;Ziang Zhang;Jie Shi;Zihao Zhao","doi":"10.1109/JOE.2025.3590072","DOIUrl":"https://doi.org/10.1109/JOE.2025.3590072","url":null,"abstract":"Strong narrowband signals mixed with weak broadband signals can interfere with the direction estimation of broadband targets using the cross-correlation method of a two-hydrophone array. To tackle this problem, this article proposes a method for separating narrowband signals from sonar-received signals. The proposed method integrates neural network autoencoders with time-frequency masking techniques. We have designed a lightweight neural network autoencoder that can be trained using purely simulated data. This autoencoder extracts narrowband line spectrum features from the time–frequency distribution of the sonar-received signals and generates time–frequency masks. Subsequently, the time-frequency masking method is employed to isolate the narrowband components from the sonar-received signals. The proposed method was validated using data from the SWellEx-96 experiment. In the cross-correlation results of the original data from two hydrophones, strong narrowband signals dominated and obscured the weaker broadband signals. By removing the narrowband components from the two-hydrophone signals using the method proposed in this article, the cross-correlation of the processed data clearly revealed the time history of the broadband signal characteristics. This result confirms the effectiveness of the proposed method.","PeriodicalId":13191,"journal":{"name":"IEEE Journal of Oceanic Engineering","volume":"50 4","pages":"3248-3263"},"PeriodicalIF":5.3,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145371472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-15. DOI: 10.1109/JOE.2025.3582350
Xi Zhao;Qiangqiang Yuan;Hongzhou Chai;Zhongyang Yuan
The limited availability of in situ ocean observations poses significant challenges to real-time oceanographic applications, particularly in hydroacoustic measurements where accuracy critically depends on the spatiotemporal variability of sound speed. To address the sparsity of sound speed profiles (SSPs), this study proposes an advanced modeling framework for constructing regional sound speed fields by integrating temporal and spatial dynamics. Specifically, the variation mechanisms of temperature and salinity are analyzed, and empirical orthogonal function decomposition is used to extract compact SSP representations. A novel multimodel temporal prediction architecture, combining seasonal-trend decomposition using LOESS, long short-term memory, multivariate unsupervised domain adaptation, and an inverted Transformer, captures complex seasonal and adaptive patterns. Meanwhile, spatial modeling adopts a particle swarm optimization least squares support vector machine approach to enhance interpolation across diverse marine environments. Experiments show that the model outperforms existing methods, achieving a root-mean-square error of 0.812 m/s and a mean absolute percentage error of 0.037%. Its robust prediction capability supports accurate multibeam bathymetric processing even without direct SSP observations, confirming its practical value for real-time ocean mapping.
{"title":"A New Region Ocean Sound Velocity Field Model Considering Variation Mechanisms of Temperature and Salt","authors":"Xi Zhao;Qiangqiang Yuan;Hongzhou Chai;Zhongyang Yuan","doi":"10.1109/JOE.2025.3582350","DOIUrl":"https://doi.org/10.1109/JOE.2025.3582350","url":null,"abstract":"The limited availability of in situ ocean observations poses significant challenges to real-time oceanographic applications, particularly in hydroacoustic measurements where accuracy critically depends on the spatiotemporal variability of sound speed. To address the sparsity of sound-speed profile (SSPs), this study proposes an advanced modeling framework for constructing regional sound speed fields by integrating temporal and spatial dynamics. Specifically, the variation mechanisms of temperature and salinity are analyzed, and empirical orthogonal function decomposition is used to extract compact SSP representations. A novel multimodel temporal prediction architecture, combining seasonal-trend decomposition using LOESS, long short-term memory, multivariate unsupervised domain adaptation, and inverted Transformer, captures complex seasonal and adaptive patterns. Meanwhile, spatial modeling adopts a particle swarm optimization least squares support vector machine approach to enhance interpolation across diverse marine environments. Experiments show that the model outperforms existing methods, achieving a root-mean-square error of 0.812 m/s and a mean absolute percentage error of 0.037%. Its robust prediction capability supports accurate multibeam bathymetric processing even without direct SSP observations, confirming its practical value for real-time ocean mapping.","PeriodicalId":13191,"journal":{"name":"IEEE Journal of Oceanic Engineering","volume":"50 4","pages":"3218-3234"},"PeriodicalIF":5.3,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145371481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-28. DOI: 10.1109/JOE.2025.3586648
Xiaohui Chu;Zhenzhe Hou;Haoran Duan;Lijun Xu;Runze Hu
Knowledge distillation (KD) is a predominant technique to streamline deep-learning-based recognition models for practical underwater deployments. However, existing KD methods for underwater acoustic target recognition face two problems: 1) the knowledge learning paradigm is not well matched to the characteristics of underwater acoustics and 2) the complexity of acoustic signals in ocean environments leads to different prediction capacities in teacher and student models. This induces feature misalignment in the knowledge transfer, rendering suboptimal results. To address these problems, we propose a new distillation paradigm, i.e., channel–spatial aligned global knowledge distillation (CSGKD). Considering that the channel features (indicating the loudness of signals) and spatial features (indicating the propagation patterns of signals) in Mel spectrograms are discriminative for acoustic signal recognition, we design the knowledge-transferring scheme from “channel–spatial” aspects for effective feature extraction. Furthermore, CSGKD introduces a global multilayer alignment strategy, where all student layers collectively correspond to a single teacher layer. This allows the student model to dissect acoustic signals at a granular level, thereby capturing intricate patterns and nuances. CSGKD achieves a seamless blend of richness and efficiency, ensuring swift processing while being detail oriented. Extensive experiments on two real-world oceanic data sets confirm the superior performance of CSGKD compared to existing KD methods, achieving an accuracy (ACC) of 82.37% (up 2.49% from 79.88%). Notably, CSGKD showcases an 8.87% improvement in the ACC of the lightweight student model.
{"title":"Channel–Spatial Aligned Global Knowledge Distillation for Underwater Acoustic Target Recognition","authors":"Xiaohui Chu;Zhenzhe Hou;Haoran Duan;Lijun Xu;Runze Hu","doi":"10.1109/JOE.2025.3586648","DOIUrl":"https://doi.org/10.1109/JOE.2025.3586648","url":null,"abstract":"Knowledge distillation (KD) is a predominant technique to streamline deep-learning-based recognition models for practical underwater deployments. However, existing KD methods for underwater acoustic target recognition face two problems: 1) the knowledge learning paradigm is not very consistent with the characteristics of underwater acoustics and 2) the complexity of acoustic signals in ocean environments leads to different prediction capacities in teacher and student models. This induces feature misalignment in the knowledge transfer, rendering suboptimal results. To address these problems, we propose a new distillation paradigm, i.e., channel–spatial aligned global knowledge distillation (CSGKD). Considering that the channel features (indicating the loudness of signals) and spatial features (indicating the propagation patterns of signals) in Mel spectrograms are discriminative for acoustic signal recognition, we design the knowledge-transferring scheme from “channel–spatial” aspects for effective feature extraction. Furthermore, CSGKD introduces a global multilayer alignment strategy, where all student layers collectively correspond to a single teacher layer. This allows the student model to dissect acoustic signals at a granular level, thereby capturing intricate patterns and nuances. CSGKD achieves a seamless blend of richness and efficiency, ensuring swift processing while being detail oriented. Extensive experiments on two real-world oceanic data sets confirm the superior performance of CSGKD compared to existing KD methods, i.e., achieving an accuracy (ACC) of 82.37% (<inline-formula><tex-math>$uparrow$</tex-math></inline-formula> 2.49% versus 79.88%). Notably, CSGKD showcases an 8.87% improvement in the ACC of the lightweight student model.","PeriodicalId":13191,"journal":{"name":"IEEE Journal of Oceanic Engineering","volume":"50 4","pages":"3145-3159"},"PeriodicalIF":5.3,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145371483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-20. DOI: 10.1109/JOE.2025.3583780
Mohammad Reza Mousavi;Len Zedel
This article presents an acoustic method for remotely measuring the ocean sound speed profile using a single directional transmitter and at least two receivers. By employing cross-correlation techniques to estimate the time of flight of received echo signals, the proposed approach calculates both the average sound speeds and the depths of ocean reflectors, resulting in an estimate of the sound speed profile. To validate the method, simulations are conducted using a ray acoustic propagation model that includes both time-invariant conditions and time-varying statistical effects. Key system parameters, including pulse characteristics, transducer geometry, signal-to-noise ratio, and reflector density, are analyzed. The accuracy of the estimated sound speed profiles is assessed by comparing them with the input profiles used in the simulation model. Using the proposed approach, a nonuniform average sound speed profile is measured with a root-mean-square error of 0.67 m/s up to 125 m, using a 22-m transducer array, highlighting its practicality for ocean sound speed monitoring. Experimental evaluation was conducted with a 7.39-m array in the National Research Council of Canada towing tank (200 m × 12 m × 7 m). The tank results showed a standard deviation below 2.5 m/s up to 20-m range, increasing to 12 m/s at 40 m due to reduced signal-to-clutter ratio in the high-interference environment.
{"title":"Ocean Sound Speed Profile Measurement Using a Pulse–Echo Technique","authors":"Mohammad Reza Mousavi;Len Zedel","doi":"10.1109/JOE.2025.3583780","DOIUrl":"https://doi.org/10.1109/JOE.2025.3583780","url":null,"abstract":"This article presents an acoustic method for remotely measuring the ocean sound speed profile using a single directional transmitter and at least two receivers. By employing cross-correlation techniques to estimate the time of flight of echo-received signals, the proposed approach calculates both the average sound speeds and the depths of ocean reflectors, resulting in the estimation of the sound speed profile. To validate the method, simulations are conducted using a ray acoustic propagation model that includes both time-invariant conditions and time-varying statistical effects. Key system parameters, including pulse characteristics, transducer geometry, signal-to-noise ratio, and reflector density, are analyzed. The accuracy of the estimated sound speed profiles is assessed by comparing them with the input profiles used in the simulation model. Using the proposed approach, a nonuniform average sound speed profile is measured with a root-mean-square error of 0.67 m/s up to 125 m, using a 22 m transducer array, highlighting its practicality for ocean sound speed monitoring. Experimental evaluation was conducted with a 7.39-m array in the National Research Council of Canada towing tank (200 m × 12 m × 7 m). The tank results showed a standard deviation below 2.5 m/s up to 20-m range, increasing to 12 m/s at 40 m due to reduced signal-to-clutter ratio in the high-interference environment.","PeriodicalId":13191,"journal":{"name":"IEEE Journal of Oceanic Engineering","volume":"50 4","pages":"3184-3200"},"PeriodicalIF":5.3,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145371482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}