Challenging underwater environmental conditions cause severe degradation of underwater images, including intensity attenuation and colour distortion, leading to incomplete feature representation and posing difficulties for state-of-the-art detectors. To address this issue, we propose a cross-channel feature fusion network (CCFF-Net), which innovatively enriches feature representation through complementary feature fusion across multiple channels. The network comprises three key components. First, an adaptive greyscale image generation module is designed to dynamically enhance channel information specialized for object appearance, selectively reinforcing detector-preferred cues in greyscale representation. Second, a cross-channel feature fusion module is introduced to facilitate information interaction between greyscale and chromatic channels, compensating for potential feature degradation caused by colour distortion and intensity attenuation by generating complementary and enhanced feature representations. Third, an enhanced feature pyramid network-based multi-scale feature fusion module is proposed to improve detection performance by reinforcing feature representations for both small-scale and occluded objects. Extensive experiments on four public datasets validate the effectiveness of CCFF-Net, achieving mAP improvements of 2.9%, 2.6%, 4.0% and 1.8% compared to the baseline on the DUO, UODD, UDD and URPC2020 datasets, respectively, demonstrating the superiority and generalization capability of the proposed method.
{"title":"CCFF-Net: Cross-Channel Feature Fusion Network for Underwater Object Detection","authors":"Zhe Chen, Man Zhou, Yantao Zhu, Mingwei Shen","doi":"10.1049/ipr2.70299","DOIUrl":"https://doi.org/10.1049/ipr2.70299","url":null,"abstract":"<p>Challenging underwater environmental conditions cause severe degradation of underwater images, including intensity attenuation and colour distortion, leading to incomplete feature representation and posing difficulties for state-of-the-art detectors. To address this issue, we propose a cross-channel feature fusion network (CCFF-Net), which innovatively enriches feature representation through complementary feature fusion across multiple channels. The network comprises three key components. First, an adaptive greyscale image generation module is designed to dynamically enhance channel information specialized for object appearance, selectively reinforcing detector-preferred cues in greyscale representation. Second, a cross-channel feature fusion module is introduced to facilitate information interaction between greyscale and chromatic channels, compensating for potential feature degradation caused by colour distortion and intensity attenuation by generating complementary and enhanced feature representations. Third, an enhanced feature pyramid network-based multi-scale feature fusion module is proposed to improve detection performance by reinforcing feature representations for both small-scale and occluded objects. Extensive experiments on four public datasets validate the effectiveness of CCFF-Net, achieving mAP improvements of 2.9%, 2.6%, 4.0% and 1.8% compared to the baseline on the DUO, UODD, UDD and URPC2020 datasets, respectively, demonstrating the superiority and generalization capability of the proposed method.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70299","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146135897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taner Cevik, Nazife Cevik, Ali Pasaoglu, Fatih Sahin, Farzad Kiani, Muhammet Sait Ag
This study introduces a novel attention-guided Discrete Wavelet Transform (DWT)-based steganography framework, named Attention-Guided Feature Perturbation (AGFP), which integrates deep visual attention maps with transform-domain embedding to enhance imperceptibility, robustness, and steganalysis resistance. Unlike recent deep-learning-based steganographic systems such as iSCMIS, JARS-Net, and RMSteg, which achieve high visual fidelity but are susceptible to statistical detection, AGFP perturbs only those wavelet coefficients identified as perceptually and statistically stable by attention mechanisms extracted from pre-trained CNN models (VGG19, ResNet50, AlexNet, and GoogLeNet). The proposed method is evaluated on the USC-SIPI dataset and the BOSSBase 1.01 benchmark. Experimental results show that AGFP achieves PSNR values between 55.43 and 64.29 dB and SSIM scores between 0.9989 and 0.9999 across varying payloads, indicating consistently high visual quality. While iSCMIS reports slightly higher PSNR and SSIM values, AGFP significantly outperforms all compared methods in bit error rate (BER), achieving 0.01–0.12 compared with 0.45–0.47 for iSCMIS, 0.31–0.37 for RMSteg, and 0.57–0.75 for JARS-Net. Furthermore, AGFP attains the lowest RS, SPA, and SRM steganalysis detection scores among both classical and deep-learning-based systems. These results confirm that AGFP offers a more balanced and secure steganographic solution, combining high imperceptibility with substantially enhanced robustness and resistance to detection, positioning it as a strong alternative to recent deep-learning-based steganographic frameworks.
{"title":"AGFP: A Deep Attention-Guided Framework for DWT-Based Image Steganography","authors":"Taner Cevik, Nazife Cevik, Ali Pasaoglu, Fatih Sahin, Farzad Kiani, Muhammet Sait Ag","doi":"10.1049/ipr2.70288","DOIUrl":"https://doi.org/10.1049/ipr2.70288","url":null,"abstract":"<p>This study introduces a novel attention-guided Discrete Wavelet Transform (DWT)-based steganography framework, named Attention-Guided Feature Perturbation (AGFP), which integrates deep visual attention maps with transform-domain embedding to enhance imperceptibility, robustness, and steganalysis resistance. Unlike recent deep-learning-based steganographic systems such as iSCMIS, JARS-Net, and RMSteg, which achieve high visual fidelity but are susceptible to statistical detection, AGFP perturbs only those wavelet coefficients that are identified as perceptually and statistically stable by attention mechanisms extracted from pre-trained CNN models (VGG19, ResNet50, AlexNet, and GoogLeNet). The proposed method is evaluated on the USC-SIPI dataset and the BOSSBase 1.01 benchmark. Experimental results show that AGFP achieves PSNR values between 64.29 and 55.43 dB and SSIM scores between 0.9999 and 0.9989 across varying payloads, indicating consistently high visual quality. While iSCMIS reports slightly higher PSNR and SSIM values, AGFP significantly outperforms all compared methods in bit error rate (BER)—achieving 0.01–0.12, compared to 0.45–0.47 for iSCMIS, 0.31–0.37 for RMSteg, and 0.57–0.75 for JARS-Net. Furthermore, AGFP attains the lowest RS, SPA, and SRM steganalysis detection scores among both classical and deep-learning-based systems. These results confirm that AGFP offers a more balanced and secure steganographic solution, combining high imperceptibility with substantially enhanced robustness and detectability resistance, positioning it as a strong alternative to recent deep-learning-based steganographic frameworks.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70288","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The analysis of posterior scleral images is crucial for understanding the progression of myopia and developing effective interventions. However, owing to the subtle edge variations in posterior scleral images and the high memory demands of volumetric data, existing 3D models are constrained by memory consumption, while 2D models struggle to capture cross-slice spatial information, making it difficult to balance performance and memory usage. We therefore propose SFM-UNet, a novel model designed specifically for OCT analysis of the posterior sclera. SFM-UNet integrates frequency and spatial information through a cross-slice spatial-frequency memory module, effectively capturing the complex spatial relationships in 3D images. Experiments on a private dataset, EyePS2024, and the publicly available BraTS2020 dataset show that SFM-UNet delivers competitive performance on both, demonstrating its effectiveness and practicality for posterior sclera OCT analysis.
{"title":"SFM-UNet: A Spatial-Frequency Memorizable Model for Posterior Sclera OCT Analysis","authors":"Jiashu Xu, Fanglin Chen","doi":"10.1049/ipr2.70297","DOIUrl":"https://doi.org/10.1049/ipr2.70297","url":null,"abstract":"<p>The analysis of posterior scleral images is crucial for understanding the progression of myopia and developing effective interventions. However, due to the subtle edge variations in posterior scleral images and their high memory requirements, existing 3D models face limitations in memory consumption, while 2D models struggle to capture cross-slice spatial information, making it difficult to balance the performance and memory usage. Therefore, we propose SFM-UNet, a novel model designed specifically for OCT analysis of the posterior sclera. SFM-UNet integrates frequency and spatial information through a cross-slice spatial-frequency memory module, effectively capturing the complex spatial relationships in 3D images. We conducted experiments on a private dataset EyePS2024 and the publicly available BraTS2020 dataset, where SFM-UNet demonstrated competitive performance across all datasets, demonstrating its effectiveness and practicality in posterior sclera OCT analysis.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70297","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146129955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huantong Geng, Long Fang, Yingrui Wang, Zhenyu Liu, Zichen Fan
In urban road scenes, cross-domain differences in data distribution caused by changes in light and weather significantly degrade the generalization of single-domain-trained object detectors in unseen weather conditions (e.g., models trained on daytime-sunny data suffer an average detection-accuracy drop of more than 46% in dusk-rainy scenes); moreover, Faster R-CNN suffers from insufficient generalization capability and inefficient real-time inference due to architectural constraints. To address these challenges, this paper proposes CEN-RTDETR, which improves single-domain generalization through a collaborative enhancement strategy: CP-Mix color channel permutation dynamically permutes the RGB channels to simulate the color bias observed under different sky conditions and enhance the color robustness of the input data; NP feature normalization perturbation applies random perturbations to the channel statistics of shallow feature maps to improve the extraction of texture, color and other basic style features; and CORAL loss minimizes the difference in feature distribution between the source domain and the virtual target domain through second-order statistical matching. Experimental results show that CEN-RTDETR achieves significant performance gains on the cross-weather dataset DWD: the mean average precision (mAP) across weather scenarios increases from 39.66% to 42.72% (+3.06%), and in the extreme dusk-rainy and night-rainy scenarios mAP rises from 32.9% and 18.6% to 39.0% (+6.1%) and 24.1% (+5.5%), respectively. The method effectively addresses the cross-domain generalization and efficiency problems in single-domain generalized object detection, opening new possibilities for real-time detection in complex urban road scenes.
{"title":"CEN-RTDETR: A Co-Enhancement-Based Real-Time Single-Domain Generalized Object Detection for Road Scenes","authors":"Huantong Geng, Long Fang, Yingrui Wang, Zhenyu Liu, Zichen Fan","doi":"10.1049/ipr2.70294","DOIUrl":"https://doi.org/10.1049/ipr2.70294","url":null,"abstract":"<p>In urban road scenes, the cross-domain data distribution differences caused by light and weather changes make the generalization performance of single-domain trained object detectors in unknown weather scenarios significantly degraded (e.g., daytime-sunny trained models in dusk-rainy scenarios reduce the detection accuracy by more than 46% on average); moreover, faster R-CNN suffers from insufficient generalization capability and inefficient real-time inference due to architectural constraints. To address the above challenges, this paper proposes the CEN-RTDETR method to improve the single-domain generalization capability through a collaborative enhancement strategy: CP-Mix color channel permutation dynamically simulates the RGB color channel permutation to simulate the multi-sky color bias phenomenon, and enhances the color robustness of the input data; NP feature normalization perturbation applies a random feature perturbation to the channel statistics of the shallow feature map to optimize the extraction of texture, color and other basic style features; CORAL Loss minimizes the difference in feature distribution between the source domain and the virtual target domain through second-order statistical matching. The experimental results show that CEN-RTDETR achieves significant performance improvement on the cross-weather scenario dataset DWD: The mean average precision ([email protected]) across different weather scenarios increases from 39.66% to 42.72% (+3.06%), especially for dusk-rainy and night-rainy scenarios, where [email protected] rises from 32.9% and 18.6% to 39.0% (+6.1%) and 24.1% (+5.5%) in extreme weather. The method in this paper effectively solves the cross-domain generalization and efficiency problems in single-domain generalized object detection, which provides new technical possibilities for real-time detection in complex urban road scenes.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70294","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146135808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The assessment of segmentation quality plays a fundamental role in the development, optimization, and comparison of segmentation methods which are used in a wide range of applications. With few exceptions, quality assessment is performed using traditional metrics, which are based on counting the number of erroneous pixels but do not capture the spatial distribution of errors. Established distance-based metrics such as the directed average Hausdorff distance (
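For context, the directed average Hausdorff distance from a predicted point set $A$ to a reference set $B$ is commonly defined as the mean nearest-neighbour distance:

```latex
d_{\mathrm{AH}}(A, B) = \frac{1}{|A|} \sum_{a \in A} \min_{b \in B} \lVert a - b \rVert
```

Unlike pixel-counting metrics, this weights each error by how far it lies from the reference, and therefore reflects the spatial distribution of errors that the abstract contrasts with traditional counting-based measures.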