
Latest Articles in IET Image Processing

CCFF-Net: Cross-Channel Feature Fusion Network for Underwater Object Detection
IF 2.2 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-04 | DOI: 10.1049/ipr2.70299
Zhe Chen, Man Zhou, Yantao Zhu, Mingwei Shen

Challenging underwater environmental conditions cause severe degradation of underwater images, including intensity attenuation and colour distortion, leading to incomplete feature representation and posing difficulties for state-of-the-art detectors. To address this issue, we propose a cross-channel feature fusion network (CCFF-Net), which innovatively enriches feature representation through complementary feature fusion across multiple channels. The network comprises three key components. First, an adaptive greyscale image generation module is designed to dynamically enhance channel information specialized for object appearance, selectively reinforcing detector-preferred cues in greyscale representation. Second, a cross-channel feature fusion module is introduced to facilitate information interaction between greyscale and chromatic channels, compensating for potential feature degradation caused by colour distortion and intensity attenuation by generating complementary and enhanced feature representations. Third, an enhanced feature pyramid network-based multi-scale feature fusion module is proposed to improve detection performance by reinforcing feature representations for both small-scale and occluded objects. Extensive experiments on four public datasets validate the effectiveness of CCFF-Net, achieving mAP improvements of 2.9%, 2.6%, 4.0% and 1.8% compared to the baseline on the DUO, UODD, UDD and URPC2020 datasets, respectively, demonstrating the superiority and generalization capability of the proposed method.
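A minimal PyTorch sketch of the cross-channel idea described above is given below: a greyscale-branch feature map and a chromatic-branch feature map are fused through a channel-attention gate. Tensor shapes and layer sizes are illustrative assumptions; this is not the authors' CCFF-Net module.

```python
# Minimal sketch of cross-channel (greyscale + chromatic) feature fusion with
# a channel-attention gate; shapes and layer sizes are assumptions.
import torch
import torch.nn as nn

class CrossChannelFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Project the concatenated grey + chroma features back to `channels`.
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Squeeze-and-excitation style gate: decides per channel how strongly
        # the fused (complementary) features are injected.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, grey_feat: torch.Tensor, rgb_feat: torch.Tensor) -> torch.Tensor:
        fused = self.reduce(torch.cat([grey_feat, rgb_feat], dim=1))
        return rgb_feat + self.gate(fused) * fused  # residual complementary fusion

# fusion = CrossChannelFusion(64)
# out = fusion(torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32))  # (1, 64, 32, 32)
```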

{"title":"CCFF-Net: Cross-Channel Feature Fusion Network for Underwater Object Detection","authors":"Zhe Chen,&nbsp;Man Zhou,&nbsp;Yantao Zhu,&nbsp;Mingwei Shen","doi":"10.1049/ipr2.70299","DOIUrl":"https://doi.org/10.1049/ipr2.70299","url":null,"abstract":"<p>Challenging underwater environmental conditions cause severe degradation of underwater images, including intensity attenuation and colour distortion, leading to incomplete feature representation and posing difficulties for state-of-the-art detectors. To address this issue, we propose a cross-channel feature fusion network (CCFF-Net), which innovatively enriches feature representation through complementary feature fusion across multiple channels. The network comprises three key components. First, an adaptive greyscale image generation module is designed to dynamically enhance channel information specialized for object appearance, selectively reinforcing detector-preferred cues in greyscale representation. Second, a cross-channel feature fusion module is introduced to facilitate information interaction between greyscale and chromatic channels, compensating for potential feature degradation caused by colour distortion and intensity attenuation by generating complementary and enhanced feature representations. Third, an enhanced feature pyramid network-based multi-scale feature fusion module is proposed to improve detection performance by reinforcing feature representations for both small-scale and occluded objects. Extensive experiments on four public datasets validate the effectiveness of CCFF-Net, achieving mAP improvements of 2.9%, 2.6%, 4.0% and 1.8% compared to the baseline on the DUO, UODD, UDD and URPC2020 datasets, respectively, demonstrating the superiority and generalization capability of the proposed method.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70299","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146135897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AGFP: A Deep Attention-Guided Framework for DWT-Based Image Steganography
IF 2.2 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-04 | DOI: 10.1049/ipr2.70288
Taner Cevik, Nazife Cevik, Ali Pasaoglu, Fatih Sahin, Farzad Kiani, Muhammet Sait Ag

This study introduces a novel attention-guided Discrete Wavelet Transform (DWT)-based steganography framework, named Attention-Guided Feature Perturbation (AGFP), which integrates deep visual attention maps with transform-domain embedding to enhance imperceptibility, robustness, and steganalysis resistance. Unlike recent deep-learning-based steganographic systems such as iSCMIS, JARS-Net, and RMSteg, which achieve high visual fidelity but are susceptible to statistical detection, AGFP perturbs only those wavelet coefficients that are identified as perceptually and statistically stable by attention mechanisms extracted from pre-trained CNN models (VGG19, ResNet50, AlexNet, and GoogLeNet). The proposed method is evaluated on the USC-SIPI dataset and the BOSSBase 1.01 benchmark. Experimental results show that AGFP achieves PSNR values between 64.29 and 55.43 dB and SSIM scores between 0.9999 and 0.9989 across varying payloads, indicating consistently high visual quality. While iSCMIS reports slightly higher PSNR and SSIM values, AGFP significantly outperforms all compared methods in bit error rate (BER)—achieving 0.01–0.12, compared to 0.45–0.47 for iSCMIS, 0.31–0.37 for RMSteg, and 0.57–0.75 for JARS-Net. Furthermore, AGFP attains the lowest RS, SPA, and SRM steganalysis detection scores among both classical and deep-learning-based systems. These results confirm that AGFP offers a more balanced and secure steganographic solution, combining high imperceptibility with substantially enhanced robustness and detectability resistance, positioning it as a strong alternative to recent deep-learning-based steganographic frameworks.
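The sketch below is a hedged illustration of the general recipe of attention-guided DWT embedding, not the AGFP implementation: it assumes a Haar wavelet, treats a low attention value as a proxy for a "stable" coefficient, and uses a simple quantisation-index-modulation step.

```python
# Hedged illustration of attention-guided DWT embedding (not the AGFP code):
# Haar DWT, an external attention map, and a QIM step on coefficients assumed
# stable (here: low attention value).
import numpy as np
import pywt

def embed_bits(cover, attention, bits, q=8.0):
    """cover: 2-D greyscale image; attention: same-size map in [0, 1]."""
    cA, (cH, cV, cD) = pywt.dwt2(cover.astype(float), "haar")
    # Subsample the attention map to coefficient resolution and pick positions
    # of the diagonal sub-band with low attention (assumed perceptually stable).
    att = attention[::2, ::2][: cD.shape[0], : cD.shape[1]]
    positions = np.argwhere(att < 0.2)[: len(bits)]
    for (r, c), b in zip(positions, bits):
        # QIM-style embedding: place the coefficient at 1/4 or 3/4 of its cell.
        cD[r, c] = q * (np.floor(cD[r, c] / q) + 0.25 + 0.5 * b)
    return pywt.idwt2((cA, (cH, cV, cD)), "haar")
```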

{"title":"AGFP: A Deep Attention-Guided Framework for DWT-Based Image Steganography","authors":"Taner Cevik,&nbsp;Nazife Cevik,&nbsp;Ali Pasaoglu,&nbsp;Fatih Sahin,&nbsp;Farzad Kiani,&nbsp;Muhammet Sait Ag","doi":"10.1049/ipr2.70288","DOIUrl":"https://doi.org/10.1049/ipr2.70288","url":null,"abstract":"<p>This study introduces a novel attention-guided Discrete Wavelet Transform (DWT)-based steganography framework, named Attention-Guided Feature Perturbation (AGFP), which integrates deep visual attention maps with transform-domain embedding to enhance imperceptibility, robustness, and steganalysis resistance. Unlike recent deep-learning-based steganographic systems such as iSCMIS, JARS-Net, and RMSteg, which achieve high visual fidelity but are susceptible to statistical detection, AGFP perturbs only those wavelet coefficients that are identified as perceptually and statistically stable by attention mechanisms extracted from pre-trained CNN models (VGG19, ResNet50, AlexNet, and GoogLeNet). The proposed method is evaluated on the USC-SIPI dataset and the BOSSBase 1.01 benchmark. Experimental results show that AGFP achieves PSNR values between 64.29 and 55.43 dB and SSIM scores between 0.9999 and 0.9989 across varying payloads, indicating consistently high visual quality. While iSCMIS reports slightly higher PSNR and SSIM values, AGFP significantly outperforms all compared methods in bit error rate (BER)—achieving 0.01–0.12, compared to 0.45–0.47 for iSCMIS, 0.31–0.37 for RMSteg, and 0.57–0.75 for JARS-Net. Furthermore, AGFP attains the lowest RS, SPA, and SRM steganalysis detection scores among both classical and deep-learning-based systems. These results confirm that AGFP offers a more balanced and secure steganographic solution, combining high imperceptibility with substantially enhanced robustness and detectability resistance, positioning it as a strong alternative to recent deep-learning-based steganographic frameworks.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70288","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SFM-UNet: A Spatial-Frequency Memorizable Model for Posterior Sclera OCT Analysis
IF 2.2 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-02 | DOI: 10.1049/ipr2.70297
Jiashu Xu, Fanglin Chen

The analysis of posterior scleral images is crucial for understanding the progression of myopia and developing effective interventions. However, because of the subtle edge variations in posterior scleral images and their high memory requirements, existing 3D models face limitations in memory consumption, while 2D models struggle to capture cross-slice spatial information, making it difficult to balance performance and memory usage. Therefore, we propose SFM-UNet, a novel model designed specifically for OCT analysis of the posterior sclera. SFM-UNet integrates frequency and spatial information through a cross-slice spatial-frequency memory module, effectively capturing the complex spatial relationships in 3D images. We conducted experiments on a private dataset, EyePS2024, and the publicly available BraTS2020 dataset, where SFM-UNet achieved competitive performance on both, demonstrating its effectiveness and practicality in posterior sclera OCT analysis.
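As an assumption-laden sketch of how a spatial branch and a frequency branch can be mixed on a single 2-D feature map, the block below combines a spatial convolution with 1x1 mixing of the rFFT spectrum; channel sizes and the mixing scheme are illustrative and do not reproduce SFM-UNet's cross-slice spatial-frequency memory module.

```python
# Illustrative spatial + frequency mixing block (not SFM-UNet's module).
import torch
import torch.nn as nn

class SpatialFrequencyBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        # 1x1 mixing of the rFFT spectrum, stored as [real, imag] channels.
        self.freq = nn.Conv2d(2 * channels, 2 * channels, 1)
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.spatial(x)
        spec = torch.fft.rfft2(x, norm="ortho")          # (B, C, H, W//2 + 1), complex
        spec = self.freq(torch.cat([spec.real, spec.imag], dim=1))
        re, im = spec.chunk(2, dim=1)
        f = torch.fft.irfft2(torch.complex(re, im), s=x.shape[-2:], norm="ortho")
        return self.merge(torch.cat([s, f], dim=1))
```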

{"title":"SFM-UNet: A Spatial-Frequency Memorizable Model for Posterior Sclera OCT Analysis","authors":"Jiashu Xu,&nbsp;Fanglin Chen","doi":"10.1049/ipr2.70297","DOIUrl":"https://doi.org/10.1049/ipr2.70297","url":null,"abstract":"<p>The analysis of posterior scleral images is crucial for understanding the progression of myopia and developing effective interventions. However, due to the subtle edge variations in posterior scleral images and their high memory requirements, existing 3D models face limitations in memory consumption, while 2D models struggle to capture cross-slice spatial information, making it difficult to balance the performance and memory usage. Therefore, we propose SFM-UNet, a novel model designed specifically for OCT analysis of the posterior sclera. SFM-UNet integrates frequency and spatial information through a cross-slice spatial-frequency memory module, effectively capturing the complex spatial relationships in 3D images. We conducted experiments on a private dataset EyePS2024 and the publicly available BraTS2020 dataset, where SFM-UNet demonstrated competitive performance across all datasets, demonstrating its effectiveness and practicality in posterior sclera OCT analysis.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70297","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146129955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CEN-RTDETR: A Co-Enhancement-Based Real-Time Single-Domain Generalized Object Detection for Road Scenes
IF 2.2 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-02 | DOI: 10.1049/ipr2.70294
Huantong Geng, Long Fang, Yingrui Wang, Zhenyu Liu, Zichen Fan

In urban road scenes, cross-domain distribution shifts caused by lighting and weather changes significantly degrade the generalization of single-domain trained object detectors in unknown weather scenarios (e.g., daytime-sunny trained models lose more than 46% detection accuracy on average in dusk-rainy scenarios); moreover, Faster R-CNN suffers from insufficient generalization capability and inefficient real-time inference due to architectural constraints. To address these challenges, this paper proposes CEN-RTDETR, which improves single-domain generalization through a collaborative enhancement strategy: CP-Mix dynamically permutes the RGB color channels to simulate multi-sky color bias and enhance the color robustness of the input data; NP feature normalization perturbation applies random perturbations to the channel statistics of shallow feature maps to improve the extraction of texture, color and other basic style features; and CORAL loss minimizes the difference in feature distribution between the source domain and a virtual target domain through second-order statistical matching. Experimental results show that CEN-RTDETR achieves significant performance improvements on the cross-weather dataset DWD: the mean average precision (mAP) across different weather scenarios increases from 39.66% to 42.72% (+3.06%), with especially large gains in the dusk-rainy and night-rainy scenarios, where mAP rises from 32.9% and 18.6% to 39.0% (+6.1%) and 24.1% (+5.5%), respectively, under extreme weather. The proposed method effectively addresses the cross-domain generalization and efficiency problems of single-domain generalized object detection, providing new technical possibilities for real-time detection in complex urban road scenes.
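Of the three components, the CORAL loss has a standard published form; the sketch below implements that standard second-order statistics alignment between two feature batches (the CP-Mix and NP components are not reproduced here).

```python
# Standard deep CORAL loss: align the feature covariances of a source batch
# and a virtual-target batch.
import torch

def coral_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """source, target: (N, d) feature matrices from the two domains."""
    d = source.size(1)

    def cov(x: torch.Tensor) -> torch.Tensor:
        xm = x - x.mean(dim=0, keepdim=True)
        return xm.t() @ xm / (x.size(0) - 1)

    # Squared Frobenius norm of the covariance gap, with the usual 1/(4 d^2) scale.
    return ((cov(source) - cov(target)) ** 2).sum() / (4 * d * d)
```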

{"title":"CEN-RTDETR: A Co-Enhancement-Based Real-Time Single-Domain Generalized Object Detection for Road Scenes","authors":"Huantong Geng,&nbsp;Long Fang,&nbsp;Yingrui Wang,&nbsp;Zhenyu Liu,&nbsp;Zichen Fan","doi":"10.1049/ipr2.70294","DOIUrl":"https://doi.org/10.1049/ipr2.70294","url":null,"abstract":"<p>In urban road scenes, the cross-domain data distribution differences caused by light and weather changes make the generalization performance of single-domain trained object detectors in unknown weather scenarios significantly degraded (e.g., daytime-sunny trained models in dusk-rainy scenarios reduce the detection accuracy by more than 46% on average); moreover, faster R-CNN suffers from insufficient generalization capability and inefficient real-time inference due to architectural constraints. To address the above challenges, this paper proposes the CEN-RTDETR method to improve the single-domain generalization capability through a collaborative enhancement strategy: CP-Mix color channel permutation dynamically simulates the RGB color channel permutation to simulate the multi-sky color bias phenomenon, and enhances the color robustness of the input data; NP feature normalization perturbation applies a random feature perturbation to the channel statistics of the shallow feature map to optimize the extraction of texture, color and other basic style features; CORAL Loss minimizes the difference in feature distribution between the source domain and the virtual target domain through second-order statistical matching. The experimental results show that CEN-RTDETR achieves significant performance improvement on the cross-weather scenario dataset DWD: The mean average precision ([email protected]) across different weather scenarios increases from 39.66% to 42.72% (+3.06%), especially for dusk-rainy and night-rainy scenarios, where [email protected] rises from 32.9% and 18.6% to 39.0% (+6.1%) and 24.1% (+5.5%) in extreme weather. The method in this paper effectively solves the cross-domain generalization and efficiency problems in single-domain generalized object detection, which provides new technical possibilities for real-time detection in complex urban road scenes.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70294","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146135808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Novel Distance-Based Metric for Quality Assessment in Image Segmentation
IF 2.2 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-29 | DOI: 10.1049/ipr2.70296
Niklas Rottmayer, Claudia Redenbach

The assessment of segmentation quality plays a fundamental role in the development, optimization, and comparison of segmentation methods which are used in a wide range of applications. With few exceptions, quality assessment is performed using traditional metrics, which are based on counting the number of erroneous pixels but do not capture the spatial distribution of errors. Established distance-based metrics such as the directed average Hausdorff distance (dAHD) are difficult to interpret and compare for different methods and datasets. In this paper, we introduce the surface consistency coefficient (SCC), a novel distance-based quality metric that quantifies the spatial distribution of errors based on their proximity to the surface of the structure. Through a rigorous analysis using synthetic data and real segmentation results, we demonstrate the robustness and effectiveness of SCC in distinguishing errors near the surface from those farther away. At the same time, SCC is easy to interpret and comparable across different structural contexts.
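For reference, the directed average Hausdorff distance named above can be computed on binary masks with a Euclidean distance transform, as in the sketch below; the SCC metric itself is defined in the paper and is not reproduced here.

```python
# Directed average Hausdorff distance (dAHD) on binary masks.
import numpy as np
from scipy.ndimage import distance_transform_edt

def directed_avg_hausdorff(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean distance from each foreground pixel of `pred` to the nearest
    foreground pixel of `gt` (both boolean masks of the same shape)."""
    if not pred.any():
        return 0.0
    if not gt.any():
        return float("inf")
    dist_to_gt = distance_transform_edt(~gt)   # distance of every pixel to gt foreground
    return float(dist_to_gt[pred].mean())
```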

{"title":"A Novel Distance-Based Metric for Quality Assessment in Image Segmentation","authors":"Niklas Rottmayer,&nbsp;Claudia Redenbach","doi":"10.1049/ipr2.70296","DOIUrl":"https://doi.org/10.1049/ipr2.70296","url":null,"abstract":"<p>The assessment of segmentation quality plays a fundamental role in the development, optimization, and comparison of segmentation methods which are used in a wide range of applications. With few exceptions, quality assessment is performed using traditional metrics, which are based on counting the number of erroneous pixels but do not capture the spatial distribution of errors. Established distance-based metrics such as the directed average Hausdorff distance (<span></span><math>\u0000 <semantics>\u0000 <mi>dAHD</mi>\u0000 <annotation>$mathrm{dAHD}$</annotation>\u0000 </semantics></math>) are difficult to interpret and compare for different methods and datasets. In this paper, we introduce the surface consistency coefficient (<span></span><math>\u0000 <semantics>\u0000 <mi>SCC</mi>\u0000 <annotation>$mathrm{SCC}$</annotation>\u0000 </semantics></math>), a novel distance-based quality metric that quantifies the spatial distribution of errors based on their proximity to the surface of the structure. Through a rigorous analysis using synthetic data and real segmentation results, we demonstrate the robustness and effectiveness of <span></span><math>\u0000 <semantics>\u0000 <mi>SCC</mi>\u0000 <annotation>$mathrm{SCC}$</annotation>\u0000 </semantics></math> in distinguishing errors near the surface from those farther away. At the same time, <span></span><math>\u0000 <semantics>\u0000 <mi>SCC</mi>\u0000 <annotation>$mathrm{SCC}$</annotation>\u0000 </semantics></math> is easy to interpret and comparable across different structural contexts.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70296","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146136679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Gradient-Aware Multiscale Feature Enhancement With Reliability Attention for Multi-Focus Image Fusion
IF 2.2 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-29 | DOI: 10.1049/ipr2.70289
Haifeng Gong, Wen Liu, Yaofeng Wu, Xu Liang, Junhao Huang, Jun Xiao, Han Wang, Xiang Yan

Multi-focus image fusion (MFIF) aims to synthesise an all-in-focus image from multiple source images captured at different focal depths by selecting the sharp regions of each. However, effectively distinguishing focused regions from defocused backgrounds remains challenging due to smooth focus transitions and the lack of ground-truth supervision. In addition, many deep methods ignore the gradient discrepancy between focused and defocused areas during multi-scale extraction, treat features without quantifying their reliability, and build decision maps using only horizontal and vertical gradients, leading to detail loss and structural misclassification. We propose MSFE-Net, a gradient-aware multi-scale feature enhancement network comprising three core components. First, a gradient-aware multi-scale feature enhancement module (GradientAwareMFE) balances fine detail and contextual cues by fusing dual atrous spatial pyramid pooling branches with small and large dilation rates under a learned gradient gate aligned to the Sobel gradient magnitude. Second, focus-reliability attention converts local-variance statistics into a spatial reliability mask and applies squeeze-and-excitation to modulate channels. Third, enhanced spatial frequency fusion integrates four-directional gradients with local context and morphological refinement to produce a robust decision map for pixel-wise fusion. Trained in a self-supervised manner on Microsoft Common Objects in Context (MS-COCO) using mean squared error, structural similarity, and gradient-consistency losses, and evaluated on the Lytro, MFFW, and MFI-WHU datasets, MSFE-Net attains state-of-the-art or competitive results on most metrics, delivering sharper edges and fewer artefacts.
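As a hedged sketch of a four-directional gradient-energy focus measure with a pixel-wise decision map, the code below implements a classical baseline; the window size and the wrap-around diagonal differences are assumptions, and it is not MSFE-Net.

```python
# Classical focus measure with four-directional gradients and a per-pixel
# decision map for two source images (illustrative baseline only).
import numpy as np
from scipy.ndimage import uniform_filter

def focus_measure(img: np.ndarray, win: int = 9) -> np.ndarray:
    g = img.astype(float)
    dh = np.diff(g, axis=1, prepend=g[:, :1])        # horizontal gradient
    dv = np.diff(g, axis=0, prepend=g[:1, :])        # vertical gradient
    dd1 = g - np.roll(np.roll(g, 1, 0), 1, 1)        # main-diagonal gradient
    dd2 = g - np.roll(np.roll(g, 1, 0), -1, 1)       # anti-diagonal gradient
    energy = dh ** 2 + dv ** 2 + dd1 ** 2 + dd2 ** 2
    return uniform_filter(energy, size=win)          # local mean gradient energy

def fuse_pair(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    decision = focus_measure(a) >= focus_measure(b)  # per-pixel decision map
    return np.where(decision, a, b)
```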

{"title":"Gradient-Aware Multiscale Feature Enhancement With Reliability Attention for Multi-Focus Image Fusion","authors":"Haifeng Gong,&nbsp;Wen Liu,&nbsp;Yaofeng Wu,&nbsp;Xu Liang,&nbsp;Junhao Huang,&nbsp;Jun Xiao,&nbsp;Han Wang,&nbsp;Xiang Yan","doi":"10.1049/ipr2.70289","DOIUrl":"https://doi.org/10.1049/ipr2.70289","url":null,"abstract":"<p>Multi-focus image fusion (MFIF) aims to synthesise an all-in-focus image from multiple source images captured with different focal depths. However, effectively distinguishing focused regions from defocused backgrounds remains challenging due to the smooth transitions and lack of ground-truth supervision. MFIF synthesises an all-in-focus image by selecting sharp regions from images captured at different focal planes. However, many deep methods ignore the gradient discrepancy between focused and defocused areas during multi-scale extraction, treat features without quantifying their reliability, and build decision maps using only horizontal and vertical gradients, leading to detail loss and structural misclassification. We propose MSFE-Net, a gradient-aware multi-scale feature enhancement network comprising three core components. First, a gradient‑aware multi‑scale feature enhancement module (GradientAwareMFE) balances fine detail and contextual cues by fusing dual atrous spatial pyramid pooling branches with small and large dilation rates under a learned gradient gate aligned to the Sobel gradient magnitude. Second, focus-reliability attention converts local-variance statistics into a spatial reliability mask and applies squeeze-and-excitation to modulate channels. Third, enhanced spatial frequency fusion integrates four-directional gradients with local context and morphological refinement to produce a robust decision map for pixel-wise fusion. Trained in a self-supervised manner on Microsoft Common Objects in Context (MS-COCO) using mean squared error, structural similarity, and gradient-consistency losses, and evaluated on Lytro, MFFW, and MFI-WHU datasets, MSFE-Net attains state-of-the-art or competitive results on most metrics, delivering sharper edges and fewer artefacts.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70289","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146136677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MEN-IML: A Multi-Scale Edge-Aware Network for Image Manipulation Localisation
IF 2.2 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1049/ipr2.70295
Lei Zhang, Xiaodong Lu, Qinglong Jia, Minhui Chang

With the rapid development of image editing technology, image manipulation localisation is facing increasingly severe challenges. Traditional methods cannot effectively capture tampering features at different scales and fail to fully utilise the tampering trace information at the boundary of the manipulated regions, resulting in limited localisation accuracy. To address these issues, we propose MEN-IML, a multi-scale edge-aware network for image manipulation localisation. First, we design an efficient multi-scale edge-aware feature fusion module. This module enhances the expressive capability of edge details by assigning learnable weights to features at different scales and incorporating an edge feature awareness mechanism. Second, we adopt depthwise separable convolutional decoders instead of the traditional multi-layer perceptron decoders, which not only focus on extracting spatial features of each channel, but also improve the model's ability to integrate cross-channel information. Finally, we design a multi-scale edge loss to supervise the boundaries of manipulated regions at multiple scales, effectively enhancing the sensitivity of the model to manipulated region boundaries. Cross-dataset experimental results on multiple public datasets demonstrate that, compared to mainstream MVSS-Net++ and the state-of-the-art IML-ViT, the proposed method improves average performance by 13.5% and 6.3%, effectively enhancing the localisation accuracy of image tampering.
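The depthwise separable convolution the abstract contrasts with MLP decoders has a standard form; the block below is a generic example with illustrative layer sizes, not the MEN-IML decoder itself.

```python
# Generic depthwise separable convolution block.
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (per-channel spatial features).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.post = nn.Sequential(nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.post(self.pointwise(self.depthwise(x)))
```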

{"title":"MEN-IML: A Multi-Scale Edge-Aware Network for Image Manipulation Localisation","authors":"Lei Zhang,&nbsp;Xiaodong Lu,&nbsp;Qinglong Jia,&nbsp;Minhui Chang","doi":"10.1049/ipr2.70295","DOIUrl":"https://doi.org/10.1049/ipr2.70295","url":null,"abstract":"<p>With the rapid development of image editing technology, image manipulation localisation is facing increasingly severe challenges. Traditional methods cannot effectively capture tampering features at different scales and fail to fully utilise the tampering trace information at the boundary of the manipulated regions, resulting in limited localisation accuracy. To address these issues, we propose MEN-IML, a multi-scale edge-aware network for image manipulation localisation. First, we design an efficient multi-scale edge-aware feature fusion module. This module enhances the expressive capability of edge details by assigning learnable weights to features at different scales and incorporating an edge feature awareness mechanism. Second, we adopt depthwise separable convolutional decoders instead of the traditional multi-layer perceptron decoders, which not only focus on extracting spatial features of each channel, but also improve the model's ability to integrate cross-channel information. Finally, we design a multi-scale edge loss to supervise the boundaries of manipulated regions at multiple scales, effectively enhancing the sensitivity of the model to manipulated region boundaries. Cross-dataset experimental results on multiple public datasets demonstrate that, compared to mainstream MVSS-Net++ and the state-of-the-art IML-ViT, the proposed method improves average performance by 13.5% and 6.3%, effectively enhancing the localisation accuracy of image tampering.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70295","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146130115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Wavelet Quadtree Contrast Limited Adaptive Histogram Equalisation Tablet Enhancement Method for Defect Detection in Low-Contrast Tablets
IF 2.2 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-27 | DOI: 10.1049/ipr2.70293
Zimei Tu, Qinna Wang, Zhicheng Jiang, Jinhua Jiang

In pharmaceutical manufacturing, tablets develop defects such as half-grain tablets, multi-pill tablets and paste tablets as a result of breaking, adhesion or inadequate pressing. The packaging material, typically composed of dual aluminium foil, exhibits high reflectivity. This reduces overall image contrast and weakens the feature differences between defective areas and the background, significantly affecting the accuracy of defect detection in pharmaceutical quality control. To address these problems, this paper proposes a contrast enhancement method for tablet images called Wavelet Quadtree Contrast Limited Adaptive Histogram Equalisation Tablet Enhancement (WCTE). The method combines the Haar wavelet transform and CLAHE with adaptive quadtree partitioning to improve the detection accuracy of low-contrast tablets. The proposed approach uses Haar wavelets to decompose tablet images at multiple scales, then applies power-transform enhancement and soft-threshold denoising to sub-bands of different frequencies, highlighting edge details and suppressing background interference, before reconstructing the image with the inverse wavelet transform. To avoid the edge artefacts and local over-enhancement caused by fixed blocks, an adaptive quadtree CLAHE strategy driven by local variance is applied, allowing adaptive segmentation of enhancement regions and ensuring smooth transitions. The YOLOv11 model is then used to detect targets in the enhanced tablet images, enabling precise classification of typical flaws such as half-grain tablets, multi-pill tablets and paste tablets. Experimental results show that WCTE-enhanced images outperform both traditional CLAHE and unenhanced images in entropy, contrast, clarity and average gradient, with comprehensive scores improving by 27.15% and 11.42% relative to the original and CLAHE images, respectively. The YOLOv11 model also shows a significant improvement in defect detection accuracy on WCTE-enhanced images, with gains of 13.33% and 9.06% for half-grain tablets and paste-like defects, respectively. The accuracy for multi-pill defects remains stable, while the overall mean average precision (mAP) increases by 2.5% and the false detection rate decreases by 9%. These improvements further substantiate the efficacy and applicability of the method in low-contrast target detection tasks.
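A compact sketch of the two named building blocks, Haar wavelet denoising followed by CLAHE, is given below; the threshold choice and tile grid are assumptions, and the power transform and adaptive quadtree partitioning of WCTE are not reproduced.

```python
# Haar DWT soft-threshold denoising followed by CLAHE (building blocks only).
import cv2
import numpy as np
import pywt

def wavelet_clahe(gray: np.ndarray, clip: float = 2.0, tiles: int = 8) -> np.ndarray:
    cA, (cH, cV, cD) = pywt.dwt2(gray.astype(np.float32), "haar")
    thr = 0.5 * np.std(cD)                       # assumed noise threshold
    cH, cV, cD = (pywt.threshold(c, thr, mode="soft") for c in (cH, cV, cD))
    recon = pywt.idwt2((cA, (cH, cV, cD)), "haar")
    recon = np.clip(recon, 0, 255).astype(np.uint8)
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=(tiles, tiles))
    return clahe.apply(recon)                    # contrast-limited adaptive equalisation
```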

{"title":"Wavelet Quadtree Contrast Limited Adaptive Histogram Equalisation Tablet Enhancement Method for Defect Detection in Low-Contrast Tablets","authors":"Zimei Tu,&nbsp;Qinna Wang,&nbsp;Zhicheng Jiang,&nbsp;Jinhua Jiang","doi":"10.1049/ipr2.70293","DOIUrl":"https://doi.org/10.1049/ipr2.70293","url":null,"abstract":"<p>In pharmaceutical manufacturing, tablets experience flaws like half-grain tablets, multi-pill tablets and paste tablets as a result of breaking, adhesion or inadequate pressing. The packaging material, typically composed of dual aluminium foil, exhibits high reflectivity. This reduces the overall image contrast and weakens the feature differences between defective areas and the background. This phenomenon significantly affects the accuracy of defect detection in pharmaceutical quality control. To address the above problems, this paper proposes a contrast enhancement method called Wavelet Quadtree Contrast Limited Adaptive Histogram Equalisation Tablet Enhancement (WCTE) for tablet images. The method combines Haar Wavelet Transform and CLAHE with adaptive quadtree chunking to enhance the detection accuracy of low-contrast tablets. The proposed approach uses Haar wavelets to decompose tablet images at multiple scales. It then applies power transform enhancement and soft-threshold denoising to sub-bands of different frequencies, highlighting edge details and suppressing background interference. The image is reconstructed by the inverse wavelet transform. To avoid edge artefacts and local over-enhancement from fixed blocks, an adaptive quadtree CLAHE strategy driven by local variance is applied. This allows adaptive segmentation of enhancement regions and ensures smooth transitions. The YOLOv11 model is utilised to identify the target in the augmented tablet image, facilitating the precise classification of typical flaws such as half-grain tablets, multi-pill tablets and paste tablets. Experimental results show that WCTE-enhanced images outperform both traditional CLAHE and unenhanced ones in entropy, contrast, clarity and average gradient. The comprehensive scores improve by 27.15% and 11.42% relative to the original and CLAHE images, respectively. The YOLOv11 model demonstrates a significant enhancement in defect detection accuracy for WCTE-enhanced images, with improvements of 13.33% and 9.06% for half-grain tablets and paste-like defects, respectively. The accuracy for multi-pill defects remains stable, while the overall mean average precision (mAP) increases by 2.5%, and the false detection rate decreases by 9%. This improvement further substantiates the efficacy and applicability of this method in low-contrast target detection tasks.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70293","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146130041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Shifting the Focus of Digital Pathology: The Raising Relevance of Pre-Processing Phase Over Model Complexity
IF 2.2 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-27 | DOI: 10.1049/ipr2.70290
Massimo Salvi, Nicola Michielli, Alessandro Mogetta, Alessandro Gambella, Abdulkadir Sengur, Filippo Molinari, Arkadiusz Gertych

Recent trends in computational pathology favour increasingly complex deep learning architectures, raising the question of whether such complexity is necessary for routine diagnostic tasks. This study challenges this assumption through a comprehensive analysis of the relationship between model complexity, data pre-processing, and performance across four fundamental digital pathology tasks: nuclei counting, steatosis quantification, glomeruli detection, and Ki67 proliferation index (PI) assessment. We evaluated five deep learning models of varying complexity (lightweight: MobileNetV2, U-Net, and more complex: ConvNeXt, K-Net, and Swin Transformer) combined with different image pre-processing techniques. To evaluate model performance without extensive ground truth (GT) annotations, we introduced a validation strategy utilizing the relative absolute deviation (RAD) between network predictions and correlation of performance metrics. Our findings demonstrate that pre-processing strategies, particularly stain normalization (NORM), can be more impactful than model complexity, reducing error rates by up to 50% compared to processing original (ORIG) images. With appropriate pre-processing, lightweight models achieved comparable or superior results to complex models while reducing processing times by up to 40%. Only specific tasks involving complex morphological features, such as glomeruli detection, significantly benefited from more sophisticated architectures. This study provides an evidence-based framework for selecting optimal model-pre-processing combinations in clinical settings, suggesting that investing in pre-processing pipelines rather than model complexity may be more beneficial for routine computational pathology applications.
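The relative absolute deviation used for annotation-free validation is not fully specified in the abstract; the sketch below shows one plausible element-wise definition, normalising the absolute difference between two models' outputs by their mean (the paper's exact normalisation may differ).

```python
# One plausible relative absolute deviation between two models' quantitative
# outputs (e.g., nuclei counts); the symmetric-mean normalisation is an assumption.
import numpy as np

def relative_absolute_deviation(pred_a, pred_b, eps=1e-8):
    a = np.asarray(pred_a, dtype=float)
    b = np.asarray(pred_b, dtype=float)
    return np.abs(a - b) / (0.5 * (a + b) + eps)   # |a - b| normalised by their mean

# relative_absolute_deviation([105, 80], [100, 90])  -> approx. [0.049, 0.118]
```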

{"title":"Shifting the Focus of Digital Pathology: The Raising Relevance of Pre-Processing Phase Over Model Complexity","authors":"Massimo Salvi,&nbsp;Nicola Michielli,&nbsp;Alessandro Mogetta,&nbsp;Alessandro Gambella,&nbsp;Abdulkadir Sengur,&nbsp;Filippo Molinari,&nbsp;Arkadiusz Gertych","doi":"10.1049/ipr2.70290","DOIUrl":"https://doi.org/10.1049/ipr2.70290","url":null,"abstract":"<p>Recent trends in computational pathology favour increasingly complex deep learning architectures, raising the question of whether such complexity is necessary for routine diagnostic tasks. This study challenges this assumption through a comprehensive analysis of the relationship between model complexity, data pre-processing, and performance across four fundamental digital pathology tasks: nuclei counting, steatosis quantification, glomeruli detection, and Ki67 proliferation index (PI) assessment. We evaluated five deep learning models of varying complexity (lightweight: MobileNetV2, U-Net, and more complex: ConvNeXt, K-Net, and Swin Transformer) combined with different image pre-processing techniques. To evaluate model performance without extensive ground truth (GT) annotations, we introduced a validation strategy utilizing the relative absolute deviation (RAD) between network predictions and correlation of performance metrics. Our findings demonstrate that pre-processing strategies, particularly stain normalization (NORM), can be more impactful than model complexity, reducing error rates by up to 50% compared to processing original (ORIG) images. With appropriate pre-processing, lightweight models achieved comparable or superior results to complex models while reducing processing times by up to 40%. Only specific tasks involving complex morphological features, such as glomeruli detection, significantly benefited from more sophisticated architectures. This study provides an evidence-based framework for selecting optimal model-pre-processing combinations in clinical settings, suggesting that investing in pre-processing pipelines rather than model complexity may be more beneficial for routine computational pathology applications.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70290","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Robust Facial Recognition Using Unified Optimised Feature Fusion and Selection of Handcrafted and Deep Learning Features
IF 2.2 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-27 | DOI: 10.1049/ipr2.70292
Farid Ayeche, Adel Alti

Face recognition is a cornerstone of smart system development and remains a challenging research domain, particularly in achieving accurate and real-time identification of facial expression. Existing approaches often struggle to manage dynamic variations in pose, illumination and expression, resulting in reduced accuracy, delayed processing and inconsistent identification. The research difficulties are addressed by introducing a facial recognition framework based on the unified optimised feature vector (UOFV) to enhance robustness and efficiency in constrained environments. The framework combines standard descriptors (e.g., LBP, HOG and HDG) with deep learning features (e.g., CNN-based embeddings), uniting local texture, directional intensity patterns and high-level semantic representations into a single discriminative feature space. These extracted features are then optimised using the binary grey wolf optimisation algorithm, selected for its strong balance between exploration and exploitation. The optimisation systematically selects the most relevant attributes while discarding redundancies, reducing dimensionality without compromising discriminative power and producing the optimised UOFV. The framework is implemented in MATLAB and validated on six public datasets: ORL, YALE, warpPIE10P, dbCMUfaces, LFW and CASIA-WebFace. For classification, six standard machine learning models (e.g., KNN, DT, SVM, NB, DA and RF) were used to identify facial images. Experimental results confirm the effectiveness of the UOFV approach, achieving 99.12% accuracy on ORL, 87.87% on YALE, 100% on warpPIE10P, 99.91% on dbCMUfaces, 98.56% on LFW and 98.48% on CASIA-WebFace. Moreover, the approach demonstrates remarkable efficiency, with an average execution time of only 1.2 milliseconds per recognition task, confirming its suitability for real-time applications. A demonstration code of the proposed framework is publicly available at: https://www.mathworks.com/matlabcentral/fileexchange/178149-face-recognition-using-optimized-feature-extraction-and-ml
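The sketch below illustrates the handcrafted-plus-deep fusion step with LBP and HOG descriptors from scikit-image and a placeholder CNN embedding; it is written in Python for consistency with the other examples here (the paper's demonstration code is MATLAB), and the boolean selection mask standing in for binary grey wolf optimisation is a placeholder assumption.

```python
# Handcrafted (LBP + HOG) and deep feature fusion with an optional selection mask.
import numpy as np
from skimage.feature import hog, local_binary_pattern

def unified_feature_vector(gray, cnn_embedding, mask=None):
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    hog_vec = hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    fused = np.concatenate([lbp_hist, hog_vec, np.asarray(cnn_embedding, float)])
    fused = fused / (np.linalg.norm(fused) + 1e-8)          # unit-norm fusion
    # `mask`: boolean vector of the same length, e.g. produced by a wrapper optimiser.
    return fused if mask is None else fused[np.asarray(mask, bool)]
```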

{"title":"Robust Facial Recognition Using Unified Optimised Feature Fusion and Selection of Handcrafted and Deep Learning Features","authors":"Farid Ayeche,&nbsp;Adel Alti","doi":"10.1049/ipr2.70292","DOIUrl":"https://doi.org/10.1049/ipr2.70292","url":null,"abstract":"<p>Face recognition is a cornerstone of smart system development and remains a challenging research domain, particularly in achieving accurate and real-time identification of facial expression. Existing approaches often struggle to manage dynamic variations in pose, illumination and expression, resulting in reduced accuracy, delayed processing and inconsistent identification. The research difficulties are addressed by introducing a facial recognition framework based on the unified optimised feature vector (UOFV) to enhance robustness and efficiency in constrained environments. The framework combines standard descriptors (e.g., LBP, HOG and HDG) with deep learning features (e.g., CNN-based embeddings), uniting local texture, directional intensity patterns and high-level semantic representations into a single discriminative feature space. These extracted features are then optimised using the binary grey wolf optimisation algorithm, selected for its strong balance between exploration and exploitation. The optimisation systematically selects the most relevant attributes while discarding redundancies, reducing dimensionality without compromising discriminative power and producing the optimised UOFV. The framework is implemented in MATLAB and validated on six public datasets: ORL, YALE, warpPIE10P, dbCMUfaces, LFW and CASIA-WebFace. For classification, six standard machine learning models (e.g., KNN, DT, SVM, NB, DA and RF) were used to identify facial images. Experimental results confirm the effectiveness of the UOFV approach, achieving 99.12% accuracy on ORL, 87.87% on YALE, 100% on warpPIE10P, 99.91% on dbCMUfaces, 98.56% on LFW and 98.48% on CASIA-WebFace. Moreover, the approach demonstrates remarkable efficiency, with an average execution time of only 1.2 milliseconds per recognition task, confirming its suitability for real-time applications. A demonstration code of the proposed framework is publicly available at: https://www.mathworks.com/matlabcentral/fileexchange/178149-face-recognition-using-optimized-feature-extraction-and-ml</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70292","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146136430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0