
Signal Processing-Image Communication: Latest Publications

NTRF-Net: A fuzzy logic-enhanced convolutional neural network for detecting hidden data in digital images
IF 2.7 Tier 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-08-28 DOI: 10.1016/j.image.2025.117401
Ntivuguruzwa Jean De La Croix, Tohari Ahmad, Fengling Han, Royyana Muslim Ijtihadie
Recent advancements in steganalysis have focused on detecting hidden information in images, but locating the possible positions of concealed data in advanced adaptive steganography remains a crucial challenge, especially for images shared over public networks. This paper introduces a novel steganalysis approach, NTRF-Net, designed to identify the location of steganographically altered pixels in digital images. NTRF-Net, focusing on the spatial features of an image, combines stochastic feature selection and fuzzy logic within a convolutional neural network, working through three stages: modification map generation, feature classification, and pixel classification. NTRF-Net demonstrates high accuracy, achieving 98.2 % accuracy and an 86.2 % F1 score. The ROC curves and AUC values highlight the strong capability of the proposed NTRF-Net to recognize steganographically altered pixels, outperforming existing benchmarks.
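The abstract does not give NTRF-Net's exact layers, so the following is only a minimal sketch of the general idea it describes: a fixed high-pass residual filter to expose embedding noise, convolutional features, Gaussian fuzzy membership functions in place of crisp activations, and a per-pixel modification map. The module names (StegoPixelNet, FuzzyMembership), the KV kernel choice, and all sizes are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FuzzyMembership(nn.Module):
    """Learnable Gaussian membership functions per channel; a fuzzy OR (max) over
    rules stands in for the fuzzy-logic stage described in the abstract."""
    def __init__(self, channels, n_rules=3):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_rules, channels, 1, 1))
        self.log_sigma = nn.Parameter(torch.zeros(n_rules, channels, 1, 1))

    def forward(self, x):                        # x: (B, C, H, W)
        memberships = [torch.exp(-((x - c) ** 2) / (2 * torch.exp(s) ** 2 + 1e-6))
                       for c, s in zip(self.centers, self.log_sigma)]
        return torch.stack(memberships).max(dim=0).values

class StegoPixelNet(nn.Module):
    """Toy per-pixel steganalysis net: high-pass residual -> conv features ->
    fuzzy memberships -> per-pixel probability of steganographic modification."""
    def __init__(self):
        super().__init__()
        kv = torch.tensor([[-1,  2,  -2,  2, -1],
                           [ 2, -6,   8, -6,  2],
                           [-2,  8, -12,  8, -2],
                           [ 2, -6,   8, -6,  2],
                           [-1,  2,  -2,  2, -1]], dtype=torch.float32) / 12.0
        self.register_buffer("hp_kernel", kv.view(1, 1, 5, 5))  # fixed KV high-pass filter
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.fuzzy = FuzzyMembership(32)
        self.head = nn.Conv2d(32, 1, 1)           # per-pixel modification map

    def forward(self, x):                          # x: grayscale image (B, 1, H, W)
        residual = F.conv2d(x, self.hp_kernel, padding=2)
        return torch.sigmoid(self.head(self.fuzzy(self.features(residual))))

probs = StegoPixelNet()(torch.rand(1, 1, 64, 64))  # (1, 1, 64, 64) modification map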
Citations: 0
Three-domain joint deraining network for video rain streak removal
IF 2.7 Tier 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-08-27 DOI: 10.1016/j.image.2025.117400
Wei Wu, Wenzhuo Zhai, Yong Liu, Xianbin Hu, Tailin Yang, Zhu Li
When video is shot outdoors in rainy weather, a complex, dynamically changing rain streak layer is superimposed on the original clean video, greatly degrading the performance of advanced outdoor vision systems. Some excellent video deraining algorithms have been proposed and produce good results. However, these approaches neglect the joint analysis of relations in three important domains of videos, even though video data has intrinsic characteristics in the temporal, spatial, and frequency domains. To address this issue, in this paper we propose a Three-domain Joint Deraining Network (TJDNet) for video rain streak removal. It is composed of three network branches: a temporal-spatial-frequency (TSF) branch, a temporal-spatial (TS) branch, and a spatial branch. In the proposed TJDNet, capturing the spatial properties of the current frame is the common goal of all three branches. Moreover, we develop the TSF branch to specifically pursue temporal-frequency relations between the wavelet subbands of the current frame and those of its adjacent frames. Furthermore, the TS branch is designed to directly capture temporal correlations among successive frames. Finally, cross-branch feature fusion propagates the features of one branch to enrich the information of another, further exploiting the characteristics of these three domains. Compared with twenty-two state-of-the-art methods, experimental results show that the proposed TJDNet achieves significantly better performance in both objective and subjective image quality, with average PSNR gains of up to 2.10 dB. Our code will be available online at https://github.com/YanZhanggugu/TJDNet.
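TJDNet itself is released at the repository above; the sketch below is only a toy illustration of the three-branch idea the abstract describes: a spatial branch on the current frame, a temporal-spatial branch on the frame stack, and a temporal-spatial-frequency branch on Haar wavelet subbands of the stack, fused by concatenation. The manual Haar decomposition, layer sizes, and the ThreeBranchDerain name are assumptions for illustration, not the authors' architecture.

import torch
import torch.nn as nn

def haar_subbands(x):
    """One-level Haar decomposition of (B, C, H, W) into LL, LH, HL, HH subbands."""
    a = x[..., 0::2, 0::2]; b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]; d = x[..., 1::2, 1::2]
    return (a + b + c + d) / 4, (a - b + c - d) / 4, (a + b - c - d) / 4, (a - b - c + d) / 4

class ThreeBranchDerain(nn.Module):
    """Toy three-branch layout: spatial (current frame), temporal-spatial (frame stack),
    and temporal-spatial-frequency (Haar subbands of the stack), fused by concatenation."""
    def __init__(self, n_frames=3, ch=16):
        super().__init__()
        self.spatial = nn.Conv2d(3, ch, 3, padding=1)
        self.temporal_spatial = nn.Conv3d(3, ch, (n_frames, 3, 3), padding=(0, 1, 1))
        self.tsf = nn.Conv2d(4 * 3 * n_frames, ch, 3, padding=1)
        self.fuse = nn.Conv2d(3 * ch, 3, 3, padding=1)

    def forward(self, frames):                    # frames: (B, T, 3, H, W), current frame last
        b, t, c, h, w = frames.shape
        cur = frames[:, -1]
        s = self.spatial(cur)
        ts = self.temporal_spatial(frames.permute(0, 2, 1, 3, 4)).squeeze(2)
        subbands = torch.cat(haar_subbands(frames.reshape(b, t * c, h, w)), dim=1)
        tsf = self.tsf(nn.functional.interpolate(subbands, size=(h, w)))
        return cur - self.fuse(torch.cat([s, ts, tsf], dim=1))   # subtract predicted rain layer

clean = ThreeBranchDerain()(torch.rand(1, 3, 3, 64, 64))          # derained current frame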
Citations: 0
GIADNet: Gradient Inspired Attention Driven Denoising Network
IF 2.7 Tier 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-08-25 DOI: 10.1016/j.image.2025.117399
Gourab Chatterjee, Debashis Das, Suman Kumar Maji
Image noise, commonly introduced during the acquisition process, significantly degrades visual quality and adversely affects downstream image processing tasks. To address this challenge while preserving fine structural details, we propose GIADNet: a Gradient-Inspired Attention-Driven Denoising Network. The proposed framework integrates gradient-guided feature enhancement, multi-scale representation learning, and attention-based refinement to achieve a superior balance between noise suppression and detail retention. In particular, the gradient information of the noisy input is fused with deep features early in the pipeline to enrich semantic representation. Furthermore, we introduce two dedicated modules: the Multi-Pooling Pixel Attention (MPPA) module, which adaptively emphasizes informative pixels, and the Multi-Scale Attention Block (MSAB), designed to capture hierarchical contextual dependencies across varying spatial resolutions. Extensive experiments on standard benchmarks demonstrate that GIADNet achieves highly competitive performance, surpassing several state-of-the-art methods in both quantitative metrics and visual quality. Ablation studies further validate the effectiveness of each component, underscoring the importance of our attention-guided multi-scale design in advancing the field of image denoising. Code is available at: https://github.com/debashis15/GIADNet.
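The authors' code is linked above; the snippet below is only a hedged sketch of two ideas named in the abstract: fusing the noisy input's gradients with early features, and a pooling-based pixel attention gate (a simplified stand-in for the MPPA module). The Sobel-based gradient extractor, residual formulation, and all sizes are assumptions, not GIADNet's actual design.

import torch
import torch.nn as nn
import torch.nn.functional as F

def sobel_gradients(x):
    """Per-channel horizontal and vertical Sobel responses of a (B, C, H, W) image."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]], device=x.device)
    ky = kx.t()
    c = x.shape[1]
    weight = torch.stack([kx, ky]).unsqueeze(1).repeat(c, 1, 1, 1)   # (2C, 1, 3, 3)
    return F.conv2d(x, weight, padding=1, groups=c)                  # (B, 2C, H, W)

class PixelAttention(nn.Module):
    """Multi-pooling pixel attention: average- and max-pooled channel statistics
    gate every pixel (a simplified stand-in for the MPPA idea)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.proj(stats))

class GradientGuidedDenoiser(nn.Module):
    """Toy denoiser: fuse Sobel gradients of the noisy input with shallow conv features,
    apply pixel attention, and predict a residual noise map."""
    def __init__(self, ch=32):
        super().__init__()
        self.stem = nn.Conv2d(3 + 6, ch, 3, padding=1)   # RGB image + its 6 gradient maps
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  PixelAttention(),
                                  nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, noisy):
        feats = self.stem(torch.cat([noisy, sobel_gradients(noisy)], dim=1))
        return noisy - self.tail(self.body(feats))        # residual learning

out = GradientGuidedDenoiser()(torch.rand(1, 3, 64, 64))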
Citations: 0
Fast tone mapping operator for high dynamic range image using prior information
IF 2.7 Tier 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-08-22 DOI: 10.1016/j.image.2025.117395
Xueyu Han, Xin Sun, Susanto Rahardja
This paper presents a fast tone mapping operator (TMO) that effectively reproduces high dynamic range (HDR) images on common displays while maintaining visual appeal. The proposed method addresses the trade-off between computational complexity and detail retention inherent in existing global and local TMOs by leveraging prior information. We construct a dynamic range compression model on the HDR luminance channel and introduce two priors to fast generate the low dynamic range (LDR) luminance channel. First, most local regions of the inverted LDR luminance channel have some very low intensity pixels. Second, the luminance of the global light layer is a constant. Besides, we propose an adaptive luminance normalization approach based on the brightness feature of the input HDR image, facilitating the stability of tone mapping performance. Detail enhancement and color attenuation techniques are also presented to improve local contrasts and manage over-saturation. The effectiveness of the proposed TMO is validated through comparison with state-of-the-art methods. Both subjective and objective results show that our method outperforms others in producing high-quality tone-mapped images. Additionally, it exhibits lower computational complexity than local TMOs while remaining comparable to global ones.
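The paper's compression model, priors, and adaptive normalization are not spelled out in the abstract, so the numpy sketch below only mirrors the general flow of a fast global TMO: compress log-luminance with an adaptive key derived from scene brightness, boost details against a box-filtered base layer, and reapply color. It is a Reinhard-style stand-in under those assumptions, not the proposed operator.

import numpy as np

def tone_map(hdr_rgb, detail_gain=1.2, eps=1e-6):
    """Toy global tone mapper: adaptive log-luminance normalization plus simple
    detail boosting and color reattachment."""
    lum = 0.2126 * hdr_rgb[..., 0] + 0.7152 * hdr_rgb[..., 1] + 0.0722 * hdr_rgb[..., 2]
    # Adaptive "key" from a brightness prior: darker scenes get a larger key.
    log_avg = np.exp(np.mean(np.log(lum + eps)))
    key = np.clip(0.18 * (np.max(lum) / (log_avg + eps)) ** 0.1, 0.09, 0.72)
    scaled = key * lum / (log_avg + eps)
    ldr_lum = scaled / (1.0 + scaled)                      # compress to [0, 1)
    # Detail enhancement: boost the ratio between luminance and a box-filtered base layer.
    k = 9
    pad = np.pad(ldr_lum, k // 2, mode="edge")
    base = np.mean([pad[i:i + ldr_lum.shape[0], j:j + ldr_lum.shape[1]]
                    for i in range(k) for j in range(k)], axis=0)
    detail = ldr_lum / (base + eps)
    ldr_lum = np.clip(base * detail ** detail_gain, 0, 1)
    # Color attenuation: reapply chroma through the luminance ratio, then display gamma.
    ratio = (ldr_lum / (lum + eps))[..., None]
    return np.clip(hdr_rgb * ratio, 0, 1) ** (1 / 2.2)

ldr = tone_map(np.random.rand(64, 64, 3) * 100.0)          # (64, 64, 3) displayable image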
Citations: 0
Unsupervised image super-resolution recurrent network based on diffusion model
IF 2.7 Tier 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-08-20 DOI: 10.1016/j.image.2025.117398
Ni Tang, Dongxiao Zhang, Yanyun Qu
Unsupervised image super-resolution offers distinct advantages for real-world applications by eliminating the need for paired high- and low-resolution images. This paper proposes a novel architecture specifically designed for unsupervised learning, consisting of a cycle branch and a diffusion branch. The cycle branch integrates an upsampling and a downsampling network to generate pseudo-paired images from unpaired high- and low-resolution inputs. In parallel, the diffusion branch incorporates two independent diffusion models that refine these pseudo pairs, jointly modeling the processes of image reconstruction and degradation. This collaborative design enhances the authenticity of the pseudo pairs and enriches the detail in the reconstructed images. A key challenge in unsupervised learning is the lack of explicit label supervision, which often leads to inaccurate color restoration. To address this, we introduce a color consistency loss that regulates the cycle branch and promotes color fidelity. Through joint end-to-end training, the two branches complement each other to achieve high-quality reconstruction. Experimental results demonstrate that the proposed method effectively handles real-world low-resolution images, providing a robust and practical solution for image super-resolution.
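The diffusion branch is too large to sketch here, but the cycle branch's training signals can be illustrated. The snippet below shows, under assumptions, a cycle-reconstruction loss between unpaired LR/HR images and a color-consistency loss that compares only low-pass color between the LR input and the downsampled pseudo-HR output; up_net and down_net are hypothetical placeholders for the paper's upsampling and downsampling networks.

import torch
import torch.nn.functional as F

def cycle_and_color_losses(lr, hr, up_net, down_net, scale=4):
    """Losses for unpaired SR training (illustrative): cycle reconstruction in both
    directions plus a color-consistency term on blurred, low-frequency color."""
    pseudo_hr = up_net(lr)                                   # LR -> pseudo HR
    pseudo_lr = down_net(hr)                                 # HR -> pseudo LR
    cycle = F.l1_loss(down_net(pseudo_hr), lr) + F.l1_loss(up_net(pseudo_lr), hr)
    # Color consistency: compare heavily blurred versions so only global color is
    # constrained, not texture.
    lowpass = lambda x: F.avg_pool2d(x, 8)
    color = F.l1_loss(lowpass(F.interpolate(pseudo_hr, scale_factor=1 / scale,
                                            mode="bilinear", align_corners=False)),
                      lowpass(lr))
    return cycle, color

# Usage with stand-in networks (bilinear up/down sampling just to make the code run):
up_net = lambda x: F.interpolate(x, scale_factor=4, mode="bilinear", align_corners=False)
down_net = lambda x: F.interpolate(x, scale_factor=0.25, mode="bilinear", align_corners=False)
cycle_loss, color_loss = cycle_and_color_losses(torch.rand(1, 3, 32, 32),
                                                torch.rand(1, 3, 128, 128),
                                                up_net, down_net)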
Citations: 0
Research on a multi-person pose estimation model to balance accuracy and speed
IF 2.7 Tier 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-08-11 DOI: 10.1016/j.image.2025.117396
Xiangdong Gao, Liying Sun, Fan Zhang
This article presents BASP_YOLO, an enhanced multi-person pose estimation model designed to balance accuracy and speed for real-world applications. To address the computational complexity and limited robustness of existing methods, the proposed model integrates lightweight DSConv layers, a multi-scale fusion module combining BiFPN and efficient attention mechanisms, an optimized spatial pyramid pooling module with CSPC connections, and an SPD-DS module to mitigate channel information loss. Evaluated on the MS COCO dataset, BASP_YOLO achieves an mAP@0.5 of 84.6 % at 54 FPS, outperforming mainstream models like YOLO-Pose and OpenPose. The improvements reduce computational load by 52.2 % while enhancing occlusion handling, small-object detection, and robustness to environmental interference. The effectiveness of the model improvements was further validated using the MPII dataset. This work improves the accuracy of pose estimation while compromising real-time performance as little as possible, advancing deployment feasibility in resource-constrained scenarios.
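BASP_YOLO's exact modules are not published in the abstract, so the sketch below shows only the two most self-contained building blocks it names, under the usual readings of those terms: a depthwise-separable convolution (DSConv) and a space-to-depth downsampling block (the "SPD" part of SPD-DS). The class names, channel sizes, and activation choices are assumptions.

import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise-separable convolution (depthwise 3x3 + pointwise 1x1): the per-layer
    cost drops roughly from k*k*Cin*Cout to k*k*Cin + Cin*Cout."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, stride, 1, groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class SPDDown(nn.Module):
    """Space-to-depth downsampling: rearrange each 2x2 spatial block into channels and
    mix with a DSConv, so resolution halves without discarding pixels."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.mix = DSConv(4 * c_in, c_out)

    def forward(self, x):
        tl, tr = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
        bl, br = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
        return self.mix(torch.cat([tl, tr, bl, br], dim=1))

y = SPDDown(32, 64)(torch.rand(1, 32, 64, 64))   # -> (1, 64, 32, 32)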
Citations: 0
A new method to improve the precision of image quality assessment metrics: Piecewise linearization of the relationship between the metrics and mean opinion scores
IF 2.7 Tier 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-08-11 DOI: 10.1016/j.image.2025.117393
Cemre Müge Bilsay, Hakkı Alparslan Ilgın
Measuring perceptual visual quality is an important task for many image and video processing applications. Although the most accurate results are obtained through subjective evaluation, the process is quite time-consuming. To ease the process, many image quality assessment (IQA) algorithms have been designed over the years, using different approaches to account for various aspects of the human visual system (HVS). Evaluating the performance of these algorithms typically involves comparing their scores to subjective scores using the Pearson Linear Correlation Coefficient (PLCC). However, because the relationship between objective and subjective scores is often inherently nonlinear, applying a nonlinear mapping, most commonly the 5-parameter logistic function proposed by the Video Quality Experts Group (VQEG), prior to performance evaluation is standard practice in the literature. In this paper, we propose a novel piecewise linearization scheme as an alternative to the widely used nonlinear mapping function. Our method employs a data-dependent piecewise linear mapping to align objective metric scores with subjective quality scores, and it is applicable to many different IQA metrics. We validate the effectiveness of the proposed method on three publicly available datasets (CSIQ, TID2008, TID2013) and seven different IQA metrics, using PLCC as the primary performance indicator. Experimental results show that our linearization method effectively scales metric scores and achieves stronger correlations with subjective scores, yielding higher prediction accuracy. Code to reproduce our results is publicly available at github.com/cemremuge/PiecewiseLinearization.
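The authors' implementation is at the repository above; the snippet below is only a small illustration of the general idea of mapping objective scores onto the MOS scale through a data-dependent piecewise linear function before computing PLCC. The quantile-based breakpoint selection and node-value estimate used here are assumptions, not the paper's scheme.

import numpy as np
from scipy.stats import pearsonr

def piecewise_linear_map(obj_scores, mos, n_segments=5):
    """Map objective scores to the MOS scale with a data-dependent piecewise linear
    function: breakpoints at score quantiles, node values from nearby MOS averages."""
    knots = np.quantile(obj_scores, np.linspace(0, 1, n_segments + 1))
    order = np.argsort(obj_scores)
    s, m = np.asarray(obj_scores)[order], np.asarray(mos)[order]
    idx = np.searchsorted(s, knots)
    values = np.array([m[max(0, i - 10): i + 10].mean() for i in idx])
    # np.interp needs strictly increasing knots; drop duplicates if any.
    knots, keep = np.unique(knots, return_index=True)
    return np.interp(obj_scores, knots, values[keep])

# Synthetic example: a metric related to MOS through a saturating nonlinearity.
rng = np.random.default_rng(0)
mos = rng.uniform(1, 5, 300)
metric = 1 - np.exp(-mos / 2) + rng.normal(0, 0.02, mos.size)
print("PLCC raw       :", pearsonr(metric, mos)[0])
print("PLCC linearized:", pearsonr(piecewise_linear_map(metric, mos), mos)[0])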
Citations: 0
Tree-based hierarchical fusion network for multimodal finger recognition
IF 2.7 Tier 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-08-09 DOI: 10.1016/j.image.2025.117397
Yiwei Huang, Hui Ma, Jianian Li, Mingyang Wang
As digitization brings cyber threats and security vulnerabilities, biometrics has increasingly evolved from unimodal recognition toward more secure and accurate multimodal forms. However, most existing methods focus on the optimal generation of fusion weighting parameters and on models with fixed architecture, and such fixed-architecture fusion methods have difficulty accurately modeling multimodal finger features with large differences in image distribution. In this paper, a Tree-based Hierarchical Fusion Network (THiFNet) is proposed to fuse features of different modalities by adaptively exploring the common feature space using the interdependencies generated in the convolutional tree. First, in order to extract the multi-scale features contained in fingerprint and finger vein images, a Residual Non-Local (Res-NL) backbone network is proposed to compute long-range point-to-point relationships while avoiding the loss of minutiae features extracted by shallow convolutional filters. Further, to adaptively bridge the cross-modal heterogeneity gap, a novel Hierarchical Convolutional Tree (HiCT) is proposed to generate interdependencies between different modalities and within the same modality via channel attention. The primary advantage is that the attention modules used for fusion are dynamically selected by the tree network, modeling a more diverse common feature space and improving accuracy within a limited recognition time. Experimental results on three multimodal finger feature datasets show the framework achieves state-of-the-art results compared with other methods.
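The tree-structured dynamic selection in HiCT is beyond a short sketch, but the channel-attention interaction the abstract relies on can be illustrated. The snippet below is a hedged sketch of one cross-modal fusion cell: SE-style channel attention over concatenated fingerprint and vein features, so each modality gates the other. The class names and dimensions are assumptions, not THiFNet's design.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: squeeze spatially, excite per channel."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))                   # (B, C) channel weights
        return x * w[:, :, None, None]

class CrossModalFusionCell(nn.Module):
    """One fusion cell: both modalities are re-weighted by channel attention computed
    from their concatenation, so fingerprint and vein features gate each other."""
    def __init__(self, ch=64):
        super().__init__()
        self.joint_att = ChannelAttention(2 * ch)
        self.reduce = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, fp_feat, vein_feat):
        joint = self.joint_att(torch.cat([fp_feat, vein_feat], dim=1))
        return self.reduce(joint)                          # fused common-space feature

fused = CrossModalFusionCell()(torch.rand(2, 64, 32, 32), torch.rand(2, 64, 32, 32))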
Citations: 0
Underwater image enhancement based on visual perception fusion
IF 2.7 Tier 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-08-09 DOI: 10.1016/j.image.2025.117394
Dan Xiang, Huihua Wang, Zebin Zhou, Jing Ling, Pan Gao, Jinwen Zhang, Chun Shan
Underwater images face unique challenges caused by the complexity of the underwater environment, chiefly color distortion and low-contrast blur. To address these problems, we propose an underwater image enhancement technique based on visual perception fusion. The method comprises three stages: color correction, contrast enhancement, and multi-task fusion. In the color correction stage, channel statistics are combined with the relationships between the color channels to construct an adaptive compensation scheme. Additionally, the channels are converted to a linear space to build a color adaptation matrix, enabling the algorithm to adapt to the effects of different light sources. For contrast and detail texture enhancement, the channel information is analyzed and processed separately in the LAB color space, and a multi-scale decomposition method is used to enhance the grayscale information in the L channel. The details are then fused with the base layer to enhance the overall detail of the image. Finally, we compute the similarity and gradient of the images produced by the two stages with respect to the original image, and derive fusion weights from them to obtain high-quality underwater images. Extensive experiments show that our method not only preserves the details and layering of the image but also yields better visual quality, performing well in both qualitative and quantitative evaluation.
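The paper's adaptive compensation, color adaptation matrix, and weight maps cannot be reproduced from the abstract alone, so the OpenCV sketch below only illustrates the overall three-stage flow with common stand-ins: green-guided red-channel compensation plus gray-world balancing for color correction, CLAHE on the LAB luminance channel for contrast, and a fixed-weight blend in place of the similarity- and gradient-derived fusion weights. Each of those substitutions is an assumption.

import cv2
import numpy as np

def enhance_underwater(bgr):
    """Toy fusion-based enhancement: (1) compensate the attenuated red channel and apply
    gray-world white balance, (2) boost local contrast with CLAHE on the L channel in LAB,
    (3) fuse the two intermediate results with fixed weights."""
    img = bgr.astype(np.float32) / 255.0
    b, g, r = cv2.split(img)
    r_comp = r + (g.mean() - r.mean()) * (1 - r) * g          # green-guided red compensation
    corrected = np.clip(cv2.merge([b, g, r_comp]), 0, 1)
    corrected *= corrected.mean() / (corrected.mean(axis=(0, 1)) + 1e-6)   # gray-world balance
    corrected = np.clip(corrected, 0, 1)
    lab = cv2.cvtColor((corrected * 255).astype(np.uint8), cv2.COLOR_BGR2LAB)
    l_ch, a_ch, b_ch = cv2.split(lab)
    l_ch = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l_ch)
    contrasted = cv2.cvtColor(cv2.merge([l_ch, a_ch, b_ch]),
                              cv2.COLOR_LAB2BGR).astype(np.float32) / 255.0
    fused = 0.5 * corrected + 0.5 * contrasted                # fixed weights stand in for the
    return (np.clip(fused, 0, 1) * 255).astype(np.uint8)      # paper's similarity/gradient weights

out = enhance_underwater((np.random.rand(64, 64, 3) * 255).astype(np.uint8))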
Citations: 0
Object detection-based deep autoencoder hashing image retrieval
IF 3.4 Tier 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-07-18 DOI: 10.1016/j.image.2025.117384
Uğur Erkan, Ahmet Yilmaz, Abdurrahim Toktas, Qiang Lai, Suo Gao
Image Retrieval (IR), which returns similar images from a large image database, has become an important task as multimedia data grows. Existing studies use hash codes that represent features generated from the whole image, including redundant semantics from the background. In this study, a novel Object Detection-based Hashing IR (ODH-IR) scheme using You Only Look Once (YOLO) and an autoencoder is presented to ignore clutter in the images. Integrating YOLO and the autoencoder yields the most representative hash code based on the meaningful objects in the images. The autoencoder is exploited to compress the detected object vector to the desired bit length of the hash code. The ODH-IR scheme is validated by comparison with the state of the art on three well-known datasets in terms of precision metrics. Overall, ODH-IR obtains the best results on 35 of 36 metric measurements and the best average mean rank of 1.03. Moreover, the three illustrative IR examples show that it retrieves the most relevant semantics. The results demonstrate that ODH-IR is an impactful scheme thanks to its effective hashing method built on object detection with YOLO and the autoencoder.
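The detection half of the pipeline is standard YOLO, so the sketch below covers only the hashing half under assumptions: an autoencoder that compresses an object-level feature vector (assumed precomputed, e.g. pooled backbone activations of a detected region) into a k-bit code, using a tanh relaxation during training and a sign threshold at retrieval time, plus Hamming-distance matching. The class names, dimensions, and training loss are illustrative, not the paper's exact configuration.

import torch
import torch.nn as nn

class HashAutoencoder(nn.Module):
    """Compress an object feature vector to a k-bit binary hash. tanh pushes the code
    toward +/-1 during training; thresholding binarizes it at retrieval time."""
    def __init__(self, feat_dim=512, n_bits=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_bits), nn.Tanh())
        self.decoder = nn.Sequential(nn.Linear(n_bits, 256), nn.ReLU(),
                                     nn.Linear(256, feat_dim))

    def forward(self, feats):
        code = self.encoder(feats)
        return code, self.decoder(code)

    @torch.no_grad()
    def hash(self, feats):
        return (self.encoder(feats) > 0).to(torch.uint8)   # k-bit code per object

def hamming_distance(a, b):
    """Number of differing bits between two hash codes of equal length."""
    return int((a != b).sum())

# Usage with a stand-in for YOLO object features; training would minimize the
# reconstruction loss so the compact code stays representative of the object.
model = HashAutoencoder()
obj_feats = torch.rand(2, 512)
code, recon = model(obj_feats)
loss = nn.functional.mse_loss(recon, obj_feats)
db_code, query_code = model.hash(obj_feats)
print(hamming_distance(db_code, query_code))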
Citations: 0