Pub Date: 2025-08-28 | DOI: 10.1016/j.image.2025.117401
Ntivuguruzwa Jean De La Croix , Tohari Ahmad , Fengling Han , Royyana Muslim Ijtihadie
Recent advancements in steganalysis have focused on detecting hidden information in images, but locating the possible positions of concealed data in advanced adaptive steganography remains a crucial challenge, especially for images shared over public networks. This paper introduces a novel steganalysis approach, NTRF-Net, designed to identify the location of steganographically altered pixels in digital images. Focusing on the spatial features of an image, NTRF-Net combines stochastic feature selection and fuzzy logic within a convolutional neural network, working through three stages: modification map generation, feature classification, and pixel classification. NTRF-Net demonstrates high performance, achieving an accuracy of 98.2 % and an F1 score of 86.2 %. The ROC curves and AUC values highlight its strong ability to recognize steganographically altered pixels, outperforming existing benchmarks.
{"title":"NTRF-Net: A fuzzy logic-enhanced convolutional neural network for detecting hidden data in digital images","authors":"Ntivuguruzwa Jean De La Croix , Tohari Ahmad , Fengling Han , Royyana Muslim Ijtihadie","doi":"10.1016/j.image.2025.117401","DOIUrl":"10.1016/j.image.2025.117401","url":null,"abstract":"<div><div>Recent advancements in steganalysis have focused on detecting hidden information in images, but locating the possible positions of concealed data in advanced adaptive steganography remains a crucial challenge, especially for images shared over public networks. This paper introduces a novel steganalysis approach, NTRF-Net, designed to identify the location of steganographically altered pixels in digital images. NTRF-Net, focusing on spatial features of an image, combines stochastic feature selection and fuzzy logic within a convolutional neural network, working through three stages: modification map generation, feature classification, and pixel classification. NTRF-Net demonstrates high accuracy, achieving 98.2 % and 86.2 % for the accuracy and F<sub>1</sub> Score, respectively. The ROC curves and AUC values highlight the strong steganographically altered recognition capabilities of the proposed NTRF-Net, which outperform existing benchmarks.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117401"},"PeriodicalIF":2.7,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144932450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-27 | DOI: 10.1016/j.image.2025.117400
Wei Wu , Wenzhuo Zhai , Yong Liu , Xianbin Hu , Tailin Yang , Zhu Li
When video is shot outdoors in rainy weather, a complex and dynamically changing rain streak layer is superimposed on the original clean video, greatly degrading the performance of advanced outdoor vision systems. Several excellent video deraining algorithms have been proposed and produce good results. However, these approaches neglect the joint analysis of relations across three important domains of video, even though video data is widely known to have intrinsic characteristics in the temporal, spatial, and frequency domains. To address this issue, we propose a Three-domain Joint Deraining Network (TJDNet) for video rain streak removal. It consists of three network branches: a temporal-spatial-frequency (TSF) branch, a temporal-spatial (TS) branch, and a spatial branch. In the proposed TJDNet, capturing the spatial properties of the current frame is the common goal of all three branches. Moreover, we develop the TSF branch specifically to pursue temporal-frequency relations between the wavelet subbands of the current frame and those of its adjacent frames. The TS branch is designed to directly capture temporal correlations among successive frames. Finally, cross-branch feature fusion propagates the features of one branch to enrich the information of another, further exploiting the characteristics of these three domains. Compared with twenty-two state-of-the-art methods, experimental results show that the proposed TJDNet achieves significantly better performance in both objective and subjective image quality, with average PSNR improved by up to 2.10 dB. Our code will be available online at https://github.com/YanZhanggugu/TJDNet.
{"title":"Three-domain joint deraining network for video rain streak removal","authors":"Wei Wu , Wenzhuo Zhai , Yong Liu , Xianbin Hu , Tailin Yang , Zhu Li","doi":"10.1016/j.image.2025.117400","DOIUrl":"10.1016/j.image.2025.117400","url":null,"abstract":"<div><div>When shot outdoors in rainy weather, a rather complex and dynamic changed rain streak layer will have to be added to an original clean video, greatly degrading the performance of advanced outdoor vision systems. Currently, some excellent video deraining algorithms have been proposed and produce good results. However, these approaches neglect the joint analysis of relations in three important domains of videos, where it is widely known that video data certainly has intrinsic characteristics in temporal, spatial, and frequency domains, respectively. To address this issue, in the paper we propose a Three-domain Joint Deraining Network (TJDNet) for video rain streak removal. It composes of three network branches: temporal-spatial-frequency (TSF) branch, temporal-spatial (TS) branch, and spatial branch. In the proposed TJDNet, to capture spatial property for the current frame, is the common goal of these three branches. Moreover, we develop the TSF branch to specially pursue temporal-frequency relations between the wavelet subbands of the current frame and those of its adjacent frames. Furthermore, the TS branch is also designed to directly seize temporal correlations among successive frames. Finally, across-branch feature fusions are employed to propagate the features of one branch to enrich the information of another branch, further exploiting the characteristics of these three noteworthy domains. Compared with twenty-two state-of-the-art methods, experimental results show our proposed TJDNet achieves significantly better performance in both objective and subjective image qualities, particularly average PSNR increased by up to 2.10 dB. Our code will be available online at <span><span>https://github.com/YanZhanggugu/TJDNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117400"},"PeriodicalIF":2.7,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144916475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-25 | DOI: 10.1016/j.image.2025.117399
Gourab Chatterjee, Debashis Das, Suman Kumar Maji
Image noise, commonly introduced during the acquisition process, significantly degrades visual quality and adversely affects downstream image processing tasks. To address this challenge while preserving fine structural details, we propose GIADNet: a Gradient-Inspired Attention-Driven Denoising Network. The proposed framework integrates gradient-guided feature enhancement, multi-scale representation learning, and attention-based refinement to achieve a superior balance between noise suppression and detail retention. In particular, the gradient information of the noisy input is fused with deep features early in the pipeline to enrich semantic representation. Furthermore, we introduce two dedicated modules: the Multi-Pooling Pixel Attention (MPPA) module, which adaptively emphasizes informative pixels, and the Multi-Scale Attention Block (MSAB), designed to capture hierarchical contextual dependencies across varying spatial resolutions. Extensive experiments on standard benchmarks demonstrate that GIADNet achieves highly competitive performance, surpassing several state-of-the-art methods in both quantitative metrics and visual quality. Ablation studies further validate the effectiveness of each component, underscoring the importance of our attention-guided multi-scale design in advancing the field of image denoising. Code is available at: https://github.com/debashis15/GIADNet.
{"title":"GIADNet: Gradient Inspired Attention Driven Denoising Network","authors":"Gourab Chatterjee, Debashis Das, Suman Kumar Maji","doi":"10.1016/j.image.2025.117399","DOIUrl":"10.1016/j.image.2025.117399","url":null,"abstract":"<div><div>Image noise, commonly introduced during the acquisition process, significantly degrades visual quality and adversely affects downstream image processing tasks. To address this challenge while preserving fine structural details, we propose GIADNet: a Gradient-Inspired Attention-Driven Denoising Network. The proposed framework integrates gradient-guided feature enhancement, multi-scale representation learning, and attention-based refinement to achieve a superior balance between noise suppression and detail retention. In particular, the gradient information of the noisy input is fused with deep features early in the pipeline to enrich semantic representation. Furthermore, we introduce two dedicated modules: the Multi-Pooling Pixel Attention (MPPA) module, which adaptively emphasizes informative pixels, and the Multi-Scale Attention Block (MSAB), designed to capture hierarchical contextual dependencies across varying spatial resolutions. Extensive experiments on standard benchmarks demonstrate that GIADNet achieves highly competitive performance, surpassing several state-of-the-art methods in both quantitative metrics and visual quality. Ablation studies further validate the effectiveness of each component, underscoring the importance of our attention-guided multi-scale design in advancing the field of image denoising. Code is available at: <span><span>https://github.com/debashis15/GIADNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117399"},"PeriodicalIF":2.7,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144907393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-22 | DOI: 10.1016/j.image.2025.117395
Xueyu Han , Xin Sun , Susanto Rahardja
This paper presents a fast tone mapping operator (TMO) that effectively reproduces high dynamic range (HDR) images on common displays while maintaining visual appeal. The proposed method addresses the trade-off between computational complexity and detail retention inherent in existing global and local TMOs by leveraging prior information. We construct a dynamic range compression model on the HDR luminance channel and introduce two priors to quickly generate the low dynamic range (LDR) luminance channel. First, most local regions of the inverted LDR luminance channel contain some very low intensity pixels. Second, the luminance of the global light layer is a constant. In addition, we propose an adaptive luminance normalization approach based on the brightness characteristics of the input HDR image, which stabilizes tone mapping performance. Detail enhancement and color attenuation techniques are also presented to improve local contrast and manage over-saturation. The effectiveness of the proposed TMO is validated through comparison with state-of-the-art methods. Both subjective and objective results show that our method outperforms others in producing high-quality tone-mapped images. Additionally, it exhibits lower computational complexity than local TMOs while remaining comparable to global ones.
{"title":"Fast tone mapping operator for high dynamic range image using prior information","authors":"Xueyu Han , Xin Sun , Susanto Rahardja","doi":"10.1016/j.image.2025.117395","DOIUrl":"10.1016/j.image.2025.117395","url":null,"abstract":"<div><div>This paper presents a fast tone mapping operator (TMO) that effectively reproduces high dynamic range (HDR) images on common displays while maintaining visual appeal. The proposed method addresses the trade-off between computational complexity and detail retention inherent in existing global and local TMOs by leveraging prior information. We construct a dynamic range compression model on the HDR luminance channel and introduce two priors to fast generate the low dynamic range (LDR) luminance channel. First, most local regions of the inverted LDR luminance channel have some very low intensity pixels. Second, the luminance of the global light layer is a constant. Besides, we propose an adaptive luminance normalization approach based on the brightness feature of the input HDR image, facilitating the stability of tone mapping performance. Detail enhancement and color attenuation techniques are also presented to improve local contrasts and manage over-saturation. The effectiveness of the proposed TMO is validated through comparison with state-of-the-art methods. Both subjective and objective results show that our method outperforms others in producing high-quality tone-mapped images. Additionally, it exhibits lower computational complexity than local TMOs while remaining comparable to global ones.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117395"},"PeriodicalIF":2.7,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144896079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-20 | DOI: 10.1016/j.image.2025.117398
Ni Tang , Dongxiao Zhang , Yanyun Qu
Unsupervised image super-resolution offers distinct advantages for real-world applications by eliminating the need for paired high- and low-resolution images. This paper proposes a novel architecture specifically designed for unsupervised learning, consisting of a cycle branch and a diffusion branch. The cycle branch integrates an upsampling and a downsampling network to generate pseudo-paired images from unpaired high- and low-resolution inputs. In parallel, the diffusion branch incorporates two independent diffusion models that refine these pseudo pairs, jointly modeling the processes of image reconstruction and degradation. This collaborative design enhances the authenticity of the pseudo pairs and enriches the detail in the reconstructed images. A key challenge in unsupervised learning is the lack of explicit label supervision, which often leads to inaccurate color restoration. To address this, we introduce a color consistency loss that regulates the cycle branch and promotes color fidelity. Through joint end-to-end training, the two branches complement each other to achieve high-quality reconstruction. Experimental results demonstrate that the proposed method effectively handles real-world low-resolution images, providing a robust and practical solution for image super-resolution.
{"title":"Unsupervised image super-resolution recurrent network based on diffusion model","authors":"Ni Tang , Dongxiao Zhang , Yanyun Qu","doi":"10.1016/j.image.2025.117398","DOIUrl":"10.1016/j.image.2025.117398","url":null,"abstract":"<div><div>Unsupervised image super-resolution offers distinct advantages for real-world applications by eliminating the need for paired high- and low-resolution images. This paper proposes a novel architecture specifically designed for unsupervised learning, consisting of a cycle branch and a diffusion branch. The cycle branch integrates an upsampling and a downsampling network to generate pseudo-paired images from unpaired high- and low-resolution inputs. In parallel, the diffusion branch incorporates two independent diffusion models that refine these pseudo pairs, jointly modeling the processes of image reconstruction and degradation. This collaborative design enhances the authenticity of the pseudo pairs and enriches the detail in the reconstructed images. A key challenge in unsupervised learning is the lack of explicit label supervision, which often leads to inaccurate color restoration. To address this, we introduce a color consistency loss that regulates the cycle branch and promotes color fidelity. Through joint end-to-end training, the two branches complement each other to achieve high-quality reconstruction. Experimental results demonstrate that the proposed method effectively handles real-world low-resolution images, providing a robust and practical solution for image super-resolution.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117398"},"PeriodicalIF":2.7,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144896078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-11 | DOI: 10.1016/j.image.2025.117396
Xiangdong Gao, Liying Sun, Fan Zhang
This article presents BASP_YOLO, an enhanced multi-person pose estimation model designed to balance accuracy and speed for real-world applications. To address the computational complexity and limited robustness of existing methods, the proposed model integrates lightweight DSConv layers, a multi-scale fusion module combining BiFPN and efficient attention mechanisms, an optimized spatial pyramid pooling module with CSPC connections, and an SPD-DS module to mitigate channel information loss. Evaluated on the MS COCO dataset, BASP_YOLO achieves a mAP@0.5 of 84.6 % at 54 FPS, outperforming mainstream models such as YOLO-Pose and OpenPose. The improvements reduce computational load by 52.2 % while enhancing occlusion handling, small-object detection, and robustness to environmental interference. The effectiveness of the model improvements was further validated on the MPII dataset. This work improves pose estimation accuracy while sacrificing as little real-time performance as possible, advancing deployment feasibility in resource-constrained scenarios.
{"title":"Research on a multi-person pose estimation model to balance accuracy and speed","authors":"Xiangdong Gao, Liying Sun, Fan Zhang","doi":"10.1016/j.image.2025.117396","DOIUrl":"10.1016/j.image.2025.117396","url":null,"abstract":"<div><div>This article presents BASP_YOLO, an enhanced multi-person pose estimation model designed to balance accuracy and speed for real-world applications. To address the computational complexity and limited robustness of existing methods, the proposed model integrates lightweight DSConv layers, a multi-scale fusion module combining BiFPN and efficient attention mechanisms, an optimized spatial pyramid pooling module with CSPC connections, and an SPD-DS module to mitigate channel information loss. Evaluated on the MS COCO dataset, BASP_YOLO achieves a [email protected] of 84.6 % at 54 FPS, outperforming mainstream models like YOLO-Pose and OpenPose. The improvements reduce computational load by 52.2 % while enhancing occlusion handling, small-object detection, and robustness to environmental interference. The effectiveness of the model improvements was further validated using the MPII dataset. This work improves the accuracy of pose estimation while compromising real-time performance as little as possible, advancing deployment feasibility in resource-constrained scenarios.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117396"},"PeriodicalIF":2.7,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144827689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-11 | DOI: 10.1016/j.image.2025.117393
Cemre Müge Bilsay , Hakkı Alparslan Ilgın
Measuring perceptual visual quality is an important task for many image and video processing applications. Although the most accurate results are obtained through subjective evaluation, the process is quite time-consuming. To ease the process, many image quality assessment (IQA) algorithms have been designed over the years, using different approaches to account for various aspects of the human visual system (HVS). Evaluating the performance of these algorithms typically involves comparing their scores to subjective scores using the Pearson Linear Correlation Coefficient (PLCC). However, because the relationship between objective and subjective scores is often inherently nonlinear, applying a nonlinear mapping, most commonly the 5-parameter logistic function proposed by the Video Quality Experts Group (VQEG), prior to performance evaluation is standard practice in the literature. In this paper, we propose a novel piecewise linearization scheme as an alternative to the widely used nonlinear mapping function. Our method employs a data-dependent piecewise linear mapping to align objective metric scores with subjective quality scores, and it is applicable to many different IQA metrics. We validate the effectiveness of the proposed method on three publicly available datasets (CSIQ, TID2008, TID2013) and seven different IQA metrics, using PLCC as the primary performance indicator. Experimental results show that our linearization method effectively scales metric scores and achieves stronger correlations with subjective scores, yielding higher prediction accuracy. Code to reproduce our results is publicly available at github.com/cemremuge/PiecewiseLinearization.
{"title":"A new method to improve the precision of image quality assessment metrics: Piecewise linearization of the relationship between the metrics and mean opinion scores","authors":"Cemre Müge Bilsay , Hakkı Alparslan Ilgın","doi":"10.1016/j.image.2025.117393","DOIUrl":"10.1016/j.image.2025.117393","url":null,"abstract":"<div><div>Measuring the perceptual visual quality is an important task for many image and video processing applications. Although, the most accurate results are obtained through subjective evaluation, the process is quite time-consuming. To ease the process, many image quality assessment (IQA) algorithms are designed using different approaches to account for various aspects of the human visual system (HVS) over the years. Evaluating the performance of these algorithms typically involves comparison of their scores to subjective scores using Pearson Linear Correlation Coefficient (PLCC). However, because the relationship between objective and subjective scores is often inherently nonlinear, applying a nonlinear mapping, most commonly the 5-parameter logistic function proposed by Video Quality Experts Group (VQEG), prior to performance evaluation is a standard practice in the literature. In this paper, we propose a novel piecewise linearization scheme as an alternative to the widely used nonlinear mapping function. Our method employs a data dependent piecewise linear mapping to align objective metric scores with subjective quality scores, which is applicable to many different IQA metrics. We validate the effectiveness of the proposed method on three publicly available datasets (CSIQ, TID2008, TID2013) and seven different IQA metrics, using PLCC as the primary performance indicator. Experimental results show that our linearization method effectively scales metric scores and achieves stronger correlations with subjective scores yielding a higher prediction accuracy. Code to reproduce our results is publicly available at github.com/cemremuge/PiecewiseLinearization.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117393"},"PeriodicalIF":2.7,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144860329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-09 | DOI: 10.1016/j.image.2025.117397
Yiwei Huang , Hui Ma , Jianian Li , Mingyang Wang
As digitization brings cyber threats and security vulnerabilities, biometrics has increasingly evolved from unimodal recognition to more secure and accurate multimodal forms. However, most existing methods focus on optimally generating fusion weighting parameters and designing models with fixed architectures, and such fixed-architecture fusion methods have difficulty accurately modeling multimodal finger features with large differences in image distribution. In this paper, a Tree-based Hierarchical Fusion Network (THiFNet) is proposed to fuse features of different modalities by adaptively exploring their common feature space using the interdependencies generated in a convolutional tree. First, to extract the multi-scale features contained in fingerprint and finger vein images, a Residual Non-Local (Res-NL) backbone network is proposed to compute long-range point-to-point relationships while avoiding the loss of minutiae features extracted by shallow convolutional filters. Further, to adaptively bridge the cross-modal heterogeneity gap, a novel Hierarchical Convolutional Tree (HiCT) is proposed to generate interdependencies between different modalities and within the same modality via channel attention. The primary advantage is that the attention modules used for fusion are dynamically selected by the tree network, modeling a more diverse common feature space and improving accuracy within a limited recognition time. Experimental results on three multimodal finger feature datasets show that the framework achieves state-of-the-art results compared with other methods.
{"title":"Tree-based hierarchical fusion network for multimodal finger recognition","authors":"Yiwei Huang , Hui Ma , Jianian Li , Mingyang Wang","doi":"10.1016/j.image.2025.117397","DOIUrl":"10.1016/j.image.2025.117397","url":null,"abstract":"<div><div>With digitization comes cyber threats and security vulnerabilities, biometric subject has increasingly evolved from unimodal recognition to more secure and accurate forms of multimodal. However, most existing methods focus on the optimal generation of fusion weighting parameters and the design of models with fixed architecture, and such fixed-architecture fusion methods have difficulties in accurately modeling multimodal finger features with large differences in image distributions. In this paper, a Tree-based Hierarchical Fusion Network (THiFNet) is proposed to fuse features of different modalities by adaptively exploring the common feature space using their interdependencies generated in the convolutional tree. First, in order to extract multi-scale features contained in fingerprint and finger vein images, a Residual Non-Local (Res-NL) backbone network is proposed to compute long-range point-to-point relationships while avoiding the loss of minutiae features extracted by shallow convolutional filters. Further, to adaptively bridge the cross-modal heterogeneity gap, a novel Hierarchical Convolutional Tree (HiCT) is proposed to generate interdependencies between different modalities and within the same modality via channel attention. The primary advantage is that the attention modules used for fusion are dynamically selected by the tree network, modeling a more diverse common feature space and improving accuracy within a limited recognition time. Experimental results on three multimodal finger feature datasets show the framework achieves state-of-the-art results when compared with the other methods.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117397"},"PeriodicalIF":2.7,"publicationDate":"2025-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144827688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-09 | DOI: 10.1016/j.image.2025.117394
Dan Xiang , Huihua Wang , Zebin Zhou , Jing Ling , Pan Gao , Jinwen Zhang , Chun Shan
Underwater images face unique challenges caused by the complexity of the underwater environment, mainly color distortion and image blur with low contrast. To address these problems, we propose an underwater image enhancement technique based on visual perception fusion. The method consists of three stages: color correction, contrast enhancement, and multi-task fusion. For color correction, statistical properties are combined with the analyzed relationships between color channels to construct an adaptive compensation method. Additionally, the channels are converted to a linear space to build a color adaptation matrix, enabling the algorithm to adapt to the effects of different light sources. For contrast and detail texture enhancement, the channel information is analyzed and processed separately in the LAB color space, and a multi-scale decomposition method is used to enhance the grayscale information in the L channel. The details are then fused with the base layer to enhance the overall detail of the image. Finally, we compute the similarity and gradient of the images produced by the two methods above with respect to the original image, and derive fusion weights from them to obtain high-quality underwater images. Extensive experiments show that our method not only preserves the details and layering of the image but also yields better visual quality, with good performance in both qualitative and quantitative evaluation.
{"title":"Underwater image enhancement based on visual perception fusion","authors":"Dan Xiang , Huihua Wang , Zebin Zhou , Jing Ling , Pan Gao , Jinwen Zhang , Chun Shan","doi":"10.1016/j.image.2025.117394","DOIUrl":"10.1016/j.image.2025.117394","url":null,"abstract":"<div><div>Underwater images often face some unique challenges and problems caused by the complexity of the underwater environment, mainly including color distortion and image blur with low contrast. To address these problems, we propose an underwater image enhancement technique based on visual perceptual fusion. This method is divided into three stages: color correction, contrast enhancement and multi-task fusion method. In color correction, the statistical properties are combined with the relationship between the analyzed color channels to construct an adaptive compensation method for achieving color correction. Additionally, the channels are converted to a linear space to build a color adaptation matrix, enabling the algorithm to adapt to the effects of different light sources. For contrast and detail texture enhancement, the channel information across the color space is analyzed and processed separately in the LAB channel, and an advanced method of multi-scale decomposition is used to enhance the grayscale information in the L channel. Then, the details are fused with the base layer to enhance the overall details of the image. Finally, we calculate the similarity and gradient of the images from the above two methods with the original image, and then by calculating the weights to achieve high-quality underwater images. Through a large number of experiments, it is proved that our method can not only preserve the details and layering of the image, but also the image has better visual effect and good performance in both qualitative and quantitative evaluation.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117394"},"PeriodicalIF":2.7,"publicationDate":"2025-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144827687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-18 | DOI: 10.1016/j.image.2025.117384
Uğur Erkan , Ahmet Yilmaz , Abdurrahim Toktas , Qiang Lai , Suo Gao
Image Retrieval (IR), which returns similar images from a large image database, has become an important task as multimedia data grows. Existing studies use hash codes representing image features generated from the whole image, including redundant semantics from the background. In this study, a novel Object Detection-based Hashing IR (ODH-IR) scheme using You Only Look Once (YOLO) and an autoencoder is presented to ignore clutter in images. Integrating YOLO with the autoencoder yields the most representative hash code based on the meaningful objects in an image. The autoencoder is exploited to compress the detected object vector to the desired bit length of the hash code. The ODH-IR scheme is validated by comparison with the state of the art on three well-known datasets in terms of precision metrics. ODH-IR achieves the best results on 35 of 36 metric measurements and the best average mean rank of 1.03. Moreover, the three illustrative IR examples show that it retrieves the most relevant semantics. The results demonstrate that ODH-IR is an impactful scheme thanks to its effective hashing method based on object detection with YOLO and the autoencoder.
{"title":"Object detection-based deep autoencoder hashing image retrieval","authors":"Uğur Erkan , Ahmet Yilmaz , Abdurrahim Toktas , Qiang Lai , Suo Gao","doi":"10.1016/j.image.2025.117384","DOIUrl":"10.1016/j.image.2025.117384","url":null,"abstract":"<div><div>Image Retrieval (IR), which returns similar images from a large image database, has become an important task as multimedia data grows. Existing studies utilize hash code representing the image features generated from the whole image, including redundant semantics from the background. In this study, a novel Object Detection-based Hashing IR (ODH-IR) scheme using You Only Look Once (YOLO) and an autoencoder is presented to ignore clutter in the images. Integration of YOLO and the autoencoder provides the most representative hash code depending on meaningful objects in the images. The autoencoder is exploited to compress the detected object vector to the desired bit length of the hash code. The ODH-IR scheme is validated by comparison with the state of the art through three well-known datasets in terms of precise metrics. The ODH-IR totally has the best 35 metric results over 36 measurements and the best avg. mean rank of 1.03. Moreover, it is observed from the three illustrative IR examples that it retrieves the most relevant semantics. The results demonstrate that the ODH-IR is an impactful scheme thanks to the effective hashing method through object detection using YOLO and the autoencoder.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117384"},"PeriodicalIF":3.4,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144694958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}