Video and text semantic center alignment for text-video cross-modal retrieval
Pub Date: 2025-09-25 | DOI: 10.1016/j.image.2025.117413 | Signal Processing-Image Communication, Vol. 140, Article 117413
Ming Jin, Huaxiang Zhang, Lei Zhu, Jiande Sun, Li Liu
With the proliferation of video on the Internet, users demand greater precision and efficiency from retrieval technology. Current cross-modal retrieval methods suffer from three main problems: first, the same semantic objects in video and text are not effectively aligned; second, existing neural networks destroy the spatial features of a video when establishing its temporal features; and third, the extraction and processing of the text’s local features are overly complex, which increases network complexity. To address these problems, we propose a text-video semantic center alignment network. First, a semantic center alignment module is constructed to promote the alignment of semantic features of the same object across different modalities. Second, a pre-trained BERT based on a residual structure is designed to protect spatial information when inferring temporal information. Finally, the “jieba” library is employed to extract the local key information of the text, thereby simplifying local feature extraction. The effectiveness of the network structure is evaluated on the MSVD, MSR-VTT, and DiDeMo datasets.
{"title":"Video and text semantic center alignment for text-video cross-modal retrieval","authors":"Ming Jin , Huaxiang Zhang , Lei Zhu , Jiande Sun , Li Liu","doi":"10.1016/j.image.2025.117413","DOIUrl":"10.1016/j.image.2025.117413","url":null,"abstract":"<div><div>With the proliferation of video on the Internet, users demand higher precision and efficiency of retrieval technology. The current cross-modal retrieval technology mainly has the following problems: firstly, there is no effective alignment of the same semantic objects between video and text. Secondly, the existing neural networks destroy the spatial features of the video when establishing the temporal features of the video. Finally, the extraction and processing of the text’s local features are too complex, which increases the network complexity. To address the existing problems, we proposed a text-video semantic center alignment network. First, a semantic center alignment module was constructed to promote the alignment of semantic features of the same object across different modalities. Second, a pre-trained BERT based on a residual structure was designed to protect spatial information when inferring temporal information. Finally, the “jieba” library was employed to extract the local key information of the text, thereby simplifying the complexity of local feature extraction. The effectiveness of the network structure was evaluated on the MSVD, MSR-VTT, and DiDeMo datasets.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"140 ","pages":"Article 117413"},"PeriodicalIF":2.7,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145222573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrated multi-channel approach for speckle noise reduction in SAR imagery using gradient, spatial, and frequency analysis
Pub Date: 2025-09-23 | DOI: 10.1016/j.image.2025.117406 | Signal Processing-Image Communication, Vol. 140, Article 117406
Anirban Saha, Harshit Singh, Suman Kumar Maji
Synthetic Aperture Radar (SAR) imagery is inherently marred by speckle noise, which undermines image quality and complicates subsequent analysis. While numerous strategies have been suggested in the existing literature to mitigate this unwanted noise, the challenge of eliminating speckle while conserving the subtle structural and textural details inherent in the raw data remains unresolved. In this article, we propose a comprehensive approach that combines multi-domain analysis with gradient information processing for SAR despeckling. Our method aims to effectively suppress speckle noise while retaining crucial image characteristics. By leveraging multi-domain analysis techniques, we exploit both spatial- and frequency-domain information to gain a deeper insight into image structures. Additionally, we introduce a novel gradient information processing step that utilizes local gradient attributes to guide the despeckling process. Experimental results obtained from synthetic and real SAR imagery illustrate the effectiveness of our approach in terms of speckle noise reduction and preservation of image features. Quantitative assessments demonstrate substantial enhancements in image quality, indicating superior performance compared to current state-of-the-art methods.
{"title":"Integrated multi-channel approach for speckle noise reduction in SAR imagery using gradient, spatial, and frequency analysis","authors":"Anirban Saha, Harshit Singh, Suman Kumar Maji","doi":"10.1016/j.image.2025.117406","DOIUrl":"10.1016/j.image.2025.117406","url":null,"abstract":"<div><div>Synthetic Aperture Radar (SAR) imagery is inherently marred by speckle noise, which undermines image quality and complicates subsequent analytical endeavors. While numerous strategies have been suggested in existing literature to mitigate this unwanted noise, the challenge of eliminating speckle while conserving subtle structural and textural details inherent in the raw data remains unresolved. In this article, we propose a comprehensive approach combining multi-domain analysis with gradient information processing for SAR. Our method aims to effectively suppress speckle noise while retaining crucial image characteristics. By leveraging multi-domain analysis techniques, we exploit both spatial and frequency domain information to gain a deeper insight into image structures. Additionally, we introduce a novel gradient information processing step that utilizes local gradient attributes to guide the process. Experimental results obtained from synthetic and real SAR imagery illustrate the effectiveness of our approach in terms of speckle noise reduction and preservation of image features. Quantitative assessments demonstrate substantial enhancements in image quality, indicating superior performance compared to current state-of-the-art methods.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"140 ","pages":"Article 117406"},"PeriodicalIF":2.7,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145159985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A spatial features and weight adjusted loss infused Tiny YOLO for shadow detection
Pub Date: 2025-09-22 | DOI: 10.1016/j.image.2025.117408 | Signal Processing-Image Communication, Vol. 140, Article 117408
Akhil Kumar, R. Dhanalakshmi, R. Rajesh, R. Sendhil
Shadow detection in computer vision is challenging due to the difficulty of distinguishing shadows from similarly colored or dark objects. Variations in lighting, background textures, and object shapes further complicate accurate detection. This work introduces NS-YOLO, a novel Tiny YOLO variant designed for the specific task of shadow detection under varying conditions. The new architecture includes a small-scale feature extraction network enhanced with a global attention mechanism, multi-scale spatial attention, and a spatial pyramid pooling block, while preserving effective multi-scale contextual information. In addition, a weight-adjusted CIOU loss function is introduced to enhance localization accuracy. The proposed architecture addresses shadow detection by effectively capturing both fine details and global context, helping distinguish shadows from similar dark regions. The enhanced loss function improves boundary localization, reducing false detections and improving accuracy. NS-YOLO is trained end-to-end from scratch on the SBU and ISTD datasets. The experiments show that NS-YOLO achieves a detection accuracy (mAP) of 59.2% while utilizing only 35.6 BFLOPs. In comparison with existing lightweight YOLO variants, i.e., Tiny YOLO and YOLO Nano models proposed between 2017 and 2025, NS-YOLO shows a relative mAP improvement of 2.5–50.1%. These results highlight its efficiency and effectiveness, making it particularly suitable for deployment on resource-limited edge devices in real-time scenarios, e.g., video surveillance and advanced driver-assistance systems (ADAS).
{"title":"A spatial features and weight adjusted loss infused Tiny YOLO for shadow detection","authors":"Akhil Kumar , R. Dhanalakshmi , R. Rajesh , R. Sendhil","doi":"10.1016/j.image.2025.117408","DOIUrl":"10.1016/j.image.2025.117408","url":null,"abstract":"<div><div>Shadow detection in computer vision is challenging due to the difficulty in distinguishing shadows from similarly colored or dark objects. Variations in lighting, background textures, and object shapes further complicate accurate detection. This work introduces NS-YOLO, a novel Tiny YOLO variant designed for the specific task of shadow detection under varying conditions. The new architecture includes a small-scale feature extraction network improvised by global attention mechanism, multi-scale spatial attention, and a spatial pyramid pooling block, while preserving effective multi-scale contextual information. In addition, a weight-adjusted CIOU loss function is introduced for enhancing localization accuracy. The proposed architecture addresses shadow detection by effectively capturing both fine details and global context, helping distinguish shadows from similar dark regions. The enhanced loss function improves boundary localization, reducing false detections and improving accuracy. The NS-YOLO is trained end-to-end from scratch on the SBU and ISTD datasets. The experiments show that NS-YOLO achieves a detection accuracy (mAP) of 59.2 % while utilizing only 35.6 BFLOPs. In comparison with existing lightweight YOLO variants that is, Tiny YOLO and YOLO Nano models proposed between 2017–2025, NS-YOLO shows a relative mAP improvement of 2.5 - 50.1 %. These results highlight its efficiency and effectiveness and make it particularly suitable for deployment on resource-limited edge devices in real-time scenarios, e.g., video surveillance and advanced driver-assistance systems (ADAS).</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"140 ","pages":"Article 117408"},"PeriodicalIF":2.7,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145222435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RotCLIP: Tuning CLIP with visual adapter and textual prompts for rotation robust remote sensing image classification
Pub Date: 2025-09-19 | DOI: 10.1016/j.image.2025.117407 | Signal Processing-Image Communication, Vol. 140, Article 117407
Tiecheng Song, Qi Liu, Anyong Qin, Yin Liu
In recent years, Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success in a range of visual tasks by aligning visual and textual features. However, it remains a challenge to improve the robustness of CLIP for rotated images, especially for remote sensing images (RSIs), where objects can appear in various orientations. In this paper, we propose a Rotation Robust CLIP model, termed RotCLIP, to achieve rotation-robust classification of RSIs with a visual adapter and dual textual prompts. Specifically, we first compute the original and rotated visual features through the image encoder of CLIP and the proposed Rotation Adapter (Rot-Adapter). Then, we explore dual textual prompts to compute the textual features that describe the original and rotated visual features through the text encoder of CLIP. Based on this, we further build a rotation robust loss to limit the distance between the two visual features. Finally, by taking advantage of the powerful image-text alignment ability of CLIP, we build a global discriminative classification loss by combining the prediction results of both the original and rotated image-text features. To verify the effectiveness of RotCLIP, we conduct experiments on three RSI datasets: the EuroSAT dataset for scene classification, and the NWPU-VHR-10 and RSOD datasets for object classification. Experimental results show that the proposed RotCLIP improves the robustness of CLIP against image rotation, outperforming several state-of-the-art methods.
{"title":"RotCLIP: Tuning CLIP with visual adapter and textual prompts for rotation robust remote sensing image classification","authors":"Tiecheng Song, Qi Liu, Anyong Qin, Yin Liu","doi":"10.1016/j.image.2025.117407","DOIUrl":"10.1016/j.image.2025.117407","url":null,"abstract":"<div><div>In recent years, Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success in a range of visual tasks by aligning visual and textual features. However, it remains a challenge to improve the robustness of CLIP for rotated images, especially for remote sensing images (RSIs) where objects can present various orientations. In this paper, we propose a Rotation Robust CLIP model, termed RotCLIP, to achieve the rotation robust classification of RSIs with a visual adapter and dual textual prompts. Specifically, we first compute the original and rotated visual features through the image encoder of CLIP and the proposed Rotation Adapter (Rot-Adapter). Then, we explore dual textual prompts to compute the textual features which describe original and rotated visual features through the text encoder of CLIP. Based on this, we further build a rotation robust loss to limit the distance of the two visual features. Finally, by taking advantage of the powerful image-text alignment ability of CLIP, we build a global discriminative classification loss by combining the prediction results of both original and rotated image-text features. To verify the effect of our RotCLIP, we conduct experiments on three RSI datasets, including the EuroSAT dataset used for scene classification, and the NWPU-VHR-10 and RSOD datasets used for object classification. Experimental results show that the proposed RotCLIP improves the robustness of CLIP against image rotation, outperforming several state-of-the-art methods.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"140 ","pages":"Article 117407"},"PeriodicalIF":2.7,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A robust JPEG quantization step estimation method for image forensics
Pub Date: 2025-09-15 | DOI: 10.1016/j.image.2025.117402 | Signal Processing-Image Communication, Vol. 140, Article 117402
Chothmal Kumawat, Vinod Pankajakshan
Estimating the JPEG quantization step size from a JPEG image stored in a lossless format after decompression (a D-JPEG image) is a challenging problem in image forensics. The presence of forgery or additive noise in the D-JPEG image makes the quantization step estimation even more difficult. This paper proposes a novel quantization step estimation method that is robust to noise addition and forgery. First, we propose a statistical model for the subband DCT coefficients of forged and noisy D-JPEG images. We then show that the periodicity in the difference between the absolute values of the rounded DCT coefficients in a subband of a D-JPEG image and those of the corresponding never-compressed image can be used to reliably estimate the JPEG quantization step. The proposed quantization step estimation method is based on this observation. Detailed experimental results demonstrate the robustness of the proposed method against noise addition and forgery. The experimental results also demonstrate that the quantization steps estimated using the proposed method can be used to localize forgeries in D-JPEG images.
{"title":"A robust JPEG quantization step estimation method for image forensics","authors":"Chothmal Kumawat , Vinod Pankajakshan","doi":"10.1016/j.image.2025.117402","DOIUrl":"10.1016/j.image.2025.117402","url":null,"abstract":"<div><div>Estimating JPEG quantization step size from a JPEG image stored in a lossless format after the decompression (D-JPEG image) is a challenging problem in image forensics. The presence of forgery or additive noise in the D-JPEG image makes the quantization step estimation even more difficult. This paper proposes a novel quantization step estimation method robust to noise addition and forgery. First, we propose a statistical model for the subband DCT coefficients of forged and noisy D-JPEG images. We then show that the periodicity in the difference between the absolute values of rounded DCT coefficients in a subband of a D-JPEG image and those of the corresponding never-compressed image can be used for reliably estimating the JPEG quantization step. The proposed quantization step estimation method is based on this observation. Detailed experimental results reported in this paper demonstrate the robustness of the proposed method against noise addition and forgery. The experimental results also demonstrate that the quantization steps estimated using the proposed method can be used for localizing forgeries in D-JPEG images.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"140 ","pages":"Article 117402"},"PeriodicalIF":2.7,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145120363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cut-FUNQUE: An objective quality model for compressed tone-mapped High Dynamic Range videos
Pub Date: 2025-09-13 | DOI: 10.1016/j.image.2025.117405 | Signal Processing-Image Communication, Vol. 139, Article 117405
Abhinau K. Venkataramanan, Cosmin Stejerean, Ioannis Katsavounidis, Hassene Tmar, Alan C. Bovik
High Dynamic Range (HDR) videos have enjoyed a surge in popularity in recent years due to their ability to represent a wider range of contrast and color than Standard Dynamic Range (SDR) videos. Although HDR video capture has seen increasing popularity because of recent flagship mobile phones such as Apple iPhones, Google Pixels, and Samsung Galaxy phones, a broad swath of consumers still utilize legacy SDR displays that are unable to display HDR videos. As a result, HDR videos must be processed, i.e., tone-mapped, before streaming to a large section of SDR-capable video consumers. However, server-side tone-mapping involves automating decisions regarding the choices of tone-mapping operators (TMOs) and their parameters to yield high-fidelity outputs. Moreover, these choices must be balanced against the effects of lossy compression, which is ubiquitous in streaming scenarios. In this work, we develop a novel, efficient model of objective video quality named Cut-FUNQUE that is able to accurately predict the visual quality of tone-mapped and compressed HDR videos. Finally, we evaluate Cut-FUNQUE on a large-scale crowdsourced database of such videos and show that it achieves state-of-the-art accuracy.
{"title":"Cut-FUNQUE: An objective quality model for compressed tone-mapped High Dynamic Range videos","authors":"Abhinau K. Venkataramanan , Cosmin Stejerean , Ioannis Katsavounidis , Hassene Tmar , Alan C. Bovik","doi":"10.1016/j.image.2025.117405","DOIUrl":"10.1016/j.image.2025.117405","url":null,"abstract":"<div><div>High Dynamic Range (HDR) videos have enjoyed a surge in popularity in recent years due to their ability to represent a wider range of contrast and color than Standard Dynamic Range (SDR) videos. Although HDR video capture has seen increasing popularity because of recent flagship mobile phones such as Apple iPhones, Google Pixels, and Samsung Galaxy phones, a broad swath of consumers still utilize legacy SDR displays that are unable to display HDR videos. As a result, HDR videos must be processed, i.e., tone-mapped, before streaming to a large section of SDR-capable video consumers. However, server-side tone-mapping involves automating decisions regarding the choices of tone-mapping operators (TMOs) and their parameters to yield high-fidelity outputs. Moreover, these choices must be balanced against the effects of lossy compression, which is ubiquitous in streaming scenarios. In this work, we develop a novel, efficient model of objective video quality named Cut-FUNQUE that is able to accurately predict the visual quality of tone-mapped and compressed HDR videos. Finally, we evaluate Cut-FUNQUE on a large-scale crowdsourced database of such videos and show that it achieves state-of-the-art accuracy.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117405"},"PeriodicalIF":2.7,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145060564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Redundant contextual feature suppression for pedestrian detection in dense scenes
Pub Date: 2025-09-10 | DOI: 10.1016/j.image.2025.117403 | Signal Processing-Image Communication, Vol. 139, Article 117403
Jun Wang, Lei Wan, Xin Zhang, Xiaotian Cao
Pedestrian detection is one of the important branches of object detection, with a wide range of applications in autonomous driving, intelligent video surveillance, and passenger flow statistics. However, these scenes exhibit high pedestrian density, severe occlusion, and complex redundant contextual information, leading to issues such as low detection accuracy and a high number of false positives when current general object detectors are applied to dense pedestrian scenes. In this paper, we propose an improved Context Suppressed R-CNN method for pedestrian detection in dense scenes, based on Sparse R-CNN. Firstly, to further enhance the network’s ability to extract deep features in dense scenes, we introduce the CoT-FPN backbone by combining the FPN network with the Contextual Transformer Block, which replaces the 3×3 convolution in the ResNet backbone. Secondly, to address the issue that redundant contextual features of instance objects can mislead localization and recognition in dense scenes, we propose a Redundant Contextual Feature Suppression Module (RCFSM). This module, based on the convolutional block attention mechanism, suppresses redundant contextual information in instance features, thereby improving the network’s detection performance in dense scenes. Test results on the CrowdHuman dataset show that, compared with the Sparse R-CNN algorithm, the proposed algorithm improves Average Precision (AP) by 1.1% and the Jaccard index by 1.2%, while also reducing the number of model parameters. Code is available at https://github.com/davidsmithwj/CS-CS-RCNN.
{"title":"Redundant contextual feature suppression for pedestrian detection in dense scenes","authors":"Jun Wang, Lei Wan, Xin Zhang, Xiaotian Cao","doi":"10.1016/j.image.2025.117403","DOIUrl":"10.1016/j.image.2025.117403","url":null,"abstract":"<div><div>Pedestrian detection is one of the important branches of object detection, with a wide range of applications in autonomous driving, intelligent video surveillance, and passenger flow statistics. However, these scenes exhibit high pedestrian density, severe occlusion, and complex redundant contextual information, leading to issues such as low detection accuracy and a high number of false positives in current general object detectors when applied in dense pedestrian scenes. In this paper, we propose an improved Context Suppressed R-CNN method for pedestrian detection in dense scenes, based on the Sparse R-CNN. Firstly, to further enhance the network’s ability to extract deep features in dense scenes, we introduce the CoT-FPN backbone by combining the FPN network with the Contextual Transformer Block. This block replaces the <span><math><mrow><mn>3</mn><mo>×</mo><mn>3</mn></mrow></math></span> convolution in the ResNet backbone. Secondly, addressing the issue that redundant contextual features of instance objects can mislead the localization and recognition of object detection tasks in dense scenes, we propose a Redundant Contextual Feature Suppression Module (RCFSM). This module, based on the convolutional block attention mechanism, aims to suppress redundant contextual information in instance features, thereby improving the network’s detection performance in dense scenes. The test results on the CrowdHuman dataset show that, compared with the Sparse R-CNN algorithm, the proposed algorithm improves the Average Precision (AP) by 1.1% and the Jaccard index by 1.2%, while also reducing the number of model parameters. Code is available at <span><span>https://github.com/davidsmithwj/CS-CS-RCNN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117403"},"PeriodicalIF":2.7,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145049030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active contour model based on pre- additive bias field fitting image
Pub Date: 2025-09-09 | DOI: 10.1016/j.image.2025.117404 | Signal Processing-Image Communication, Vol. 139, Article 117404
Yang Chen, Guirong Weng
For images with inhomogeneous intensity, models based on the active contour model have been widely used. In contrast to the classic models, this paper proposes an optimized additive model that comprises an edge structure component and an inhomogeneous component. Second, by introducing a novel clustering criterion, the value of the bias field can be estimated before the iteration, greatly speeding up the evolution process and reducing the computational cost. An improved energy function is thus derived. In the gradient descent flow formula, a novel error function and an adaptive parameter are utilized to improve the performance of the data term. Finally, the proposed regularization terms ensure that the evolution process is more efficient and accurate. Owing to the above improvements, the proposed model achieves excellent segmentation performance in terms of robustness, effectiveness, and accuracy.
{"title":"Active contour model based on pre- additive bias field fitting image","authors":"Yang Chen, Guirong Weng","doi":"10.1016/j.image.2025.117404","DOIUrl":"10.1016/j.image.2025.117404","url":null,"abstract":"<div><div>With regards to figure with inhomogeneous intensity, the models based on active contour model have been widely used. Compared with the classic models, this paper proposes an optimized additive model which contains the edge structure and inhomogeneous components. Second, by introducing a novel clustering criterion, the value of the bias field can be estimated before iteration, greatly speeding the evloving process and reducing the calculation cost. Thus, an improved energy function is drawn out. Considering the gradient descent flow formula, a novel error function and adaptive parameter are utilized to improve the performance of the data term. Finally, the proposed regularization terms ensure the evloving process is more efficient and accurate. Owing to the above mentioned improvements, the proposed model in this paper has excellent performance of the segmentation in terms of robustness, effectiveness and accuracy.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117404"},"PeriodicalIF":2.7,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145095840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NTRF-Net: A fuzzy logic-enhanced convolutional neural network for detecting hidden data in digital images
Pub Date: 2025-08-28 | DOI: 10.1016/j.image.2025.117401 | Signal Processing-Image Communication, Vol. 139, Article 117401
Ntivuguruzwa Jean De La Croix, Tohari Ahmad, Fengling Han, Royyana Muslim Ijtihadie
Recent advancements in steganalysis have focused on detecting hidden information in images, but locating the possible positions of concealed data in advanced adaptive steganography remains a crucial challenge, especially for images shared over public networks. This paper introduces a novel steganalysis approach, NTRF-Net, designed to identify the locations of steganographically altered pixels in digital images. NTRF-Net, which focuses on the spatial features of an image, combines stochastic feature selection and fuzzy logic within a convolutional neural network, working through three stages: modification map generation, feature classification, and pixel classification. NTRF-Net demonstrates high performance, achieving 98.2% accuracy and an 86.2% F1 score. The ROC curves and AUC values highlight the strong capability of the proposed NTRF-Net to recognize steganographically altered pixels, outperforming existing benchmarks.
{"title":"NTRF-Net: A fuzzy logic-enhanced convolutional neural network for detecting hidden data in digital images","authors":"Ntivuguruzwa Jean De La Croix , Tohari Ahmad , Fengling Han , Royyana Muslim Ijtihadie","doi":"10.1016/j.image.2025.117401","DOIUrl":"10.1016/j.image.2025.117401","url":null,"abstract":"<div><div>Recent advancements in steganalysis have focused on detecting hidden information in images, but locating the possible positions of concealed data in advanced adaptive steganography remains a crucial challenge, especially for images shared over public networks. This paper introduces a novel steganalysis approach, NTRF-Net, designed to identify the location of steganographically altered pixels in digital images. NTRF-Net, focusing on spatial features of an image, combines stochastic feature selection and fuzzy logic within a convolutional neural network, working through three stages: modification map generation, feature classification, and pixel classification. NTRF-Net demonstrates high accuracy, achieving 98.2 % and 86.2 % for the accuracy and F<sub>1</sub> Score, respectively. The ROC curves and AUC values highlight the strong steganographically altered recognition capabilities of the proposed NTRF-Net, which outperform existing benchmarks.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117401"},"PeriodicalIF":2.7,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144932450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Three-domain joint deraining network for video rain streak removal
Pub Date: 2025-08-27 | DOI: 10.1016/j.image.2025.117400 | Signal Processing-Image Communication, Vol. 139, Article 117400
Wei Wu, Wenzhuo Zhai, Yong Liu, Xianbin Hu, Tailin Yang, Zhu Li
When video is shot outdoors in rainy weather, a complex and dynamically changing rain streak layer is added to the otherwise clean content, greatly degrading the performance of advanced outdoor vision systems. Some excellent video deraining algorithms have been proposed and produce good results. However, these approaches neglect the joint analysis of relations in three important domains of video, even though video data is widely known to have intrinsic characteristics in the temporal, spatial, and frequency domains. To address this issue, we propose a Three-domain Joint Deraining Network (TJDNet) for video rain streak removal. It is composed of three network branches: a temporal-spatial-frequency (TSF) branch, a temporal-spatial (TS) branch, and a spatial branch. In the proposed TJDNet, capturing the spatial properties of the current frame is the common goal of all three branches. Moreover, we develop the TSF branch to specifically pursue temporal-frequency relations between the wavelet subbands of the current frame and those of its adjacent frames. Furthermore, the TS branch is designed to directly capture temporal correlations among successive frames. Finally, cross-branch feature fusion is employed to propagate the features of one branch to enrich the information of another, further exploiting the characteristics of these three domains. Compared with twenty-two state-of-the-art methods, experimental results show that our proposed TJDNet achieves significantly better performance in both objective and subjective image quality, with average PSNR gains of up to 2.10 dB. Our code will be available online at https://github.com/YanZhanggugu/TJDNet.
{"title":"Three-domain joint deraining network for video rain streak removal","authors":"Wei Wu , Wenzhuo Zhai , Yong Liu , Xianbin Hu , Tailin Yang , Zhu Li","doi":"10.1016/j.image.2025.117400","DOIUrl":"10.1016/j.image.2025.117400","url":null,"abstract":"<div><div>When shot outdoors in rainy weather, a rather complex and dynamic changed rain streak layer will have to be added to an original clean video, greatly degrading the performance of advanced outdoor vision systems. Currently, some excellent video deraining algorithms have been proposed and produce good results. However, these approaches neglect the joint analysis of relations in three important domains of videos, where it is widely known that video data certainly has intrinsic characteristics in temporal, spatial, and frequency domains, respectively. To address this issue, in the paper we propose a Three-domain Joint Deraining Network (TJDNet) for video rain streak removal. It composes of three network branches: temporal-spatial-frequency (TSF) branch, temporal-spatial (TS) branch, and spatial branch. In the proposed TJDNet, to capture spatial property for the current frame, is the common goal of these three branches. Moreover, we develop the TSF branch to specially pursue temporal-frequency relations between the wavelet subbands of the current frame and those of its adjacent frames. Furthermore, the TS branch is also designed to directly seize temporal correlations among successive frames. Finally, across-branch feature fusions are employed to propagate the features of one branch to enrich the information of another branch, further exploiting the characteristics of these three noteworthy domains. Compared with twenty-two state-of-the-art methods, experimental results show our proposed TJDNet achieves significantly better performance in both objective and subjective image qualities, particularly average PSNR increased by up to 2.10 dB. Our code will be available online at <span><span>https://github.com/YanZhanggugu/TJDNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117400"},"PeriodicalIF":2.7,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144916475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}