
Signal Processing-Image Communication: Latest Publications

Video and text semantic center alignment for text-video cross-modal retrieval
IF 2.7 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-09-25 | DOI: 10.1016/j.image.2025.117413
Ming Jin, Huaxiang Zhang, Lei Zhu, Jiande Sun, Li Liu
With the proliferation of video on the Internet, users demand retrieval technology with higher precision and efficiency. Current cross-modal retrieval methods suffer from three main problems: first, the same semantic objects in video and text are not effectively aligned; second, existing neural networks destroy the spatial features of a video while building its temporal features; third, the extraction and processing of local text features are overly complex, which increases network complexity. To address these problems, we propose a text-video semantic center alignment network. First, a semantic center alignment module is constructed to promote the alignment of semantic features of the same object across modalities. Second, a pre-trained BERT with a residual structure is designed to preserve spatial information while inferring temporal information. Finally, the "jieba" library is employed to extract local key information from the text, simplifying local feature extraction. The effectiveness of the network is evaluated on the MSVD, MSR-VTT, and DiDeMo datasets.
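To make the alignment idea concrete, here is a minimal sketch (not the authors' code) of a semantic-center alignment loss: features from both modalities that refer to the same semantic object are pulled toward a shared center. The function name, the squared-distance form, and the toy data are assumptions for illustration only.

```python
import numpy as np

def center_alignment_loss(video_feats, text_feats, labels):
    """video_feats, text_feats: (N, D) arrays; labels: (N,) semantic-object ids."""
    total, classes = 0.0, np.unique(labels)
    for lab in classes:
        v = video_feats[labels == lab]
        t = text_feats[labels == lab]
        center = (v.mean(axis=0) + t.mean(axis=0)) / 2.0     # shared semantic center
        total += np.mean(np.sum((v - center) ** 2, axis=1))  # pull video features in
        total += np.mean(np.sum((t - center) ** 2, axis=1))  # pull text features in
    return total / len(classes)

rng = np.random.default_rng(0)
vf, tf = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
print(center_alignment_loss(vf, tf, np.array([0, 0, 1, 1, 2, 2, 3, 3])))
```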
Citations: 0
Integrated multi-channel approach for speckle noise reduction in SAR imagery using gradient, spatial, and frequency analysis
IF 2.7 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-09-23 | DOI: 10.1016/j.image.2025.117406
Anirban Saha, Harshit Singh, Suman Kumar Maji
Synthetic Aperture Radar (SAR) imagery is inherently marred by speckle noise, which undermines image quality and complicates subsequent analytical endeavors. While numerous strategies have been suggested in existing literature to mitigate this unwanted noise, the challenge of eliminating speckle while conserving subtle structural and textural details inherent in the raw data remains unresolved. In this article, we propose a comprehensive approach combining multi-domain analysis with gradient information processing for SAR. Our method aims to effectively suppress speckle noise while retaining crucial image characteristics. By leveraging multi-domain analysis techniques, we exploit both spatial and frequency domain information to gain a deeper insight into image structures. Additionally, we introduce a novel gradient information processing step that utilizes local gradient attributes to guide the process. Experimental results obtained from synthetic and real SAR imagery illustrate the effectiveness of our approach in terms of speckle noise reduction and preservation of image features. Quantitative assessments demonstrate substantial enhancements in image quality, indicating superior performance compared to current state-of-the-art methods.
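As a rough illustration of combining spatial, frequency, and gradient cues, the sketch below denoises a toy speckled image by blending a spatial median filter with a frequency-domain low-pass reconstruction, using a gradient mask to preserve edges. The thresholds, the blending rule, and the toy speckle model are assumptions and do not reproduce the paper's algorithm.

```python
import numpy as np
from scipy.ndimage import median_filter

def despeckle(img, grad_thresh=0.2, keep_ratio=0.1):
    spatial = median_filter(img, size=3)                      # spatial-domain smoothing
    F = np.fft.fftshift(np.fft.fft2(img))                     # frequency-domain analysis
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2) <= (keep_ratio * min(h, w)) ** 2
    freq = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))  # low-pass reconstruction
    gy, gx = np.gradient(spatial)
    edges = np.hypot(gx, gy) > grad_thresh                    # gradient guidance
    return np.where(edges, spatial, 0.5 * spatial + 0.5 * freq)  # keep edges sharper

noisy = np.random.gamma(4.0, 0.25, (64, 64)) * 0.5 + 0.5      # toy speckle-like image
print(despeckle(noisy).shape)
```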
Citations: 0
A spatial features and weight adjusted loss infused Tiny YOLO for shadow detection
IF 2.7 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-09-22 | DOI: 10.1016/j.image.2025.117408
Akhil Kumar, R. Dhanalakshmi, R. Rajesh, R. Sendhil
Shadow detection in computer vision is challenging due to the difficulty of distinguishing shadows from similarly colored or dark objects. Variations in lighting, background textures, and object shapes further complicate accurate detection. This work introduces NS-YOLO, a novel Tiny YOLO variant designed for the specific task of shadow detection under varying conditions. The new architecture includes a small-scale feature extraction network enhanced by a global attention mechanism, multi-scale spatial attention, and a spatial pyramid pooling block, while preserving effective multi-scale contextual information. In addition, a weight-adjusted CIoU loss function is introduced to enhance localization accuracy. The proposed architecture addresses shadow detection by effectively capturing both fine details and global context, helping distinguish shadows from similar dark regions. The enhanced loss function improves boundary localization, reducing false detections and improving accuracy. NS-YOLO is trained end-to-end from scratch on the SBU and ISTD datasets. Experiments show that NS-YOLO achieves a detection accuracy (mAP) of 59.2% while utilizing only 35.6 BFLOPs. Compared with existing lightweight YOLO variants, i.e., Tiny YOLO and YOLO Nano models proposed between 2017 and 2025, NS-YOLO shows a relative mAP improvement of 2.5-50.1%. These results highlight its efficiency and effectiveness and make it particularly suitable for deployment on resource-limited edge devices in real-time scenarios, e.g., video surveillance and advanced driver-assistance systems (ADAS).
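The weight-adjusted CIoU loss can be illustrated with a small sketch. The IoU, center-distance, and aspect-ratio terms below follow the standard CIoU definition; the extra `weight` factor stands in for the paper's adjustment and is an assumption, as is the box format ([x1, y1, x2, y2]).

```python
import numpy as np

def ciou_loss(bp, bg, weight=1.0):
    # intersection over union
    ix1, iy1 = max(bp[0], bg[0]), max(bp[1], bg[1])
    ix2, iy2 = min(bp[2], bg[2]), min(bp[3], bg[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (bp[2] - bp[0]) * (bp[3] - bp[1])
    area_g = (bg[2] - bg[0]) * (bg[3] - bg[1])
    iou = inter / (area_p + area_g - inter + 1e-9)
    # center distance normalized by the enclosing-box diagonal
    cpx, cpy = (bp[0] + bp[2]) / 2, (bp[1] + bp[3]) / 2
    cgx, cgy = (bg[0] + bg[2]) / 2, (bg[1] + bg[3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2
    c2 = (max(bp[2], bg[2]) - min(bp[0], bg[0])) ** 2 + \
         (max(bp[3], bg[3]) - min(bp[1], bg[1])) ** 2 + 1e-9
    # aspect-ratio consistency term
    v = (4 / np.pi ** 2) * (np.arctan((bg[2] - bg[0]) / (bg[3] - bg[1]))
                            - np.arctan((bp[2] - bp[0]) / (bp[3] - bp[1]))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return weight * (1 - iou + rho2 / c2 + alpha * v)

print(ciou_loss([10, 10, 50, 60], [12, 8, 55, 58], weight=1.2))
```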
Citations: 0
RotCLIP: Tuning CLIP with visual adapter and textual prompts for rotation robust remote sensing image classification
IF 2.7 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-09-19 | DOI: 10.1016/j.image.2025.117407
Tiecheng Song, Qi Liu, Anyong Qin, Yin Liu
In recent years, Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success in a range of visual tasks by aligning visual and textual features. However, improving the robustness of CLIP to rotated images remains a challenge, especially for remote sensing images (RSIs), where objects can appear in arbitrary orientations. In this paper, we propose a Rotation Robust CLIP model, termed RotCLIP, which achieves rotation-robust classification of RSIs using a visual adapter and dual textual prompts. Specifically, we first compute the original and rotated visual features through the image encoder of CLIP and the proposed Rotation Adapter (Rot-Adapter). Then, we explore dual textual prompts to compute textual features that describe the original and rotated visual features through the text encoder of CLIP. Based on this, we further build a rotation robust loss to limit the distance between the two visual features. Finally, by taking advantage of the powerful image-text alignment ability of CLIP, we build a global discriminative classification loss by combining the prediction results of both original and rotated image-text features. To verify the effect of RotCLIP, we conduct experiments on three RSI datasets: the EuroSAT dataset for scene classification, and the NWPU-VHR-10 and RSOD datasets for object classification. Experimental results show that the proposed RotCLIP improves the robustness of CLIP against image rotation, outperforming several state-of-the-art methods.
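A minimal sketch of the rotation-robust loss idea: embed an image and a rotated copy, then penalize the distance between the two embeddings. The toy `encode` function below is a placeholder assumption, not CLIP's image encoder.

```python
import numpy as np

def encode(img):                        # stand-in encoder: global pooled statistics
    return np.array([img.mean(), img.std(), np.abs(np.gradient(img)[0]).mean()])

def rotation_robust_loss(img, k=1):
    f_orig = encode(img)
    f_rot = encode(np.rot90(img, k))    # rotated view of the same image
    return float(np.sum((f_orig - f_rot) ** 2))   # squared L2 distance to minimize

img = np.random.default_rng(1).random((32, 32))
print(rotation_robust_loss(img, k=1))
```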
Citations: 0
A robust JPEG quantization step estimation method for image forensics
IF 2.7 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-09-15 | DOI: 10.1016/j.image.2025.117402
Chothmal Kumawat, Vinod Pankajakshan
Estimating the JPEG quantization step size from a JPEG image stored in a lossless format after decompression (a D-JPEG image) is a challenging problem in image forensics. The presence of forgery or additive noise in the D-JPEG image makes quantization step estimation even more difficult. This paper proposes a novel quantization step estimation method that is robust to noise addition and forgery. First, we propose a statistical model for the subband DCT coefficients of forged and noisy D-JPEG images. We then show that the periodicity in the difference between the absolute values of rounded DCT coefficients in a subband of a D-JPEG image and those of the corresponding never-compressed image can be used to reliably estimate the JPEG quantization step. The proposed quantization step estimation method is based on this observation. Detailed experimental results reported in this paper demonstrate the robustness of the proposed method against noise addition and forgery. The experimental results also demonstrate that the quantization steps estimated using the proposed method can be used to localize forgeries in D-JPEG images.
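A toy sketch of the underlying periodicity cue: de-quantized subband DCT coefficients cluster around multiples of the quantization step, so a comb-type score over candidate steps can recover it. The cosine score and the 0.9 tie-break threshold are assumptions, not the paper's estimator.

```python
import numpy as np

def estimate_q(coeffs, q_max=20):
    # comb score: close to 1 when the coefficients sit near multiples of q
    scores = {q: np.mean(np.cos(2 * np.pi * coeffs / q)) for q in range(2, q_max + 1)}
    best = max(scores.values())
    # divisors of the true step also score high, so keep the largest near-best step
    return max(q for q, s in scores.items() if s >= 0.9 * best)

rng = np.random.default_rng(3)
true_q = 6
coeffs = true_q * rng.integers(-20, 21, size=5000) + rng.normal(0, 0.4, size=5000)
print(estimate_q(coeffs))   # expected to recover 6 on this toy data
```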
Citations: 0
Cut-FUNQUE: An objective quality model for compressed tone-mapped High Dynamic Range videos
IF 2.7 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-09-13 | DOI: 10.1016/j.image.2025.117405
Abhinau K. Venkataramanan, Cosmin Stejerean, Ioannis Katsavounidis, Hassene Tmar, Alan C. Bovik
High Dynamic Range (HDR) videos have enjoyed a surge in popularity in recent years due to their ability to represent a wider range of contrast and color than Standard Dynamic Range (SDR) videos. Although HDR video capture has seen increasing popularity because of recent flagship mobile phones such as Apple iPhones, Google Pixels, and Samsung Galaxy phones, a broad swath of consumers still utilize legacy SDR displays that are unable to display HDR videos. As a result, HDR videos must be processed, i.e., tone-mapped, before streaming to a large section of SDR-capable video consumers. However, server-side tone-mapping involves automating decisions regarding the choices of tone-mapping operators (TMOs) and their parameters to yield high-fidelity outputs. Moreover, these choices must be balanced against the effects of lossy compression, which is ubiquitous in streaming scenarios. In this work, we develop a novel, efficient model of objective video quality named Cut-FUNQUE that is able to accurately predict the visual quality of tone-mapped and compressed HDR videos. Finally, we evaluate Cut-FUNQUE on a large-scale crowdsourced database of such videos and show that it achieves state-of-the-art accuracy.
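As a hedged illustration of how such an objective quality model is typically validated, the sketch below rank-correlates model predictions against crowdsourced subjective scores (SROCC/PLCC). The numbers are toy placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

mos = np.array([78.2, 64.5, 51.3, 83.9, 45.0, 70.1])        # toy subjective scores
predicted = np.array([0.81, 0.66, 0.48, 0.88, 0.41, 0.69])  # toy model outputs

srocc, _ = spearmanr(predicted, mos)   # monotonic agreement (rank correlation)
plcc, _ = pearsonr(predicted, mos)     # linear agreement
print(f"SROCC={srocc:.3f}, PLCC={plcc:.3f}")
```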
Citations: 0
Redundant contextual feature suppression for pedestrian detection in dense scenes
IF 2.7 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-09-10 | DOI: 10.1016/j.image.2025.117403
Jun Wang, Lei Wan, Xin Zhang, Xiaotian Cao
Pedestrian detection is one of the important branches of object detection, with a wide range of applications in autonomous driving, intelligent video surveillance, and passenger flow statistics. However, dense scenes exhibit high pedestrian density, severe occlusion, and complex redundant contextual information, which leads to low detection accuracy and a high number of false positives when current general-purpose object detectors are applied to them. In this paper, we propose an improved Context Suppressed R-CNN method for pedestrian detection in dense scenes, based on Sparse R-CNN. First, to further enhance the network's ability to extract deep features in dense scenes, we introduce the CoT-FPN backbone by combining the FPN network with the Contextual Transformer Block, which replaces the 3×3 convolution in the ResNet backbone. Second, to address the issue that redundant contextual features of instance objects can mislead localization and recognition in dense scenes, we propose a Redundant Contextual Feature Suppression Module (RCFSM). This module, based on the convolutional block attention mechanism, suppresses redundant contextual information in instance features, thereby improving the network's detection performance in dense scenes. Test results on the CrowdHuman dataset show that, compared with the Sparse R-CNN algorithm, the proposed algorithm improves Average Precision (AP) by 1.1% and the Jaccard index by 1.2%, while also reducing the number of model parameters. Code is available at https://github.com/davidsmithwj/CS-CS-RCNN.
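A hedged numpy sketch of a CBAM-style gate in the spirit of the redundant contextual feature suppression module: channel attention followed by spatial attention re-weights a feature map so that low-salience (redundant) context is damped. The random weights and pooling choices are stand-ins, not the trained module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def suppress_context(feat, rng):
    C, H, W = feat.shape
    # channel attention from global average- and max-pooled descriptors
    w = rng.normal(scale=0.1, size=(C, C))            # stand-in for a learned MLP
    ch = sigmoid((feat.mean(axis=(1, 2)) + feat.max(axis=(1, 2))) @ w)
    feat = feat * ch[:, None, None]
    # spatial attention from per-pixel channel statistics
    sp = sigmoid(feat.mean(axis=0) + feat.max(axis=0))
    return feat * sp[None, :, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16, 16))
print(suppress_context(x, rng).shape)   # (8, 16, 16)
```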
Citations: 0
Active contour model based on pre-additive bias field fitting image
IF 2.7 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-09-09 | DOI: 10.1016/j.image.2025.117404
Yang Chen, Guirong Weng
For images with inhomogeneous intensity, models based on the active contour framework have been widely used. Compared with classic models, this paper proposes an optimized additive model that contains edge structure and inhomogeneous components. Second, by introducing a novel clustering criterion, the value of the bias field can be estimated before iteration, greatly speeding up the evolving process and reducing the computational cost. An improved energy function is thus derived. Based on the gradient descent flow formulation, a novel error function and an adaptive parameter are utilized to improve the performance of the data term. Finally, the proposed regularization terms make the evolving process more efficient and accurate. Owing to the above improvements, the proposed model delivers excellent segmentation performance in terms of robustness, effectiveness, and accuracy.
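A minimal sketch of the pre-estimation idea: approximate a smooth additive bias field before any contour evolution, subtract it, and apply a simple two-center clustering criterion to seed the contour. Using Gaussian smoothing as the bias estimate and plain two-center updates are assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pre_bias_correct(img, sigma=15):
    bias = gaussian_filter(img, sigma)        # slowly varying additive component
    corrected = img - bias + bias.mean()      # remove inhomogeneity, keep mean level
    # two-center clustering criterion on the corrected intensities
    c1, c2 = corrected.min(), corrected.max()
    for _ in range(10):
        mask = np.abs(corrected - c1) < np.abs(corrected - c2)
        c1, c2 = corrected[mask].mean(), corrected[~mask].mean()
    return corrected, bias, mask              # mask can seed the evolving contour

img = np.fromfunction(lambda y, x: (x > 32) * 1.0 + 0.01 * y, (64, 64))
corrected, bias, seed = pre_bias_correct(img)
print(seed.sum())
```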
Citations: 0
NTRF-Net: A fuzzy logic-enhanced convolutional neural network for detecting hidden data in digital images
IF 2.7 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-28 | DOI: 10.1016/j.image.2025.117401
Ntivuguruzwa Jean De La Croix, Tohari Ahmad, Fengling Han, Royyana Muslim Ijtihadie
Recent advancements in steganalysis have focused on detecting hidden information in images, but locating the possible positions of concealed data in advanced adaptive steganography remains a crucial challenge, especially for images shared over public networks. This paper introduces a novel steganalysis approach, NTRF-Net, designed to identify the location of steganographically altered pixels in digital images. NTRF-Net, focusing on the spatial features of an image, combines stochastic feature selection and fuzzy logic within a convolutional neural network, working through three stages: modification map generation, feature classification, and pixel classification. NTRF-Net achieves an accuracy of 98.2% and an F1 score of 86.2%. The ROC curves and AUC values highlight its strong ability to recognize steganographically altered pixels, outperforming existing benchmarks.
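A hedged sketch of the fuzzy-logic flavor of the pipeline: high-pass residuals are mapped to soft per-pixel memberships for "untouched", "ambiguous", and "modified". The residual kernel and the triangular membership functions are illustrative assumptions, not NTRF-Net's learned stages.

```python
import numpy as np
from scipy.ndimage import convolve

def modification_memberships(img):
    kernel = np.array([[-1, 2, -1], [2, -4, 2], [-1, 2, -1]], dtype=float) / 4.0
    residual = np.abs(convolve(img.astype(float), kernel, mode="reflect"))
    r = residual / (residual.max() + 1e-9)          # normalize to [0, 1]
    low = np.clip(1 - 2 * r, 0, 1)                  # membership: likely untouched
    high = np.clip(2 * r - 1, 0, 1)                 # membership: likely modified
    mid = 1 - low - high                            # ambiguous region
    return low, mid, high

img = np.random.default_rng(5).integers(0, 256, size=(32, 32))
low, mid, high = modification_memberships(img)
print(high.mean())
```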
Citations: 0
Three-domain joint deraining network for video rain streak removal
IF 2.7 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-27 | DOI: 10.1016/j.image.2025.117400
Wei Wu, Wenzhuo Zhai, Yong Liu, Xianbin Hu, Tailin Yang, Zhu Li
When video is shot outdoors in rainy weather, a complex, dynamically changing rain streak layer is superimposed on the otherwise clean video, greatly degrading the performance of advanced outdoor vision systems. Several excellent video deraining algorithms have been proposed and produce good results. However, these approaches neglect the joint analysis of relations in three important domains of video data, which is widely known to have intrinsic characteristics in the temporal, spatial, and frequency domains. To address this issue, we propose a Three-domain Joint Deraining Network (TJDNet) for video rain streak removal. It comprises three network branches: a temporal-spatial-frequency (TSF) branch, a temporal-spatial (TS) branch, and a spatial branch. Capturing the spatial properties of the current frame is the common goal of all three branches. Moreover, we develop the TSF branch to specifically pursue temporal-frequency relations between the wavelet subbands of the current frame and those of its adjacent frames. Furthermore, the TS branch is designed to directly capture temporal correlations among successive frames. Finally, cross-branch feature fusion is employed to propagate the features of one branch to enrich the information of another, further exploiting the characteristics of these three domains. Compared with twenty-two state-of-the-art methods, experimental results show that our proposed TJDNet achieves significantly better performance in both objective and subjective image quality, with average PSNR increased by up to 2.10 dB. Our code will be available online at https://github.com/YanZhanggugu/TJDNet.
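A hedged sketch of the temporal-spatial-frequency cue: decompose consecutive frames into wavelet subbands (via PyWavelets) and measure how each subband correlates across time; rain streaks decorrelate faster than scene content. This illustrates the signal property, not the authors' network.

```python
import numpy as np
import pywt

def subband_temporal_similarity(prev_frame, cur_frame):
    pA, (pH, pV, pD) = pywt.dwt2(prev_frame, "haar")   # subbands of frame t-1
    cA, (cH, cV, cD) = pywt.dwt2(cur_frame, "haar")    # subbands of frame t
    sims = {}
    for name, a, b in zip(("LL", "LH", "HL", "HH"), (pA, pH, pV, pD), (cA, cH, cV, cD)):
        sims[name] = float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
    return sims

rng = np.random.default_rng(7)
scene = rng.random((64, 64))
prev = scene + 0.3 * (rng.random((64, 64)) > 0.97)   # sparse "rain" on frame t-1
cur = scene + 0.3 * (rng.random((64, 64)) > 0.97)    # different rain on frame t
print(subband_temporal_similarity(prev, cur))
```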
Citations: 0