
Signal Processing-Image Communication: Latest Articles

Multi-exposure image enhancement and YOLO integration for nighttime pedestrian detection
IF 2.7 CAS Zone 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-09-26 DOI: 10.1016/j.image.2025.117421
Xiaobiao Dai , Junbo Lan , Zhigang Chen , Botao Wang , Xue Wen
This paper presents DCExYOLO, a novel method integrating multi-exposure image enhancement with YOLO object detection for real-time pedestrian detection in nighttime driving scenarios. To address the uneven illumination and low-light conditions typical of such scenes, we introduce an improved Zero-DCE++ algorithm that generates enhanced images at multiple exposure levels, which are then combined with the original image as input to the YOLO detector. The method strengthens the synergy between image enhancement and object detection through multi-task loss functions and a two-stage optimization strategy. Extensive experiments on multiple datasets demonstrate that DCExYOLO achieves an optimal balance between detection performance and efficiency, significantly reducing the log-average miss rate (MR⁻²) compared to the YOLO baseline. This research thus validates the potential of multi-exposure enhancement for object detection under complex illumination, providing an efficient and reliable solution for intelligent driving and traffic safety, and establishing a foundation for future optimization of detection technologies in complex scenarios.
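As a rough illustration of the multi-exposure idea described above, the sketch below applies the quadratic light-enhancement curve popularized by Zero-DCE with a few fixed coefficients and stacks the results with the original frame as a widened detector input. The function names, the alpha values, and the channel-concatenation strategy are assumptions for illustration only; the paper itself learns per-pixel curves with an improved Zero-DCE++ network and couples them to YOLO through multi-task losses, which this toy example does not reproduce.

```python
import numpy as np

def quadratic_curve(x: np.ndarray, alpha: float, iterations: int = 4) -> np.ndarray:
    """Apply the Zero-DCE style enhancement curve LE(x) = x + alpha * x * (1 - x) repeatedly."""
    for _ in range(iterations):
        x = x + alpha * x * (1.0 - x)
    return np.clip(x, 0.0, 1.0)

def build_multi_exposure_input(frame: np.ndarray, alphas=(-0.4, 0.3, 0.8)) -> np.ndarray:
    """Stack the original frame with several enhanced exposure levels along the channel axis.

    frame: float32 RGB image in [0, 1], shape (H, W, 3).
    Returns an array of shape (H, W, 3 * (1 + len(alphas))) that a detector with a
    widened first convolution could consume.
    """
    variants = [frame] + [quadratic_curve(frame, a) for a in alphas]
    return np.concatenate(variants, axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    night_frame = rng.uniform(0.0, 0.2, size=(416, 416, 3)).astype(np.float32)  # dark synthetic frame
    detector_input = build_multi_exposure_input(night_frame)
    print(detector_input.shape)  # (416, 416, 12)
```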
Citations: 0
Global and local collaborative learning for no-reference omnidirectional image quality assessment
IF 2.7 CAS Zone 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-09-26 DOI: 10.1016/j.image.2025.117409
Deyang Liu , Lifei Wan , Xiaolin Zhang , Xiaofei Zhou , Caifeng Shan
Omnidirectional images (OIs) have achieved tremendous success in virtual reality applications. With the continuous increase in network bandwidth, users can access massive numbers of OIs from the internet, so evaluating the visual quality of distorted OIs is crucial to ensuring a high-quality immersive experience. Most existing viewport-based OI quality assessment (OIQA) methods overlook the inconsistent distortions within each viewport, and the loss of texture detail introduced by the viewport downsampling procedure further limits assessment performance. To address these challenges, this paper proposes a global-and-local collaborative learning method for no-reference OIQA. We adopt a dual-level learning architecture to collaboratively explore the non-uniform distortions and learn a sparse representation of each projected viewport. Specifically, we extract hierarchical features from each viewport to align with the hierarchical perceptual process of the human visual system (HVS). By aggregating them with a Transformer encoder, the inconsistent spatial features in each viewport can be mined globally. To preserve more texture detail during viewport downsampling, we introduce a learnable patch selection paradigm: by learning the position preferences of local texture variations in each viewport, our method derives a set of sparse image patches that sparsely represent the downsampled viewport. Comprehensive experiments on three publicly available databases illustrate the superiority of the proposed method. The code is available at https://github.com/ldyorchid/GLCNet-OIQA.
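To make the viewport-aggregation step concrete, here is a minimal PyTorch sketch that fuses per-viewport feature vectors with a Transformer encoder and regresses a single quality score. All dimensions, the mean pooling, and the module name ViewportQualityAggregator are illustrative assumptions; the paper's hierarchical feature extraction and learnable patch selection are not modeled here.

```python
import torch
import torch.nn as nn

class ViewportQualityAggregator(nn.Module):
    """Toy global aggregator: fuse per-viewport feature vectors with a Transformer
    encoder and regress one quality score. Dimensions are illustrative."""

    def __init__(self, feat_dim: int = 256, num_layers: int = 2, num_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=num_heads,
                                           dim_feedforward=512, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Sequential(nn.LayerNorm(feat_dim), nn.Linear(feat_dim, 1))

    def forward(self, viewport_feats: torch.Tensor) -> torch.Tensor:
        # viewport_feats: (batch, num_viewports, feat_dim), one vector per projected viewport
        fused = self.encoder(viewport_feats)   # model inter-viewport (non-uniform) distortion
        pooled = fused.mean(dim=1)             # global pooling over viewports
        return self.head(pooled).squeeze(-1)   # predicted quality score per image

if __name__ == "__main__":
    model = ViewportQualityAggregator()
    feats = torch.randn(2, 8, 256)   # 2 omnidirectional images, 8 viewports each
    print(model(feats).shape)        # torch.Size([2])
```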
Citations: 0
Video and text semantic center alignment for text-video cross-modal retrieval
IF 2.7 CAS Zone 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-09-25 DOI: 10.1016/j.image.2025.117413
Ming Jin , Huaxiang Zhang , Lei Zhu , Jiande Sun , Li Liu
With the proliferation of video on the Internet, users demand higher precision and efficiency from retrieval technology. Current cross-modal retrieval methods suffer from three main problems: first, the same semantic objects are not effectively aligned between video and text; second, existing neural networks destroy the spatial features of a video when building its temporal features; finally, the extraction and processing of the text's local features are overly complex, which increases network complexity. To address these problems, we propose a text-video semantic center alignment network. First, a semantic center alignment module is constructed to promote the alignment of semantic features of the same object across modalities. Second, a pre-trained BERT with a residual structure is designed to protect spatial information while inferring temporal information. Finally, the “jieba” library is employed to extract the local key information of the text, simplifying local feature extraction. The effectiveness of the network is evaluated on the MSVD, MSR-VTT, and DiDeMo datasets.
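The jieba keyword-extraction step named in the abstract can be illustrated with the library's standard TF-IDF interface. The caption string and the topK value below are placeholders, and jieba's bundled TF-IDF dictionary targets Chinese text, so how the authors apply it to their caption corpora is not specified by the abstract; this only shows the API call pattern.

```python
# pip install jieba
import jieba.analyse

# Illustrative caption: "A man chops onions in the kitchen and puts them into a pot"
caption = "一个男人在厨房里切洋葱并把它们放进锅里"

# TF-IDF based keyword extraction; topK is an illustrative choice.
keywords = jieba.analyse.extract_tags(caption, topK=5)
print(keywords)  # top-ranked tokens, e.g. nouns such as '洋葱' (onion) and '厨房' (kitchen)
```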
Citations: 0
Integrated multi-channel approach for speckle noise reduction in SAR imagery using gradient, spatial, and frequency analysis
IF 2.7 CAS Zone 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-09-23 DOI: 10.1016/j.image.2025.117406
Anirban Saha, Harshit Singh, Suman Kumar Maji
Synthetic Aperture Radar (SAR) imagery is inherently marred by speckle noise, which undermines image quality and complicates subsequent analytical endeavors. While numerous strategies have been suggested in existing literature to mitigate this unwanted noise, the challenge of eliminating speckle while conserving subtle structural and textural details inherent in the raw data remains unresolved. In this article, we propose a comprehensive approach combining multi-domain analysis with gradient information processing for SAR. Our method aims to effectively suppress speckle noise while retaining crucial image characteristics. By leveraging multi-domain analysis techniques, we exploit both spatial and frequency domain information to gain a deeper insight into image structures. Additionally, we introduce a novel gradient information processing step that utilizes local gradient attributes to guide the process. Experimental results obtained from synthetic and real SAR imagery illustrate the effectiveness of our approach in terms of speckle noise reduction and preservation of image features. Quantitative assessments demonstrate substantial enhancements in image quality, indicating superior performance compared to current state-of-the-art methods.
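As a toy illustration of combining spatial, frequency, and gradient information, the sketch below blends a median-filtered estimate with a low-pass frequency-domain estimate, using a Sobel edge map to preserve structure near edges. The cutoff, the blending rule, and the synthetic speckle model are assumptions for demonstration and do not reproduce the paper's method.

```python
import numpy as np
from scipy import ndimage

def toy_multidomain_despeckle(img: np.ndarray, cutoff: float = 0.15) -> np.ndarray:
    """Blend a spatial-domain and a frequency-domain estimate, trusting the sharper
    spatial estimate more where the gradient (edge) response is strong."""
    # Spatial-domain estimate: a median filter knocks down isolated speckle spikes.
    spatial = ndimage.median_filter(img, size=3)

    # Frequency-domain estimate: ideal low-pass filtering of the centred spectrum.
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)
    freq = np.real(np.fft.ifft2(np.fft.ifftshift(f * (radius < cutoff))))

    # Gradient channel: Sobel magnitude normalised to [0, 1], used as an edge mask.
    grad = np.hypot(ndimage.sobel(spatial, axis=0), ndimage.sobel(spatial, axis=1))
    edge = grad / (grad.max() + 1e-8)

    smooth = 0.5 * (spatial + freq)
    return edge * spatial + (1.0 - edge) * smooth

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = np.ones((128, 128))
    clean[:, 64:] = 4.0                                         # synthetic two-region scene
    speckled = clean * rng.gamma(4.0, 0.25, size=clean.shape)   # multiplicative speckle, mean 1
    print(toy_multidomain_despeckle(speckled).shape)
```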
Citations: 0
A spatial features and weight adjusted loss infused Tiny YOLO for shadow detection
IF 2.7 CAS Zone 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-09-22 DOI: 10.1016/j.image.2025.117408
Akhil Kumar , R. Dhanalakshmi , R. Rajesh , R. Sendhil
Shadow detection in computer vision is challenging because shadows are hard to distinguish from similarly colored or dark objects, and variations in lighting, background texture, and object shape further complicate accurate detection. This work introduces NS-YOLO, a novel Tiny YOLO variant designed specifically for shadow detection under varying conditions. The architecture includes a small-scale feature extraction network enhanced with a global attention mechanism, multi-scale spatial attention, and a spatial pyramid pooling block, while preserving effective multi-scale contextual information. In addition, a weight-adjusted CIoU loss function is introduced to improve localization accuracy. The proposed architecture captures both fine details and global context, helping distinguish shadows from similar dark regions, and the enhanced loss function sharpens boundary localization, reducing false detections and improving accuracy. NS-YOLO is trained end-to-end from scratch on the SBU and ISTD datasets. Experiments show that it achieves a detection accuracy (mAP) of 59.2 % while using only 35.6 BFLOPs. Compared with existing lightweight YOLO variants, namely the Tiny YOLO and YOLO Nano models proposed between 2017 and 2025, NS-YOLO shows a relative mAP improvement of 2.5–50.1 %. These results highlight its efficiency and effectiveness and make it particularly suitable for deployment on resource-limited edge devices in real-time scenarios such as video surveillance and advanced driver-assistance systems (ADAS).
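The CIoU loss mentioned above is a standard formulation and can be sketched directly. The optional weight argument below is only a generic hook, since the abstract does not spell out the paper's weight-adjustment scheme.

```python
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, weight: torch.Tensor = None) -> torch.Tensor:
    """Complete-IoU loss for boxes given as (x1, y1, x2, y2), shape (N, 4).

    `weight` is an optional per-box scale; the paper's exact weight-adjustment
    scheme is not described in the abstract, so this is only a generic hook.
    """
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)

    # Intersection over union.
    inter_w = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0)
    inter_h = (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + 1e-7)

    # Normalised distance between box centres (rho^2 / c^2) over the enclosing box diagonal.
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + 1e-7
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0

    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (torch.atan((tx2 - tx1) / (ty2 - ty1 + 1e-7))
                              - torch.atan((px2 - px1) / (py2 - py1 + 1e-7))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + 1e-7)

    loss = 1 - iou + rho2 / c2 + alpha * v
    if weight is not None:
        loss = loss * weight
    return loss.mean()

if __name__ == "__main__":
    pred = torch.tensor([[10.0, 10.0, 50.0, 60.0]])
    gt = torch.tensor([[12.0, 8.0, 48.0, 62.0]])
    print(ciou_loss(pred, gt).item())
```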
Citations: 0
RotCLIP: Tuning CLIP with visual adapter and textual prompts for rotation robust remote sensing image classification
IF 2.7 CAS Zone 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-09-19 DOI: 10.1016/j.image.2025.117407
Tiecheng Song, Qi Liu, Anyong Qin, Yin Liu
In recent years, Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success in a range of visual tasks by aligning visual and textual features. However, it remains a challenge to improve the robustness of CLIP for rotated images, especially for remote sensing images (RSIs) where objects can present various orientations. In this paper, we propose a Rotation Robust CLIP model, termed RotCLIP, to achieve the rotation robust classification of RSIs with a visual adapter and dual textual prompts. Specifically, we first compute the original and rotated visual features through the image encoder of CLIP and the proposed Rotation Adapter (Rot-Adapter). Then, we explore dual textual prompts to compute the textual features which describe original and rotated visual features through the text encoder of CLIP. Based on this, we further build a rotation robust loss to limit the distance of the two visual features. Finally, by taking advantage of the powerful image-text alignment ability of CLIP, we build a global discriminative classification loss by combining the prediction results of both original and rotated image-text features. To verify the effect of our RotCLIP, we conduct experiments on three RSI datasets, including the EuroSAT dataset used for scene classification, and the NWPU-VHR-10 and RSOD datasets used for object classification. Experimental results show that the proposed RotCLIP improves the robustness of CLIP against image rotation, outperforming several state-of-the-art methods.
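The rotation robust loss can be sketched as a feature-consistency penalty between an image and its rotated copy. The encoder placeholder, the 90-degree rotation, and the cosine-distance choice below are assumptions; RotCLIP pairs this idea with the CLIP image encoder, a rotation adapter, and dual textual prompts, none of which are reproduced here.

```python
import torch
import torch.nn.functional as F

def rotation_robust_loss(image_encoder, images: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Penalise the gap between visual features of an image and its rotated copy.

    `image_encoder` is any module mapping (B, C, H, W) -> (B, D); in RotCLIP this role
    would be played by the CLIP image encoder plus the proposed rotation adapter.
    """
    feats = F.normalize(image_encoder(images), dim=-1)
    rotated = torch.rot90(images, k=k, dims=(2, 3))          # rotate in the spatial plane
    feats_rot = F.normalize(image_encoder(rotated), dim=-1)
    return (1.0 - (feats * feats_rot).sum(dim=-1)).mean()    # 1 - cosine similarity

if __name__ == "__main__":
    # Stand-in encoder; a real setup would use a pretrained vision backbone.
    encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 64))
    x = torch.randn(4, 3, 32, 32)
    print(rotation_robust_loss(encoder, x).item())
```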
Citations: 0
A robust JPEG quantization step estimation method for image forensics
IF 2.7 CAS Zone 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-09-15 DOI: 10.1016/j.image.2025.117402
Chothmal Kumawat , Vinod Pankajakshan
Estimating the JPEG quantization step size from a JPEG image stored in a lossless format after decompression (a D-JPEG image) is a challenging problem in image forensics, and the presence of forgery or additive noise in the D-JPEG image makes the estimation even more difficult. This paper proposes a novel quantization step estimation method that is robust to noise addition and forgery. First, we propose a statistical model for the subband DCT coefficients of forged and noisy D-JPEG images. We then show that the periodicity in the difference between the absolute values of the rounded DCT coefficients in a subband of a D-JPEG image and those of the corresponding never-compressed image can be used to reliably estimate the JPEG quantization step; the proposed estimation method is built on this observation. Detailed experimental results demonstrate the robustness of the method against noise addition and forgery, and further show that the estimated quantization steps can be used to localize forgeries in D-JPEG images.
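A simplified version of periodicity-based quantization-step estimation can be sketched as follows: take the 8x8 block DCT of the decompressed image, histogram the rounded coefficients of one subband, and read the comb spacing from the dominant peak of the histogram's spectrum. The function name, the aligned block grid, and the peak-picking rule are assumptions; the paper's statistical model and its use of differences against never-compressed statistics are not reproduced here.

```python
import numpy as np
from scipy.fft import dctn

def estimate_q_step(gray: np.ndarray, row: int, col: int, max_coef: int = 128) -> int:
    """Rough estimate of the JPEG quantization step for one DCT subband (row, col)
    of a decompressed grayscale image, via the periodicity of the rounded-coefficient
    histogram. Assumes the 8x8 block grid is aligned with the JPEG grid."""
    h, w = gray.shape
    coefs = []
    for y in range(0, h - 7, 8):
        for x in range(0, w - 7, 8):
            block = dctn(gray[y:y + 8, x:x + 8] - 128.0, norm='ortho')  # orthonormal DCT matches JPEG scaling
            coefs.append(block[row, col])
    c = np.rint(np.asarray(coefs)).astype(int)
    c = c[np.abs(c) <= max_coef]

    hist, _ = np.histogram(c, bins=np.arange(-max_coef - 0.5, max_coef + 1.5))
    spectrum = np.abs(np.fft.rfft(hist - hist.mean()))
    peak = int(np.argmax(spectrum[2:])) + 2          # skip near-DC bins of the histogram spectrum
    return max(1, int(round(len(hist) / peak)))      # comb spacing q ~ N / dominant frequency index

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(128, 40, size=(256, 256))  # stand-in only; use a real decompressed JPEG in practice
    print(estimate_q_step(img, row=0, col=1))
```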
Citations: 0
Cut-FUNQUE: An objective quality model for compressed tone-mapped High Dynamic Range videos
IF 2.7 CAS Zone 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-09-13 DOI: 10.1016/j.image.2025.117405
Abhinau K. Venkataramanan , Cosmin Stejerean , Ioannis Katsavounidis , Hassene Tmar , Alan C. Bovik
High Dynamic Range (HDR) videos have enjoyed a surge in popularity in recent years due to their ability to represent a wider range of contrast and color than Standard Dynamic Range (SDR) videos. Although HDR video capture has seen increasing popularity because of recent flagship mobile phones such as Apple iPhones, Google Pixels, and Samsung Galaxy phones, a broad swath of consumers still utilize legacy SDR displays that are unable to display HDR videos. As a result, HDR videos must be processed, i.e., tone-mapped, before streaming to a large section of SDR-capable video consumers. However, server-side tone-mapping involves automating decisions regarding the choices of tone-mapping operators (TMOs) and their parameters to yield high-fidelity outputs. Moreover, these choices must be balanced against the effects of lossy compression, which is ubiquitous in streaming scenarios. In this work, we develop a novel, efficient model of objective video quality named Cut-FUNQUE that is able to accurately predict the visual quality of tone-mapped and compressed HDR videos. Finally, we evaluate Cut-FUNQUE on a large-scale crowdsourced database of such videos and show that it achieves state-of-the-art accuracy.
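For readers unfamiliar with tone mapping, the sketch below shows one classic global operator (Reinhard) that compresses HDR luminance into the SDR range. It is only an example of the kind of TMO whose compressed outputs Cut-FUNQUE is designed to score, not part of the model itself; the key value and luminance weights are the usual textbook choices.

```python
import numpy as np

def reinhard_tmo(hdr: np.ndarray, key: float = 0.18) -> np.ndarray:
    """Classic global Reinhard operator: scale by the log-average luminance, then
    compress with L / (1 + L). Stands in for the many TMOs the paper evaluates."""
    lum = 0.2126 * hdr[..., 0] + 0.7152 * hdr[..., 1] + 0.0722 * hdr[..., 2]
    log_avg = np.exp(np.mean(np.log(lum + 1e-6)))
    scaled = key * lum / log_avg
    mapped = scaled / (1.0 + scaled)                 # compressed luminance in [0, 1)
    ratio = (mapped / (lum + 1e-6))[..., None]       # apply the luminance change to RGB
    return np.clip(hdr * ratio, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hdr_frame = rng.uniform(0.0, 60.0, size=(4, 4, 3))   # toy linear-light HDR values
    print(reinhard_tmo(hdr_frame).max() <= 1.0)          # True: output fits the SDR range
```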
Citations: 0
Redundant contextual feature suppression for pedestrian detection in dense scenes
IF 2.7 CAS Zone 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-09-10 DOI: 10.1016/j.image.2025.117403
Jun Wang, Lei Wan, Xin Zhang, Xiaotian Cao
Pedestrian detection is one of the important branches of object detection, with a wide range of applications in autonomous driving, intelligent video surveillance, and passenger flow statistics. However, these scenes exhibit high pedestrian density, severe occlusion, and complex redundant contextual information, leading to issues such as low detection accuracy and a high number of false positives in current general object detectors when applied in dense pedestrian scenes. In this paper, we propose an improved Context Suppressed R-CNN method for pedestrian detection in dense scenes, based on the Sparse R-CNN. Firstly, to further enhance the network’s ability to extract deep features in dense scenes, we introduce the CoT-FPN backbone by combining the FPN network with the Contextual Transformer Block. This block replaces the 3×3 convolution in the ResNet backbone. Secondly, addressing the issue that redundant contextual features of instance objects can mislead the localization and recognition of object detection tasks in dense scenes, we propose a Redundant Contextual Feature Suppression Module (RCFSM). This module, based on the convolutional block attention mechanism, aims to suppress redundant contextual information in instance features, thereby improving the network’s detection performance in dense scenes. The test results on the CrowdHuman dataset show that, compared with the Sparse R-CNN algorithm, the proposed algorithm improves the Average Precision (AP) by 1.1% and the Jaccard index by 1.2%, while also reducing the number of model parameters. Code is available at https://github.com/davidsmithwj/CS-CS-RCNN.
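Since the abstract states that the suppression module builds on the convolutional block attention mechanism, a generic CBAM-style block is sketched below as a reference point. The class name, reduction ratio, and kernel size are illustrative assumptions and do not reproduce the exact RCFSM design.

```python
import torch
import torch.nn as nn

class ContextSuppressionBlock(nn.Module):
    """CBAM-style channel + spatial attention. A generic re-statement of the block
    attention mechanism the abstract refers to, not the paper's exact RCFSM."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)

        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

if __name__ == "__main__":
    block = ContextSuppressionBlock(channels=256)
    feats = torch.randn(1, 256, 32, 32)
    print(block(feats).shape)   # torch.Size([1, 256, 32, 32])
```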
Citations: 0
Active contour model based on pre- additive bias field fitting image
IF 2.7 CAS Zone 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-09-09 DOI: 10.1016/j.image.2025.117404
Yang Chen, Guirong Weng
Active contour models are widely used for segmenting images with inhomogeneous intensity. In contrast to classic models, this paper proposes an optimized additive model that incorporates the edge structure and the inhomogeneous components. Second, by introducing a novel clustering criterion, the value of the bias field can be estimated before iteration, greatly speeding up the evolving process and reducing the computational cost; an improved energy function is then derived. In the gradient descent flow formulation, a novel error function and an adaptive parameter are used to improve the performance of the data term. Finally, the proposed regularization terms make the evolving process more efficient and accurate. Owing to these improvements, the proposed model delivers excellent segmentation performance in terms of robustness, effectiveness, and accuracy.
Citations: 0