
IEEE transactions on image processing : a publication of the IEEE Signal Processing Society - Latest Publications

Rethinking Artifact Mitigation in HDR Reconstruction: From Detection to Optimization
IF 13.7 Pub Date : 2025-12-17 DOI: 10.1109/TIP.2025.3642557
Xinyue Li;Zhangkai Ni;Hang Wu;Wenhan Yang;Hanli Wang;Lianghua He;Sam Kwong
Artifacts remain a long-standing challenge in High Dynamic Range (HDR) reconstruction. Existing methods focus on model designs for artifact mitigation but ignore explicit detection and suppression strategies. Because artifacts lack clear boundaries, distinct shapes, and semantic consistency, and because no dedicated dataset for HDR artifacts exists, progress in direct artifact detection and recovery is impeded. To bridge this gap, we propose a unified HDR reconstruction framework that integrates artifact detection and model optimization. Firstly, we build the first HDR artifact dataset (HADataset), comprising 1,213 diverse multi-exposure Low Dynamic Range (LDR) image sets and 1,765 HDR image pairs with per-pixel artifact annotations. Secondly, we develop an effective HDR artifact detector (HADetector), a robust artifact detection model capable of accurately localizing HDR reconstruction artifacts. HADetector plays two pivotal roles: (1) enhancing existing HDR reconstruction models through fine-tuning, and (2) serving as a no-reference image quality assessment (NR-IQA) metric, the Artifact Score (AS), which aligns closely with human visual perception for reliable quality evaluation. Extensive experiments validate the effectiveness and generalizability of our framework, including the HADataset, HADetector, fine-tuning paradigm, and AS metric. The code and datasets are available at: https://github.com/xinyueliii/hdr-artifact-detect-optimize
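The abstract does not spell out how the Artifact Score (AS) is derived from HADetector's output. As a purely illustrative sketch (assuming a detector that outputs a per-pixel artifact logit map; `detector` below is a placeholder, not the authors' HADetector), one plausible pooling from a detection map to a no-reference quality score looks like this:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def artifact_score(detector: nn.Module, hdr_image: torch.Tensor) -> float:
    """Pool a per-pixel artifact probability map into one no-reference score.

    hdr_image: (C, H, W) reconstructed HDR image.
    Returns a value in [0, 1]; higher is assumed to mean fewer artifacts.
    """
    detector.eval()
    logits = detector(hdr_image.unsqueeze(0))   # assumed output shape: (1, 1, H, W)
    prob = torch.sigmoid(logits)                # per-pixel artifact probability
    return float(1.0 - prob.mean())             # average "artifact-free" probability
```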
Citations: 0
Incomplete Modalities Restoration via Hierarchical Adaptation for Robust Multimodal Segmentation
IF 13.7 Pub Date : 2025-12-17 DOI: 10.1109/TIP.2025.3642612
Yujia Sun;Weisheng Dong;Peng Wu;Mingtao Feng;Tao Huang;Xin Li;Guangming Shi
Multimodal semantic segmentation has significantly advanced the field of semantic segmentation by integrating data from multiple sources. However, this task often encounters missing-modality scenarios due to challenges such as sensor failures or data transmission errors, which can result in substantial performance degradation. Existing approaches to addressing missing modalities predominantly involve training separate models tailored to specific missing scenarios, typically requiring considerable computational resources. In this paper, we propose a Hierarchical Adaptation framework to Restore Missing Modalities for Multimodal segmentation (HARM3), which enables frozen pretrained multimodal models to be directly applied to missing-modality semantic segmentation tasks with minimal parameter updates. Central to HARM3 is a text-instructed missing-modality prompt module, which learns multimodal semantic knowledge by utilizing available modalities and textual instructions to generate prompts for the missing modalities. By incorporating a small set of trainable parameters, this module effectively facilitates knowledge transfer between high-resource domains and low-resource domains where missing modalities are more prevalent. In addition, to further enhance the model's robustness and adaptability, we introduce adaptive perturbation training and an affine modality adapter. Extensive experimental results demonstrate the effectiveness and robustness of HARM3 across a variety of missing-modality scenarios.
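To make the "small set of trainable parameters" idea tangible, the toy module below generates prompt tokens for a missing modality from available-modality tokens and a text-instruction embedding while the pretrained backbone stays frozen. All names, shapes, and the cross-attention design are assumptions for illustration, not the HARM3 architecture.

```python
import torch
import torch.nn as nn

class MissingModalityPrompt(nn.Module):
    """Toy prompt generator: available-modality tokens + text instruction -> prompt tokens."""

    def __init__(self, dim: int = 256, num_prompts: int = 8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.text_proj = nn.Linear(dim, dim)

    def forward(self, avail_tokens: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # avail_tokens: (B, N, dim) features of the modalities that are present
        # text_emb:     (B, dim)    embedding of the textual instruction
        b = avail_tokens.size(0)
        q = self.query.unsqueeze(0).expand(b, -1, -1) + self.text_proj(text_emb).unsqueeze(1)
        prompts, _ = self.attn(q, avail_tokens, avail_tokens)   # (B, num_prompts, dim)
        return prompts

# Only the prompt module is trained; the pretrained multimodal backbone stays frozen, e.g.:
# for p in backbone.parameters():
#     p.requires_grad_(False)
```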
Citations: 0
Micro-Expression Analysis Based on Self-Adaptive Pseudo-Labeling and Residual Connected Channel Attention Mechanisms
IF 13.7 Pub Date : 2025-12-17 DOI: 10.1109/TIP.2025.3642527
Jinxiu Zhang;Weidong Min;Jiahao Li;Qing Han
Micro-expressions can reveal genuine emotions that are not easily concealed, making them invaluable in fields such as psychotherapy and criminal interrogation. However, existing pseudo-labeling-based methods for micro-expression analysis have two major limitations. First, pseudo-labels generated by the sliding window do not account for the actual proportion of micro-expressions in the video, which leads to inaccurate labeling. Second, they predominantly focus on overall features, thereby neglecting subtle features. In this paper, we propose a micro-expression analysis method called the Spot-Then-Recognize Method (STRM), which integrates spotting and recognition tasks. To address the first limitation, we propose a Self-Adaptive Pseudo-labeling Method (SAPM) that dynamically assigns pseudo-labels to micro-expression frames according to their actual proportion in the video sequence, thereby improving labeling accuracy. To address the second limitation, we design a Multi-Scale Residual Channel Attention Network (MSRCAN) to effectively extract subtle micro-expression features. The MSRCAN comprises three modules: the Multi-Scale Shared Network (MSSN), the Spotting Network, and the Recognition Network. The MSSN initially extracts micro-expression features by performing multi-scale feature extraction with Residual Connected Channel Attention Modules (RCCAM), which are then refined in the spotting and recognition networks. We conducted comprehensive experiments on three short video datasets (CASME II, SMIC-E-HS, SMIC-E-NIR) and two long video datasets (CAS(ME)2, SAMMLV). Experimental results show that our proposed method significantly outperforms existing methods, achieving an overall performance of 58.24%, a 19.62% improvement, and a $1.51\times$ gain over the baseline in terms of micro-expression analysis.
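The abstract states only that SAPM assigns pseudo-labels "according to their actual proportion in the video sequence". One simple way to realize proportion-aware labeling, shown here purely as an illustration and not as the paper's algorithm, is to mark the top-scoring fraction of frames in a sequence as pseudo-positive:

```python
import numpy as np

def proportion_pseudo_labels(frame_scores: np.ndarray, me_ratio: float) -> np.ndarray:
    """Assign binary pseudo-labels to the top `me_ratio` fraction of frames.

    frame_scores: (T,) per-frame micro-expression evidence (e.g., motion magnitude).
    me_ratio:     estimated fraction of micro-expression frames in this sequence.
    """
    T = len(frame_scores)
    k = max(1, int(round(me_ratio * T)))          # number of frames to mark positive
    labels = np.zeros(T, dtype=np.int64)
    labels[np.argsort(frame_scores)[-k:]] = 1     # top-k scoring frames -> pseudo-positive
    return labels

# Example: a 100-frame clip where roughly 6% of frames are expected to be micro-expressions.
# labels = proportion_pseudo_labels(scores, me_ratio=0.06)
```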
Citations: 0
NiCI-Pruning: Enhancing Diffusion Model Pruning via Noise in Clean Image Guidance
IF 13.7 Pub Date : 2025-12-17 DOI: 10.1109/TIP.2025.3643138
Junzhu Mao;Zeren Sun;Yazhou Yao;Tianfei Zhou;Liqiang Nie;Xiansheng Hua
The substantial successes achieved by diffusion probabilistic models have prompted the study of their employment in resource-limited scenarios. Pruning methods have been proven effective in compressing discriminative models by relying on the correlation between training losses and model performance. However, diffusion models employ an iterative process for generating high-quality images, leading to a breakdown of such connections. To address this challenge, we propose a simple yet effective method, named NiCI-Pruning (Noise in Clean Image Pruning), for the compression of diffusion models. NiCI-Pruning capitalizes on the noise predicted by the model from clean image inputs, favoring it as a feature for establishing reconstruction losses. Accordingly, a Taylor expansion of the proposed reconstruction loss is employed to evaluate parameter importance effectively. Moreover, we propose an interval sampling strategy that incorporates a timestep-weighted schema, alleviating the risk of misleading information obtained at later timesteps. We provide comprehensive experimental results to affirm the superiority of our proposed approach. Notably, our method achieves a remarkable average reduction of 30.4% in FID score increase across five different datasets compared to the state-of-the-art diffusion pruning method at equivalent pruning rates. Our code and models have been made available at https://github.com/NUST-Machine-Intelligence-Laboratory/NiCI-Pruning
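For background, a first-order Taylor importance criterion of the kind alluded to here scores each parameter by how strongly the loss reacts to it, approximately |dL/dw * w|. The sketch below computes that generic criterion in PyTorch with the loss left abstract; NiCI-Pruning's actual reconstruction loss, built on noise predicted from clean images, is not reproduced here.

```python
import torch
import torch.nn as nn

def taylor_importance(model: nn.Module, loss: torch.Tensor) -> dict:
    """Generic first-order Taylor criterion: importance(w) ~ |dL/dw * w| per parameter tensor."""
    params = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, [p for _, p in params], allow_unused=True)
    scores = {}
    for (name, p), g in zip(params, grads):
        if g is None:                                  # parameter not touched by this loss
            continue
        scores[name] = (g * p).abs().sum().item()      # low score -> pruning candidate
    return scores
```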
Citations: 0
Hiding Local Manipulations on SAR Images: A Counter-Forensic Attack
IF 13.7 Pub Date : 2025-12-17 DOI: 10.1109/TIP.2025.3643154
Sara Mandelli;Edoardo Daniele Cannas;Paolo Bestagini;Stefano Tebaldini;Stefano Tubaro
The vast accessibility of Synthetic Aperture Radar (SAR) images through online portals has propelled research across various fields. This widespread use and easy availability have unfortunately made SAR data susceptible to malicious alterations, such as local editing applied to the images to insert or conceal sensitive targets. To counter malicious manipulations, in recent years the forensic community has begun to investigate the SAR manipulation issue, proposing detectors that effectively localize tampering traces in amplitude images. Nonetheless, in this paper we demonstrate that an expert practitioner can exploit the complex nature of SAR data to obscure any signs of manipulation within a locally altered amplitude image. We refer to this approach as a counter-forensic attack. To conceal manipulation traces, the attacker can simulate a re-acquisition of the manipulated scene by the SAR system that initially generated the pristine image. In doing so, the attacker can obscure any evidence of manipulation, making it appear as if the image was legitimately produced by the system. This attack has unique features that make it both highly generalizable and relatively easy to apply. First, it is a black-box attack, meaning it is not designed to deceive a specific forensic detector. Furthermore, it does not require a training phase and is not based on adversarial operations. We assess the effectiveness of the proposed counter-forensic approach across diverse scenarios, examining various manipulation operations. The obtained results indicate that our devised attack successfully eliminates traces of manipulation, deceiving even the most advanced forensic detectors.
Citations: 0
Privacy-Preserving CNN Inference for Image Super-Resolution Cross Multiple Ciphertexts
IF 13.7 Pub Date : 2025-12-17 DOI: 10.1109/TIP.2025.3641310
Peijia Zheng;Donger Mo;Yufei Zhou;Xiangyu Gao;Xiaochun Cao;Jiwu Huang
Online image super-resolution (SR) services have been widely used in applications such as Remini and DeepAI. However, the exposure of plaintext images raises serious privacy concerns. While secure CNN inference techniques are employed to protect images in image classification, they are not applicable to the unique challenges posed by image SR: the output resolution is significantly higher than that of the input image. In this paper, we present a secure CNN inference scheme for image SR by employing a multiple ciphertext encapsulation method. We begin by designing fundamental homomorphic operations, including addition, multiplication, and rotation across ciphertexts. Recognizing that image SR typically involves an upsampling layer—unlike image classification—we propose a fast algorithm for secure upsampling. This technique leverages pre-weight block masking and cross-ciphertext rotation, resulting in a significant speedup compared to direct homomorphic upsampling. We then present an efficient batched homomorphic two-dimensional convolution method across ciphertexts, incorporating kernel rearrangement and merging strategies. We also design a polynomial activation function specifically optimized for image SR, further enhancing performance. Extensive experiments demonstrate that our HE-friendly SR network outperforms existing secure solutions, while the proposed multiple ciphertext encapsulation technique achieves at least a 2x improvement in both computational efficiency and memory usage.
Citations: 0
Fine-Grained Image Captioning by Ranking Diffusion Transformer
IF 13.7 Pub Date : 2025-12-15 DOI: 10.1109/TIP.2025.3641303
Jun Wan;Min Gan;Lefei Zhang;Jie Zhou;Jun Liu;Bo Du;C. L. Philip Chen
CLIP visual feature-based image captioning models have developed rapidly and achieved remarkable results. However, existing models still struggle to produce descriptive and discriminative captions because they insufficiently exploit fine-grained visual cues and fail to model complex vision–language alignment. To address these limitations, we propose a Ranking Diffusion Transformer (RDT), which integrates a Ranking Visual Encoder (RVE) and a Ranking Loss (RL) for fine-grained image captioning. The RVE introduces a novel ranking attention mechanism that effectively mines diverse and discriminative visual information from CLIP features. Meanwhile, the RL leverages the ranking of generated caption quality as a global semantic supervisory signal, thereby enhancing the diffusion process and strengthening vision–language semantic alignment. We show that, by making the RVE and RL work together within the novel RDT and by gradually adding and removing noise in the diffusion process, more discriminative visual features are learned and precisely aligned with the language features. Experimental results on popular benchmark datasets demonstrate that our proposed RDT surpasses existing state-of-the-art image captioning models in the literature. The code is publicly available at: https://github.com/junwan2014/RDT
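The Ranking Loss is described only as using the ranking of generated caption quality as a supervisory signal. A generic pairwise hinge ranking loss, often used to encode such orderings, is sketched below; the quality measure, score source, and margin are placeholders rather than the paper's definition.

```python
import torch

def pairwise_ranking_loss(scores: torch.Tensor, quality: torch.Tensor,
                          margin: float = 0.1) -> torch.Tensor:
    """Encourage predicted scores to follow an external caption-quality ordering.

    scores:  (B,) model scores for B generated captions.
    quality: (B,) external quality measure for the same captions (e.g., CIDEr).
    """
    s_i, s_j = scores.unsqueeze(1), scores.unsqueeze(0)         # all (i, j) pairs
    q_i, q_j = quality.unsqueeze(1), quality.unsqueeze(0)
    sign = torch.sign(q_i - q_j)                                # +1 if caption i is better
    hinge = torch.clamp(margin - sign * (s_i - s_j), min=0.0)   # hinge on each ordered pair
    mask = sign != 0                                            # ignore equal-quality pairs
    return hinge[mask].mean() if mask.any() else scores.sum() * 0.0
```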
Citations: 0
Vision–Language Models Empowered Nighttime Object Detection With Consistency Sampler and Hallucination Feature Generator
IF 13.7 Pub Date : 2025-12-15 DOI: 10.1109/TIP.2025.3641316
Lihuo He;Junjie Ke;Zhenghao Wang;Jie Li;Kai Zhou;Qi Wang;Xinbo Gao
Current object detectors often suffer performance degradation when applied to cross-domain scenarios, particularly under challenging visual conditions such as nighttime scenes. This is primarily due to the I3 problems: Inadequate sampling of instance-level features, Indistinguishable feature representations across domains, and Inaccurate generation for identical category participation. To address these challenges, we propose a domain-adaptive detection framework that enables robust generalization across different visual domains without introducing any additional inference overhead. The framework comprises three key components. First, the centerness–category consistency sampler alleviates inadequate sampling by selecting representative instance-level features, while the paired centerness consistency loss enforces alignment between classification and localization. Second, the VLM-based orthogonality enhancement leverages frozen vision–language encoders with an orthogonal projection loss to improve cross-domain feature distinguishability. Third, the hallucination feature generator synthesizes robust instance-level features for missing categories, ensuring balanced category participation across domains. Extensive experiments on multiple datasets covering various domain adaptation and generalization settings demonstrate that our method consistently outperforms state-of-the-art detectors, achieving up to 5.5 mAP improvement, with particularly strong gains in nighttime adaptation.
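As a generic illustration of an orthogonality-style objective (not necessarily the paper's exact orthogonal projection loss), one can penalize the off-diagonal entries of the Gram matrix of L2-normalized class prototypes so that features of different categories, possibly pooled across domains, become close to mutually orthogonal:

```python
import torch
import torch.nn.functional as F

def orthogonality_loss(prototypes: torch.Tensor) -> torch.Tensor:
    """Push K feature prototypes toward mutual orthogonality.

    prototypes: (K, D), one feature vector per category (or per domain-category pair).
    """
    z = F.normalize(prototypes, dim=1)                     # unit-norm rows
    gram = z @ z.t()                                       # (K, K) cosine similarities
    off_diag = gram - torch.eye(gram.size(0), device=gram.device)
    return (off_diag ** 2).mean()                          # zero only if all pairs are orthogonal
```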
Citations: 0
S2AFormer: Strip Self-Attention for Efficient Vision Transformer
IF 13.7 Pub Date : 2025-12-11 DOI: 10.1109/TIP.2025.3639919
Guoan Xu;Wenfeng Huang;Wenjing Jia;Jiamao Li;Guangwei Gao;Guo-Jun Qi
The Vision Transformer (ViT) has achieved remarkable success in computer vision due to its powerful token mixer, which effectively captures global dependencies among all tokens. However, the quadratic complexity of standard self-attention with respect to the number of tokens severely hampers its computational efficiency in practical deployment. Although recent hybrid approaches have sought to combine the strengths of convolutions and self-attention to improve the performance–efficiency trade-off, the costly pairwise token interactions and heavy matrix operations in conventional self-attention remain a critical bottleneck. To overcome this limitation, we introduce S2AFormer, an efficient Vision Transformer architecture built around a novel Strip Self-Attention (SSA) mechanism. Our design incorporates lightweight yet effective Hybrid Perception Blocks (HPBs) that seamlessly fuse the local inductive biases of CNNs with the global modeling capability of Transformer-style attention. The core innovation of SSA lies in simultaneously reducing the spatial resolution of the key ( $K$ ) and value ( $V$ ) tensors while compressing the channel dimension of the query ( $Q$ ) and key ( $K$ ) tensors. This joint spatial-and-channel compression dramatically lowers computational cost without sacrificing representational power, achieving an excellent balance between accuracy and efficiency. We extensively evaluate S2AFormer on a wide range of vision tasks, including image classification (ImageNet-1K), semantic segmentation (ADE20K), and object detection/instance segmentation (COCO). Experimental results consistently show that S2AFormer delivers substantial accuracy improvements together with superior inference speed and throughput across both GPU and non-GPU platforms, establishing it as a highly competitive solution in the landscape of efficient Vision Transformers.
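To make the stated compression concrete, the toy module below reduces the spatial resolution of K and V with strided pooling and the channel dimension of Q and K with linear projections before standard scaled dot-product attention. It is a minimal sketch of the general idea under assumed reduction ratios and pooling choices, not the released S2AFormer code.

```python
import torch
import torch.nn as nn

class StripLikeAttention(nn.Module):
    """Toy attention with spatially downsampled K/V and channel-reduced Q/K."""

    def __init__(self, dim: int, sr_ratio: int = 4, ch_ratio: int = 4):
        super().__init__()
        self.qk_dim = dim // ch_ratio
        self.q = nn.Linear(dim, self.qk_dim)          # channel-compressed query
        self.k = nn.Linear(dim, self.qk_dim)          # channel-compressed key
        self.v = nn.Linear(dim, dim)                  # value keeps full channels
        self.pool = nn.AvgPool2d(sr_ratio, sr_ratio)  # spatial reduction for K/V
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) tokens of an h x w feature map, with N = h * w
        b, _, c = x.shape
        q = self.q(x)                                              # (B, N, C/ch_ratio)
        x_2d = x.transpose(1, 2).reshape(b, c, h, w)
        kv = self.pool(x_2d).reshape(b, c, -1).transpose(1, 2)     # (B, N/sr^2, C)
        k, v = self.k(kv), self.v(kv)
        attn = (q @ k.transpose(1, 2)) / self.qk_dim ** 0.5        # (B, N, N/sr^2)
        out = attn.softmax(dim=-1) @ v                             # (B, N, C)
        return self.proj(out)
```

The attention map here is N by N/sr^2 rather than N by N, which is where the quadratic-cost reduction described in the abstract comes from.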
Citations: 0
High-Precision Camera Distortion Correction: A Decoupled Approach With Rational Functions
IF 13.7 Pub Date : 2025-12-11 DOI: 10.1109/TIP.2025.3641052
Jiachuan Yu;Han Sun;Yuankai Zhou;Xiaowei Jiang
This paper presents a robust, decoupled approach to camera distortion correction using a rational function model (RFM), designed to address challenges in accuracy and flexibility within precision-critical applications. Camera distortion is a pervasive issue in fields such as medical imaging, robotics, and 3D reconstruction, where high fidelity and geometric accuracy are crucial. Traditional distortion correction methods rely on radial-symmetry-based models, which have limited precision under tangential distortion and require nonlinear optimization. In contrast, general models do not rely on radial symmetry geometry and are theoretically generalizable to various sources of distortion. There exists a gap between the theoretical precision advantage of the Rational Function Model (RFM) and its practical applicability in real-world scenarios. This gap arises from uncertainties regarding the model’s robustness to noise, the impact of sparse sample distributions, and its generalizability out of the training sample range. In this paper, we provide a mathematical interpretation of how RFM is suitable for the distortion correction problem through sensitivity analysis. The precision and robustness of RFM are evaluated through synthetic and real-world experiments, considering distortion level, noise level, and sample distribution. Moreover, a practical and accurate decoupled distortion correction method is proposed using just a single captured image of a chessboard pattern. The correction performance is compared with the current state-of-the-art using camera calibration, and experimental results indicate that more precise distortion correction can enhance the overall accuracy of camera calibration. In summary, this decoupled RFM-based distortion correction approach provides a flexible, high-precision solution for applications requiring minimal calibration steps and reliable geometric accuracy, establishing a foundation for distortion-free imaging and simplified camera models in precision-driven computer vision tasks.
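For readers unfamiliar with rational function models, the NumPy sketch below fits a degree-1 RFM with a shared denominator by linear least squares, mapping distorted to undistorted coordinates with the denominator's constant term fixed to 1. The polynomial degree and the absence of coordinate normalization are illustrative simplifications, not the configuration used in the paper.

```python
import numpy as np

def fit_rfm(distorted: np.ndarray, undistorted: np.ndarray) -> np.ndarray:
    """Fit a degree-1 rational function model (shared denominator) by linear least squares.

    distorted, undistorted: (N, 2) matched pixel coordinates.
    Returns theta = [a0, a1, a2, b0, b1, b2, c1, c2] such that
      xu = (a0 + a1*x + a2*y) / (1 + c1*x + c2*y)
      yu = (b0 + b1*x + b2*y) / (1 + c1*x + c2*y)
    """
    x, y = distorted[:, 0], distorted[:, 1]
    xu, yu = undistorted[:, 0], undistorted[:, 1]
    zeros, ones = np.zeros_like(x), np.ones_like(x)
    # Multiply both sides by the denominator to obtain equations linear in theta.
    rows_x = np.stack([ones, x, y, zeros, zeros, zeros, -x * xu, -y * xu], axis=1)
    rows_y = np.stack([zeros, zeros, zeros, ones, x, y, -x * yu, -y * yu], axis=1)
    A = np.concatenate([rows_x, rows_y], axis=0)
    b = np.concatenate([xu, yu])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta

def apply_rfm(theta: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map distorted points to undistorted points with a fitted degree-1 RFM."""
    a0, a1, a2, b0, b1, b2, c1, c2 = theta
    x, y = pts[:, 0], pts[:, 1]
    den = 1.0 + c1 * x + c2 * y
    return np.stack([(a0 + a1 * x + a2 * y) / den,
                     (b0 + b1 * x + b2 * y) / den], axis=1)
```

A degree-1 shared-denominator RFM reduces to a homography; the point of the sketch is the linearization trick, which carries over to higher polynomial degrees.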
Citations: 0