Latest publications from IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Deep Learning-Based Joint Geometry and Attribute Up-Sampling for Large-Scale Colored Point Clouds.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3657214
Yun Zhang, Feifan Chen, Na Li, Zhiwei Guo, Xu Wang, Fen Miao, Sam Kwong

Colored point clouds, comprising geometry and attribute components, are one of the mainstream representations enabling realistic and immersive 3D applications. To generate large-scale and denser colored point clouds, we propose a deep learning-based Joint Geometry and Attribute Up-sampling (JGAU) method, which learns to model both geometry and attribute patterns and leverages spatial attribute correlation. Firstly, we establish and release a large-scale dataset for colored point cloud up-sampling, named SYSU-PCUD, which contains 121 large-scale colored point clouds with diverse geometry and attribute complexities in six categories and four sampling rates. Secondly, to improve the quality of up-sampled point clouds, we propose a deep learning-based JGAU framework to up-sample the geometry and attributes jointly. It consists of a geometry up-sampling network and an attribute up-sampling network, where the latter leverages the up-sampled auxiliary geometry to model neighborhood correlations of the attributes. Thirdly, we propose two coarse attribute up-sampling methods, Geometric Distance Weighted Attribute Interpolation (GDWAI) and Deep Learning-based Attribute Interpolation (DLAI), to generate coarsely up-sampled attributes for each point. Then, we propose an attribute enhancement module to refine the up-sampled attributes and generate high-quality point clouds by further exploiting intrinsic attribute and geometry patterns. Extensive experiments show that the Peak Signal-to-Noise Ratio (PSNR) achieved by the proposed JGAU is 33.90 dB, 32.10 dB, 31.10 dB, and 30.39 dB at up-sampling rates of $4\times$, $8\times$, $12\times$, and $16\times$, respectively. Compared to state-of-the-art schemes, JGAU achieves significant average PSNR gains of 2.32 dB, 2.47 dB, 2.28 dB, and 2.11 dB at the four up-sampling rates, respectively. The code is released at https://github.com/SYSU-Video/JGAU.
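To make the GDWAI step concrete, here is a minimal sketch of geometric-distance-weighted attribute interpolation, assuming an inverse-distance k-nearest-neighbor formulation; the function name, neighbor count k, and stabilizer eps are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np
from scipy.spatial import cKDTree

def gdwai_interpolate(sparse_xyz, sparse_rgb, dense_xyz, k=3, eps=1e-8):
    """Coarse attribute up-sampling: each dense point receives an
    inverse-distance-weighted blend of its k nearest sparse neighbors' colors."""
    tree = cKDTree(sparse_xyz)
    dists, idx = tree.query(dense_xyz, k=k)         # (N, k) neighbor distances/indices
    weights = 1.0 / (dists + eps)                   # closer neighbors weigh more
    weights /= weights.sum(axis=1, keepdims=True)   # normalize weights per point
    return (weights[..., None] * sparse_rgb[idx]).sum(axis=1)

# Usage: rgb_dense = gdwai_interpolate(xyz_sparse, rgb_sparse, xyz_dense)
```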

Citations: 0
One Step Diffusion-Based Super-Resolution With Time-Aware Distillation.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3672376
Xiao He, Huaao Tang, Zhijun Tu, Junchao Zhang, Kun Cheng, Hanting Chen, Yong Guo, Mingrui Zhu, Jie Hu, Nannan Wang, Xinbo Gao

Diffusion-based image super-resolution (SR) has shown strong potential in recovering high-fidelity details from low-resolution inputs. However, the need for tens or hundreds of sampling steps leads to substantial inference latency. Recent works attempt to accelerate this process via knowledge distillation, but often rely solely on pixel-level loss or overlook the fact that diffusion models capture different information across time steps. To address this, we propose TAD-SR, a time-aware diffusion distillation framework. Specifically, we introduce a novel score distillation strategy to align the score functions between the outputs of the student and teacher models after minor noise perturbation. This distillation strategy eliminates the inherent bias in score distillation sampling (SDS) and enables the student models to focus more on high-frequency image details by sampling at smaller time steps. We further introduce a time-aware discriminator that exploits the teacher's knowledge to differentiate real and synthetic samples across different noise scales, using explicit temporal conditioning. Extensive experiments on SR tasks demonstrate that TAD-SR outperforms existing single-step diffusion methods and achieves performance on par with multi-step state-of-the-art models.
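The score-alignment strategy can be pictured with a short, hedged PyTorch sketch: the student's one-step output is perturbed with minor noise at a small timestep, and the student's and teacher's noise (score) predictions at that point are matched. The model interfaces (generate, predict_noise) and the timestep cap max_t are assumptions made for illustration, not the paper's exact API.

```python
import torch
import torch.nn.functional as F

def score_alignment_loss(student, teacher, lr_img, alphas_cumprod, max_t=200):
    x0 = student.generate(lr_img)                       # one-step super-resolved image
    t = torch.randint(1, max_t, (lr_img.size(0),), device=lr_img.device)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise        # minor noise perturbation
    with torch.no_grad():
        eps_t = teacher.predict_noise(x_t, t, lr_img)   # teacher score (frozen)
    eps_s = student.predict_noise(x_t, t, lr_img)       # student score
    return F.mse_loss(eps_s, eps_t)                     # align the two score functions
```

Sampling t from a small range is what biases the distillation toward high-frequency detail, per the abstract.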

Citations: 0
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3673949
Rong-Cheng Tu, Wenhao Sun, Zhao Jin, Jingyi Liao, Jiaxing Huang, Dacheng Tao

Although video generation and editing models have advanced significantly, individual models remain restricted to specific tasks, often failing to meet diverse user needs. Effectively coordinating these models in pipelines can unlock a wide range of video generation and editing capabilities. However, manual orchestration is complex, time-consuming, and requires deep expertise in model performance and limitations. To address these challenges, we propose the Semantic Planning Agent (SPAgent), a novel system that automatically coordinates state-of-the-art open-source models to fulfill complex user intents. To equip SPAgent with robust orchestration capabilities, we introduce a three-step framework: 1) decoupled intent recognition to accurately parse multi-modal inputs; 2) principle-guided route planning to design effective execution chains; and 3) capability-based model selection to identify the optimal tools for each sub-task. To facilitate training, we curate a comprehensive multi-task generative video dataset. Furthermore, we enhance SPAgent with a video quality evaluation module, enabling it to autonomously assess and incorporate new models into its tool library without human intervention. Experimental results demonstrate that SPAgent effectively coordinates models to generate and edit high-quality videos, exhibiting superior versatility and adaptability across various tasks.

Citations: 0
Causally-Aware Unsupervised Feature Selection Learning.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3654354
Zongxin Shen, Yanyong Huang, Dongjie Wang, Minbo Ma, Fengmao Lv, Tianrui Li

Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph. To address these issues, a novel UFS method, called Causally-Aware UnSupErvised Feature Selection learning (CAUSE-FS), is proposed. CAUSE-FS introduces a novel causal regularizer that reweights samples to balance the confounding distribution of each treatment feature. This regularizer is subsequently integrated into a generalized unsupervised spectral regression model to mitigate spurious associations between features and clustering labels, thus achieving causal feature selection. Furthermore, CAUSE-FS employs causality-guided hierarchical clustering to partition features with varying causal contributions into multiple granularities. By integrating similarity graphs learned adaptively at different granularities, CAUSE-FS increases the importance of causal features when constructing the fused similarity graph to capture the reliable local structure of data. Extensive experimental results demonstrate the superiority of CAUSE-FS over state-of-the-art methods, with its interpretability further validated through feature visualization.
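As a rough illustration of sample reweighting for confounder balancing (the idea behind the causal regularizer), the sketch below learns positive, mean-one sample weights that match the weighted means of all non-treatment features between the two groups induced by one binarized treatment feature. The binarization, moment-matching objective, and optimizer settings are illustrative assumptions, not the paper's exact regularizer.

```python
import torch

def balance_weights(X, j, steps=500, lr=0.05):
    """X: (n, d) feature matrix; j: index of the treatment feature."""
    t = (X[:, j] > X[:, j].median()).float()           # binarized treatment indicator
    conf = torch.cat([X[:, :j], X[:, j + 1:]], dim=1)  # remaining (confounding) features
    logit = torch.zeros(X.size(0), requires_grad=True)
    opt = torch.optim.Adam([logit], lr=lr)
    for _ in range(steps):
        w = torch.softmax(logit, dim=0) * X.size(0)    # positive, mean-one weights
        mean_treated = ((w * t).unsqueeze(1) * conf).sum(0) / (w * t).sum()
        mean_control = ((w * (1 - t)).unsqueeze(1) * conf).sum(0) / (w * (1 - t)).sum()
        loss = ((mean_treated - mean_control) ** 2).sum()  # confounder moment gap
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (torch.softmax(logit, dim=0) * X.size(0)).detach()
```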

Citations: 0
Token Calibration for Transformer-Based Domain Adaptation
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2025.3647367
Xiaowei Fu, Shiyu Ye, Chenxu Zhang, Fuxiang Huang, Xin Xu, Lei Zhang
Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain by learning domain-invariant representations. Motivated by the recent success of Vision Transformers (ViTs), several UDA approaches have adopted ViT architectures to exploit fine-grained patch-level representations, which are unified here as Transformer-based Domain Adaptation (TransDA), independent of CNN-based approaches. However, we have a key observation in TransDA: due to inherent domain shifts, patches (tokens) from different semantic categories across domains may exhibit abnormally high similarities, which can mislead the self-attention mechanism and degrade adaptation performance. To solve this, we propose a novel Patch-Adaptation Transformer (PATrans), which first identifies similarity-anomalous patches and then adaptively suppresses their negative impact on domain alignment, i.e., token calibration. Specifically, we introduce a Patch-Adaptation Attention (PAA) mechanism to replace the standard self-attention mechanism, which consists of a weight-shared triple-branch mixed attention mechanism and a patch-level domain discriminator. The mixed attention integrates self-attention and cross-attention to enhance intra-domain feature modeling and inter-domain similarity estimation. Meanwhile, the patch-level domain discriminator quantifies the anomaly probability of each patch, enabling dynamic reweighting to mitigate the impact of unreliable patch correspondences. Furthermore, we introduce a contrastive attention regularization strategy, which leverages category-level information in a contrastive learning framework to promote class-consistent attention distributions. Extensive experiments on four benchmark datasets demonstrate that PATrans attains significant improvements over existing state-of-the-art UDA methods (e.g., 89.2% on VisDA-2017). Code is available at: https://github.com/YSY145/PATrans
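The dynamic reweighting idea can be sketched as follows: a small patch-level discriminator head predicts an anomaly probability per token, and each token's attention output is down-weighted accordingly. The head architecture and the point where weights are applied are assumptions for illustration; this is not the paper's exact PAA design, which additionally uses a weight-shared triple-branch mixed attention.

```python
import torch
import torch.nn as nn

class ReweightedAttention(nn.Module):
    """Self-attention whose token outputs are scaled by (1 - anomaly probability)."""
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.discriminator = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.GELU(), nn.Linear(dim // 2, 1))

    def forward(self, x):                              # x: (B, N, dim) patch tokens
        out, _ = self.attn(x, x, x)
        p_anom = torch.sigmoid(self.discriminator(x))  # (B, N, 1) anomaly probability
        return out * (1.0 - p_anom)                    # suppress unreliable patches
```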
Citations: 0
Coupled Diffusion Posterior Sampling for Unsupervised Hyperspectral and Multispectral Images Fusion
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2025.3647207
Yang Xu, Jian Zhu, Danfeng Hong, Zhihui Wei, Zebin Wu
The fusion of hyperspectral images (HSIs) and multispectral images (MSIs) is a hot topic in the remote sensing community. A high-resolution HSI (HR-HSI) can be obtained by fusing a low-resolution HSI (LR-HSI) with a high-resolution MSI (HR-MSI) or RGB image. However, most deep learning-based methods require a large number of HR-HSIs for supervised training, which are very rare in practice. In this paper, we propose a coupled diffusion posterior sampling (CDPS) method for HSI and MSI fusion in which HR-HSIs are no longer required in the training process. Because the LR-HSI contains the spectral information and the HR-MSI contains the spatial information of the captured scene, we design an unsupervised strategy that learns the required diffusion priors directly and solely from the input test image pair (the LR-HSI and HR-MSI themselves). Then, a coupled diffusion posterior sampling method is proposed to introduce the two priors into the diffusion posterior sampling, which leverages the observed LR-HSI and HR-MSI as fidelity terms. Experimental results demonstrate that the proposed method outperforms other state-of-the-art unsupervised HSI and MSI fusion methods. Additionally, this method utilizes smaller networks that are simpler and easier to train without other data.
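One guidance step of such coupled posterior sampling might look like the following PyTorch sketch, where D is a spatial degradation mapping an HR-HSI to the LR-HSI and R is a spectral response mapping it to the HR-MSI; the DDIM-style update, the operators D and R, and the step size zeta are assumptions for illustration, not the paper's exact sampler.

```python
import torch

def guided_step(x_t, t, eps_model, alphas_cumprod, D, R, y_lr_hsi, y_hr_msi, zeta=1.0):
    """One reverse-diffusion step guided by both observations as fidelity terms."""
    x_t = x_t.detach().requires_grad_(True)
    a_t = alphas_cumprod[t]
    eps = eps_model(x_t, t)
    x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()     # predicted clean HR-HSI
    fidelity = ((D(x0) - y_lr_hsi) ** 2).sum() + ((R(x0) - y_hr_msi) ** 2).sum()
    grad = torch.autograd.grad(fidelity, x_t)[0]         # measurement-consistency gradient
    a_prev = alphas_cumprod[t - 1] if t > 0 else torch.ones_like(a_t)
    x_prev = a_prev.sqrt() * x0.detach() + (1 - a_prev).sqrt() * eps.detach()  # DDIM-style
    return x_prev - zeta * grad                          # pull toward both observations
```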
Citations: 0
DiffEraser: Generalized Text Erasure Based on Latent Diffusion Prior.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3659329
Zhihao Chen, Yongqi Chen, Changsheng Chen, Shunquan Tan, Jiwu Huang

Text removal is an important task in processing both scene and document images. However, existing scene text removal (STR) methods focus primarily on scene text images. STR models trained on scene text images perform poorly on document images with dense, complex textured backgrounds. We discover that the limitations of existing methods can be attributed to the difficulties of background feature estimation in the regions to be erased, which is based on knowledge from neighboring regions in the input images and priors learned from the training data. Background feature estimation degrades under cross-domain scenarios, compromising the quality of STR results. To address these issues, we introduce DiffEraser, a novel text removal framework that leverages prior knowledge from the Latent Diffusion Model (LDM) for removing text in both scene and document images. Our DiffEraser incorporates two key innovations to fully exploit the prior knowledge of the LDM. First, we replace the conventional Variational Auto-Encoder (VAE) encoder with a Diffusion-Prior (DP) encoder, aiming to integrate the heterogeneous information from the LDM prior knowledge in latent space with the multi-level encoded features of the input image. Second, we introduce a Latent-Fusion (LF) decoder that integrates the heterogeneous features from both the LDM and DP encoders to generate high-quality text-erased results. To evaluate the generalization performance of our DiffEraser, we focus on cross-domain protocols and construct a document image dataset, NPID295, which contains 295 types of passports and identity cards. Notably, when trained on a scene text dataset, DiffEraser significantly outperforms existing STR methods on the challenging NPID295 dataset. The resources of this work will be available online upon acceptance.

Citations: 0
High-Fidelity Seismic Super-Resolution Using Prior-Informed Deep Learning With 3D Awareness.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3663926
Jintao Li, Xinming Wu, Xianwen Zhang, Xin Du, Xiaoming Sun, Bao Deng, Guangyu Wang

The limitations of seismic vertical resolution pose significant challenges for the identification of thin beds. Improving the vertical resolution of seismic data using deep learning methods often encounters challenges related to unrealistic outputs and limited generalization. To address these challenges, we propose a novel framework that improves the fidelity and generalization of seismic super-resolution. Our approach begins with the generation of realistic synthetic training data that aligns with the structural and amplitude characteristics of field surveys. We then introduce an enhanced 2D network with 3D awareness, which builds on the 2D Swin-Transformer and 3D convolution blocks to effectively capture 3D spatial features while maintaining computational efficiency. This network addresses the limitations of traditional 2D approaches by reducing stitching artifacts and improving spatial consistency. Finally, we develop a prior-informed fine-tuning strategy using field data without the need for labels, which incorporates a self-supervised data consistency loss and a spectral matching loss based on prior knowledge. This strategy ensures that the super-resolution results preserve the original low frequency information while yielding a spectral distribution as expected. Experiments on multiple field datasets demonstrate the robustness and generalization capability of our method, making it a practical solution for seismic resolution enhancement in diverse field datasets.
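A hedged sketch of the two fine-tuning losses named above: a self-supervised data-consistency term (the low-passed super-resolved output should reproduce the field input) plus a spectral-matching term against a target amplitude spectrum built from prior knowledge. The lowpass operator, the 4-D tensor layout, the target spectrum, and the weight w are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def fine_tune_loss(sr, field, lowpass, target_spectrum, w=0.1):
    """sr, field: (B, C, H, W) seismic volumes; the last axis is the vertical trace."""
    consistency = F.l1_loss(lowpass(sr), field)          # preserve original low frequencies
    amp = torch.fft.rfft(sr, dim=-1).abs().mean(dim=(0, 1, 2))  # mean amplitude spectrum
    spectral = F.l1_loss(amp, target_spectrum)           # match the expected spectrum
    return consistency + w * spectral
```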

Citations: 0
Token-Level Prompt Mixture With Parameter-Free Routing for Federated Domain Generalization.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3652431
Shuai Gong, Chaoran Cui, Xiaolin Dong, Xiushan Nie, Lei Zhu, Xiaojun Chang

Federated Domain Generalization (FedDG) aims to train a globally generalizable model on data from decentralized, heterogeneous clients. While recent work has adapted vision-language models for FedDG using prompt learning, the prevailing "one-prompt-fits-all" paradigm struggles with sample diversity, causing a marked performance decline on personalized samples. The Mixture of Experts (MoE) architecture offers a promising solution for specialization. However, existing MoE-based prompt learning methods suffer from two key limitations: coarse image-level expert assignment and high communication costs from parameterized routers. To address these limitations, we propose TRIP, a Token-level pRompt mIxture with Parameter-free routing framework for FedDG. TRIP treats prompts as multiple experts, and assigns individual tokens within an image to distinct experts, facilitating the capture of fine-grained visual patterns. To ensure communication efficiency, TRIP introduces a parameter-free routing mechanism based on capacity-aware clustering and Optimal Transport (OT). First, tokens are grouped into capacity-aware clusters to ensure balanced workloads. These clusters are then assigned to experts via OT, stabilized by mapping cluster centroids to static, non-learnable keys. The final instance-specific prompt is synthesized by aggregating experts, weighted by the number of tokens assigned to each. Extensive experiments across four benchmarks demonstrate that TRIP achieves optimal generalization results, while communicating as few as 1K parameters. Our code is available at https://github.com/GongShuai8210/TRIP.
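A rough sketch of the parameter-free routing pipeline: tokens are split into equal-capacity groups, group centroids are matched one-to-one to static expert keys by solving an assignment problem, and every token inherits its group's expert. The 1-D ordering used as a stand-in for capacity-aware clustering and the Hungarian solver standing in for OT are illustrative simplifications of the method described above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def route_tokens(tokens, expert_keys):
    """tokens: (N, d); expert_keys: (E, d) static, non-learnable; returns (N,) expert ids."""
    n_experts = expert_keys.shape[0]
    order = np.argsort(tokens @ tokens.mean(axis=0))     # crude balanced grouping
    clusters = np.array_split(order, n_experts)          # equal-capacity clusters
    centroids = np.stack([tokens[c].mean(axis=0) for c in clusters])
    cost = -centroids @ expert_keys.T                    # negative similarity as cost
    _, expert_of = linear_sum_assignment(cost)           # one expert per cluster
    assignment = np.empty(tokens.shape[0], dtype=int)
    for ci, tok_idx in enumerate(clusters):
        assignment[tok_idx] = expert_of[ci]
    return assignment
```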

Citations: 0
Principal Component Maximization: A Novel Method for SAR Image Recovery From Raw Data Without System Parameters.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3657165
Huizhang Yang, Liyuan Chen, Shao-Shan Zuo, Zhong Liu, Jian Yang

Synthetic Aperture Radar (SAR) imaging relies on using focusing algorithms to transform raw measurement data into radar images. These algorithms require knowledge of SAR system parameters, such as wavelength, center slant range, fast time sampling rate, pulse repetition interval, waveform, and platform speed. However, in non-cooperative scenarios or when metadata is corrupted, these parameters are unavailable, rendering traditional algorithms ineffective. To address this challenge, this article presents a novel parameter-free method for recovering SAR images from raw data without the requirement of any SAR system parameters. Firstly, we introduce an approximated matched filtering model that leverages the shift-invariance properties of SAR echoes, enabling image formation via convolving the raw data with an unknown reference echo. Secondly, we develop a Principal Component Maximization (PCM) method that exploits the low-dimensional structure of SAR signals to estimate the reference echo. The PCM method employs a three-stage procedure: 1) segment raw data into blocks; 2) normalize the energy of each block; and 3) maximize the principal component's energy across all blocks, enabling robust estimation of the reference echo under non-stationary clutter. Experimental results on various SAR datasets demonstrate that our method can effectively recover SAR images from raw data without any system parameters. To facilitate reproducibility, the Matlab program is available at https://github.com/huizhangyang/pcm.
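The three-stage procedure maps directly onto a few lines of NumPy: segment (stage 1), energy-normalize (stage 2), and take the dominant principal component across blocks (stage 3), which is the top right-singular vector of the normalized block matrix. The block length and the phase convention of the returned echo are assumptions in this sketch.

```python
import numpy as np

def pcm_reference_echo(raw, block_len):
    """raw: 1-D complex raw-data stream; returns an estimated reference echo."""
    n_blocks = raw.size // block_len
    blocks = raw[:n_blocks * block_len].reshape(n_blocks, block_len)  # stage 1: segment
    energy = np.linalg.norm(blocks, axis=1, keepdims=True)
    blocks = blocks / np.maximum(energy, 1e-12)                       # stage 2: normalize
    # Stage 3: the unit vector maximizing total projection energy over all blocks
    # is the top right-singular vector of the normalized block matrix.
    _, _, vh = np.linalg.svd(blocks, full_matrices=False)
    return vh[0].conj()                                  # reference echo, up to phase
```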

Citations: 0