Latest publications: IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society)

UGAE: Unified Geometry and Attribute Enhancement for G-PCC Compressed Point Clouds.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3654348
Pan Zhao, Hui Yuan, Chongzhen Tian, Tian Guo, Raouf Hamzaoui, Zhigeng Pan

Lossy compression of point clouds reduces storage and transmission costs; however, it inevitably leads to irreversible distortion in the geometry structure and attribute information. To address these issues, we propose a unified geometry and attribute enhancement (UGAE) framework, which consists of three core components: post-geometry enhancement (PoGE), pre-attribute enhancement (PAE), and post-attribute enhancement (PoAE). In PoGE, a Transformer-based sparse convolutional U-Net is used to reconstruct the geometry structure with high precision by predicting voxel occupancy probabilities. Building on the refined geometry structure, PAE introduces an innovative enhanced geometry-guided recoloring strategy, which uses a detail-aware K-Nearest Neighbors (DA-KNN) method to achieve accurate recoloring and effectively preserve high-frequency details before attribute compression. Finally, at the decoder side, PoAE uses an attribute residual prediction network with a weighted mean squared error (W-MSE) loss to enhance the quality of high-frequency regions while maintaining the fidelity of low-frequency regions. UGAE significantly outperformed existing methods on three benchmark datasets: 8iVFB, Owlii, and MVUB. Compared to the latest G-PCC test model (TMC13v29) under the total-bitrate setting, UGAE achieved an average BD-PSNR gain of 9.98 dB and -90.54% BD-bitrate for geometry under the D1 metric, as well as a 3.34 dB BD-PSNR improvement with -55.53% BD-bitrate for attributes. Additionally, it improved perceptual quality significantly. Our source code will be released on GitHub at: https://github.com/yuanhui0325/UGAE.
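The W-MSE loss in PoAE is only described at a high level above; below is a minimal sketch of a region-weighted MSE, assuming a precomputed binary high-frequency mask and illustrative weight values (not the authors' exact formulation).

```python
import torch

def weighted_mse(pred, target, hf_mask, w_hf=4.0, w_lf=1.0):
    """Region-weighted MSE sketch: attribute errors inside high-frequency
    regions (hf_mask == 1) are penalized more strongly than in low-frequency
    regions. pred, target, and hf_mask share the same shape."""
    weights = torch.where(hf_mask.bool(),
                          torch.full_like(pred, w_hf),
                          torch.full_like(pred, w_lf))
    return (weights * (pred - target) ** 2).mean()
```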

Citations: 0
DVG-Diffusion: Dual-View-Guided Diffusion Model for CT Reconstruction From X-Rays.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3655171
Xing Xie, Jiawei Liu, Huijie Fan, Zhi Han, Yandong Tang, Liangqiong Qu

Directly reconstructing a 3D CT volume from few-view 2D X-rays using an end-to-end deep learning network is a challenging task, as X-ray images are merely projection views of the 3D CT volume. In this work, we facilitate the complex mapping from 2D X-ray images to 3D CT by incorporating new view synthesis, and reduce the learning difficulty through view-guided feature alignment. Specifically, we propose a dual-view guided diffusion model (DVG-Diffusion), which couples a real input X-ray view and a synthesized new X-ray view to jointly guide CT reconstruction. First, a novel view parameter-guided encoder captures features from the X-rays that are spatially aligned with the CT. Next, we concatenate the extracted dual-view features as conditions for the latent diffusion model to learn and refine the CT latent representation. Finally, the CT latent representation is decoded into a CT volume in pixel space. By incorporating view parameter-guided encoding and dual-view guided CT reconstruction, our DVG-Diffusion achieves an effective balance between high fidelity and perceptual quality for CT reconstruction. Experimental results demonstrate that our method outperforms state-of-the-art methods. Based on these experiments, we also present a comprehensive analysis and discussion of the views and the reconstruction. The model and code are available at https://github.com/xiexing0916/DVG-Diffusion.
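As a rough illustration of the dual-view conditioning step (concatenating the extracted view features as conditions for the latent diffusion model), here is a minimal sketch; the module name, channel sizes, and the assumption that both feature volumes are already aligned to the CT grid are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DualViewCondition(nn.Module):
    """Fuse features from the real input X-ray view and the synthesized
    view into one conditioning tensor via channel concatenation (sketch)."""
    def __init__(self, feat_ch=64, cond_ch=128):
        super().__init__()
        self.fuse = nn.Conv3d(2 * feat_ch, cond_ch, kernel_size=1)

    def forward(self, real_feat, synth_feat):
        # Both inputs: (B, feat_ch, D, H, W), spatially aligned with the CT latent.
        return self.fuse(torch.cat([real_feat, synth_feat], dim=1))
```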

Citations: 0
LaCon: Late-Constraint Controllable Visual Generation.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3654412
Chang Liu, Rui Li, Kaidong Zhang, Yunwei Lan, Xin Luo, Dong Liu

Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To offer more controllability over the generation process of diffusion models, previous studies typically adopt extra modules to integrate condition signals by manipulating the intermediate features of the noise predictors, and they often fail on conditions not seen during training. Although subsequent studies attempt to handle multi-condition control, they are mostly resource-consuming to implement, so more generalizable and efficient solutions are needed for controllable visual generation. In this paper, we present a late-constraint controllable visual generation method, namely LaCon, which generalizes across various modalities and granularities for each single-condition control. LaCon establishes an alignment between the external condition and specific diffusion timesteps, and guides diffusion models to produce conditional results based on this alignment. Experimental results on prevailing benchmark datasets illustrate the promising performance and generalization capability of LaCon under various conditions and settings. Ablation studies analyze the different components of LaCon, illustrating its great potential to offer flexible condition controls for different backbones.
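The abstract does not spell out how the built alignment steers sampling; the sketch below shows one classifier-guidance-style reading, where at an aligned timestep the latent is nudged by the gradient of an alignment loss between a condition predictor and the external condition. All names (eps_model, cond_net) and the DDPM parameterization are assumptions for illustration, not LaCon's exact procedure.

```python
import torch
import torch.nn.functional as F

def guided_latent(x_t, t, eps_model, cond_net, condition, alpha_bar_t, scale=1.0):
    """Nudge the noisy latent x_t toward agreement with an external condition
    before the usual denoising update (sketch). alpha_bar_t is the scalar
    tensor of the cumulative noise schedule at step t."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)
    # Predicted clean latent under the standard DDPM parameterization.
    x0_hat = (x_t - (1.0 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
    align_loss = F.mse_loss(cond_net(x0_hat), condition)
    grad = torch.autograd.grad(align_loss, x_t)[0]
    return (x_t - scale * grad).detach()
```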

Citations: 0
Deep Learning-Based Joint Geometry and Attribute Up-Sampling for Large-Scale Colored Point Clouds.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3657214
Yun Zhang, Feifan Chen, Na Li, Zhiwei Guo, Xu Wang, Fen Miao, Sam Kwong

Colored point cloud comprising geometry and attribute components is one of the mainstream representations enabling realistic and immersive 3D applications. To generate large-scale and denser colored point clouds, we propose a deep learning-based Joint Geometry and Attribute Up-sampling (JGAU) method, which learns to model both geometry and attribute patterns and leverages the spatial attribute correlation. Firstly, we establish and release a large-scale dataset for colored point cloud up-sampling, named SYSU-PCUD, which has 121 large-scale colored point clouds with diverse geometry and attribute complexities in six categories and four sampling rates. Secondly, to improve the quality of up-sampled point clouds, we propose a deep learning-based JGAU framework to up-sample the geometry and attributes jointly. It consists of a geometry up-sampling network and an attribute up-sampling network, where the latter leverages the up-sampled auxiliary geometry to model neighborhood correlations of the attributes. Thirdly, we propose two coarse attribute up-sampling methods, Geometric Distance Weighted Attribute Interpolation (GDWAI) and Deep Learning-based Attribute Interpolation (DLAI), to generate coarsely up-sampled attributes for each point. Then, we propose an attribute enhancement module to refine the up-sampled attributes and generate high-quality point clouds by further exploiting intrinsic attribute and geometry patterns. Extensive experiments show that the Peak Signal-to-Noise Ratio (PSNR) achieved by the proposed JGAU is 33.90 dB, 32.10 dB, 31.10 dB, and 30.39 dB when the up-sampling rates are $4\times$, $8\times$, $12\times$, and $16\times$, respectively. Compared to the state-of-the-art schemes, JGAU achieves average PSNR gains of 2.32 dB, 2.47 dB, 2.28 dB, and 2.11 dB at the four up-sampling rates, respectively, which are significant. The code is released at https://github.com/SYSU-Video/JGAU.
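GDWAI, as described above, is essentially distance-weighted color interpolation over the k nearest colored neighbors; a minimal sketch follows, assuming inverse-distance weights and illustrative array shapes (not the authors' exact weighting).

```python
import numpy as np
from scipy.spatial import cKDTree

def gdwai(sparse_xyz, sparse_rgb, dense_xyz, k=3, eps=1e-8):
    """Geometric Distance Weighted Attribute Interpolation (sketch):
    each up-sampled point takes an inverse-distance-weighted average of
    the colors of its k nearest points in the sparse colored cloud.
    sparse_xyz: (M, 3), sparse_rgb: (M, 3), dense_xyz: (N, 3)."""
    tree = cKDTree(sparse_xyz)
    dist, idx = tree.query(dense_xyz, k=k)               # both (N, k)
    w = 1.0 / (dist + eps)
    w /= w.sum(axis=1, keepdims=True)                    # normalize weights per point
    return (w[..., None] * sparse_rgb[idx]).sum(axis=1)  # (N, 3)
```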

Citations: 0
Causally-Aware Unsupervised Feature Selection Learning.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3654354
Zongxin Shen, Yanyong Huang, Dongjie Wang, Minbo Ma, Fengmao Lv, Tianrui Li

Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph. To address these issues, a novel UFS method, called Causally-Aware UnSupErvised Feature Selection learning (CAUSE-FS), is proposed. CAUSE-FS introduces a novel causal regularizer that reweights samples to balance the confounding distribution of each treatment feature. This regularizer is subsequently integrated into a generalized unsupervised spectral regression model to mitigate spurious associations between features and clustering labels, thus achieving causal feature selection. Furthermore, CAUSE-FS employs causality-guided hierarchical clustering to partition features with varying causal contributions into multiple granularities. By integrating similarity graphs learned adaptively at different granularities, CAUSE-FS increases the importance of causal features when constructing the fused similarity graph to capture the reliable local structure of data. Extensive experimental results demonstrate the superiority of CAUSE-FS over state-of-the-art methods, with its interpretability further validated through feature visualization.
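The causal regularizer above reweights samples to balance the confounding distribution of each treatment feature; since the exact scheme is not given here, the sketch below uses a generic inverse-propensity stand-in (binarizing one treatment feature and weighting by estimated propensities), purely to illustrate the idea of confounder-balancing weights, not the paper's regularizer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def balance_weights(X, treat_idx):
    """Confounder-balancing sample weights for one treatment feature
    (inverse-propensity stand-in). The treatment feature is binarized at
    its median; the remaining features act as potential confounders."""
    t = (X[:, treat_idx] > np.median(X[:, treat_idx])).astype(int)
    confounders = np.delete(X, treat_idx, axis=1)
    clf = LogisticRegression(max_iter=1000).fit(confounders, t)
    p = np.clip(clf.predict_proba(confounders)[:, 1], 1e-3, 1 - 1e-3)
    return np.where(t == 1, 1.0 / p, 1.0 / (1.0 - p))
```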

Citations: 0
Token Calibration for Transformer-Based Domain Adaptation
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2025.3647367
Xiaowei Fu;Shiyu Ye;Chenxu Zhang;Fuxiang Huang;Xin Xu;Lei Zhang
Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain by learning domain-invariant representations. Motivated by the recent success of Vision Transformers (ViTs), several UDA approaches have adopted ViT architectures to exploit fine-grained patch-level representations, which are unified as Transformer-based Domain Adaptation (TransDA), independent of CNN-based counterparts. However, we have a key observation in TransDA: due to inherent domain shifts, patches (tokens) from different semantic categories across domains may exhibit abnormally high similarities, which can mislead the self-attention mechanism and degrade adaptation performance. To solve this, we propose a novel Patch-Adaptation Transformer (PATrans), which first identifies similarity-anomalous patches and then adaptively suppresses their negative impact on domain alignment, i.e., token calibration. Specifically, we introduce a Patch-Adaptation Attention (PAA) mechanism to replace the standard self-attention mechanism, which consists of a weight-shared triple-branch mixed attention mechanism and a patch-level domain discriminator. The mixed attention integrates self-attention and cross-attention to enhance intra-domain feature modeling and inter-domain similarity estimation. Meanwhile, the patch-level domain discriminator quantifies the anomaly probability of each patch, enabling dynamic reweighting to mitigate the impact of unreliable patch correspondences. Furthermore, we introduce a contrastive attention regularization strategy, which leverages category-level information in a contrastive learning framework to promote class-consistent attention distributions. Extensive experiments on four benchmark datasets demonstrate that PATrans attains significant improvements over existing state-of-the-art UDA methods (e.g., 89.2% on VisDA-2017). Code is available at: https://github.com/YSY145/PATrans
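As a simplified picture of token calibration, the sketch below down-weights attention toward keys whose patches a domain discriminator flags as anomalous; it is a plain single-head attention stand-in, not the triple-branch PAA module.

```python
import torch

def calibrated_attention(q, k, v, anomaly_prob):
    """Single-head scaled dot-product attention in which each key is
    down-weighted by its patch-level anomaly probability (sketch).
    q, k, v: (B, N, C); anomaly_prob: (B, N) with values in [0, 1]."""
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)  # (B, N, N)
    attn = attn * (1.0 - anomaly_prob).unsqueeze(1)                # suppress unreliable keys
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)   # renormalize rows
    return attn @ v
```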
Citations: 0
Coupled Diffusion Posterior Sampling for Unsupervised Hyperspectral and Multispectral Images Fusion
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2025.3647207
Yang Xu;Jian Zhu;Danfeng Hong;Zhihui Wei;Zebin Wu
The fusion of hyperspectral images (HSIs) and multispectral images (MSIs) is a hot topic in the remote sensing community. A high-resolution HSI (HR-HSI) can be obtained by fusing a low-resolution HSI (LR-HSI) with a high-resolution MSI (HR-MSI) or RGB image. However, most deep learning-based methods require a large number of HR-HSIs for supervised training, which are very rare in practice. In this paper, we propose a coupled diffusion posterior sampling (CDPS) method for HSI and MSI fusion in which HR-HSIs are no longer required in the training process. Because the LR-HSI contains the spectral information and the HR-MSI contains the spatial information of the captured scene, we design an unsupervised strategy that learns the required diffusion priors directly and solely from the input test image pair (the LR-HSI and HR-MSI themselves). Then, a coupled diffusion posterior sampling method is proposed to introduce the two priors into the diffusion posterior sampling, which leverages the observed LR-HSI and HR-MSI as fidelity terms. Experimental results demonstrate that the proposed method outperforms other state-of-the-art unsupervised HSI and MSI fusion methods. Additionally, this method utilizes smaller networks that are simpler and easier to train without other data.
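The two fidelity terms can be pictured as a single data-consistency gradient used during posterior sampling; a minimal sketch follows, assuming spatial_down and spectral_resp are given callables for the degradation operators (illustrative names, not the authors' API).

```python
import torch

def fidelity_grad(x, lr_hsi, hr_msi, spatial_down, spectral_resp, lam=1.0):
    """Gradient of the coupled data-fidelity terms (sketch): the estimate x
    should reproduce the LR-HSI after spatial degradation and the HR-MSI
    after applying the spectral response."""
    x = x.detach().requires_grad_(True)
    loss = torch.sum((spatial_down(x) - lr_hsi) ** 2) \
         + lam * torch.sum((spectral_resp(x) - hr_msi) ** 2)
    return torch.autograd.grad(loss, x)[0]
```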
Citations: 0
Implicit Neural Compression of Point Clouds.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2025.3648141
Hongning Ruan, Yulin Shao, Qianqian Yang, Liang Zhao, Zhaoyang Zhang, Dusit Niyato

Point clouds have gained prominence across numerous applications due to their ability to accurately represent 3D objects and scenes. However, efficiently compressing unstructured, high-precision point cloud data remains a significant challenge. In this paper, we propose NeRC$^{3}$, a novel point cloud compression framework that leverages implicit neural representations (INRs) to encode both geometry and attributes of dense point clouds. Our approach employs two coordinate-based neural networks: one maps spatial coordinates to voxel occupancy, while the other maps occupied voxels to their attributes, thereby implicitly representing the geometry and attributes of a voxelized point cloud. The encoder quantizes and compresses network parameters alongside auxiliary information required for reconstruction, while the decoder reconstructs the original point cloud by inputting voxel coordinates into the neural networks. Furthermore, we extend our method to dynamic point cloud compression through techniques that reduce temporal redundancy, including a 4D spatio-temporal representation termed 4D-NeRC$^{3}$. Experimental results validate the effectiveness of our approach: for static point clouds, NeRC$^{3}$ outperforms the octree-based G-PCC standard and existing INR-based methods. For dynamic point clouds, 4D-NeRC$^{3}$ achieves superior geometry compression performance compared to the latest G-PCC and V-PCC standards, while matching state-of-the-art learning-based methods. It also demonstrates competitive performance in joint geometry and attribute compression.
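The two coordinate-based networks can be as simple as small MLPs; a minimal sketch with illustrative widths follows (positional encoding and the actual architectures used in the paper are omitted).

```python
import torch.nn as nn

class OccupancyNet(nn.Module):
    """Coordinate MLP: (x, y, z) -> voxel occupancy logit (sketch)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, xyz):
        return self.net(xyz)

class AttributeNet(nn.Module):
    """Coordinate MLP: occupied (x, y, z) -> RGB attributes in [0, 1] (sketch)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid())

    def forward(self, xyz):
        return self.net(xyz)
```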

Citations: 0
Token-Level Prompt Mixture With Parameter-Free Routing for Federated Domain Generalization.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3652431
Shuai Gong, Chaoran Cui, Xiaolin Dong, Xiushan Nie, Lei Zhu, Xiaojun Chang

Federated Domain Generalization (FedDG) aims to train a globally generalizable model on data from decentralized, heterogeneous clients. While recent work has adapted vision-language models for FedDG using prompt learning, the prevailing "one-prompt-fits-all" paradigm struggles with sample diversity, causing a marked performance decline on personalized samples. The Mixture of Experts (MoE) architecture offers a promising solution for specialization. However, existing MoE-based prompt learning methods suffer from two key limitations: coarse image-level expert assignment and high communication costs from parameterized routers. To address these limitations, we propose TRIP, a Token-level pRompt mIxture with Parameter-free routing framework for FedDG. TRIP treats prompts as multiple experts, and assigns individual tokens within an image to distinct experts, facilitating the capture of fine-grained visual patterns. To ensure communication efficiency, TRIP introduces a parameter-free routing mechanism based on capacity-aware clustering and Optimal Transport (OT). First, tokens are grouped into capacity-aware clusters to ensure balanced workloads. These clusters are then assigned to experts via OT, stabilized by mapping cluster centroids to static, non-learnable keys. The final instance-specific prompt is synthesized by aggregating experts, weighted by the number of tokens assigned to each. Extensive experiments across four benchmarks demonstrate that TRIP achieves optimal generalization results, with communicating as few as 1K parameters. Our code is available at https://github.com/GongShuai8210/TRIP.
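The parameter-free routing can be pictured as: cluster the tokens, then match cluster centroids to the static expert keys. The sketch below uses plain k-means and a Hungarian assignment as a simplification of the capacity-aware clustering and optimal-transport step described above; it is illustrative, not TRIP's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def route_tokens(tokens, expert_keys):
    """Assign each image token to a prompt expert without learnable router
    parameters (sketch). tokens: (N, C); expert_keys: (E, C) static keys."""
    n_experts = expert_keys.shape[0]
    km = KMeans(n_clusters=n_experts, n_init=10).fit(tokens)
    centroids = km.cluster_centers_                  # (E, C)
    cost = -centroids @ expert_keys.T                # negate similarity -> cost
    rows, cols = linear_sum_assignment(cost)         # one expert per cluster
    cluster_to_expert = dict(zip(rows, cols))
    return np.array([cluster_to_expert[c] for c in km.labels_])  # expert id per token
```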

Citations: 0
Principal Component Maximization: A Novel Method for SAR Image Recovery From Raw Data Without System Parameters.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3657165
Huizhang Yang, Liyuan Chen, Shao-Shan Zuo, Zhong Liu, Jian Yang

Synthetic Aperture Radar (SAR) imaging relies on using focusing algorithms to transform raw measurement data into radar images. These algorithms require knowledge of SAR system parameters, such as wavelength, center slant range, fast time sampling rate, pulse repetition interval, waveform, and platform speed. However, in non-cooperative scenarios or when metadata is corrupted, these parameters are unavailable, rendering traditional algorithms ineffective. To address this challenge, this article presents a novel parameter-free method for recovering SAR images from raw data without the requirement of any SAR system parameters. Firstly, we introduce an approximated matched filtering model that leverages the shift-invariance properties of SAR echoes, enabling image formation via convolving the raw data with an unknown reference echo. Secondly, we develop a Principal Component Maximization (PCM) method that exploits the low-dimensional structure of SAR signals to estimate the reference echo. The PCM method employs a three-stage procedure: 1) segment raw data into blocks; 2) normalize the energy of each block; and 3) maximize the principal component's energy across all blocks, enabling robust estimation of the reference echo under non-stationary clutter. Experimental results on various SAR datasets demonstrate that our method can effectively recover SAR images from raw data without any system parameters. To facilitate reproducibility, the Matlab program is available at https://github.com/huizhangyang/pcm.
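The three-stage PCM procedure maps naturally onto an SVD over energy-normalized blocks, and the approximated matched-filtering model onto a correlation with the estimated reference echo; the sketch below illustrates both in 1-D under those assumptions (a real implementation would operate on 2-D complex raw data).

```python
import numpy as np

def estimate_reference_echo(raw, block_len):
    """PCM sketch: 1) segment the raw data into blocks, 2) normalize each
    block's energy, 3) take the dominant right singular vector across all
    blocks as the reference echo estimate."""
    n_blocks = raw.size // block_len
    blocks = raw[:n_blocks * block_len].reshape(n_blocks, block_len)
    blocks = blocks / (np.linalg.norm(blocks, axis=1, keepdims=True) + 1e-12)
    _, _, vh = np.linalg.svd(blocks, full_matrices=False)
    return vh[0]                                   # principal component

def focus_line(raw, ref_echo):
    """Approximate matched filtering: correlate the raw data with the
    estimated reference echo (1-D illustration of the convolution model)."""
    return np.convolve(raw, np.conj(ref_echo[::-1]), mode='same')
```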

Citations: 0