
Latest articles in IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society)

Foundation Model Empowered Real-Time Video Conference With Semantic Communications.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3659719
Mingkai Chen, Wenbo Ma, Mujian Zeng, Xiaoming He, Jian Xiong, Lei Wang, Anwer Al-Dulaimi, Shahid Mumtaz

With the development of real-time video conferences, interactive multimedia services have proliferated, leading to a surge in traffic. Interactivity is becoming one of the main features of future multimedia services, which brings a new challenge to Computer Vision (CV) for communications. In addition, the many CV directions for video, such as recognition, understanding, saliency segmentation, and coding, cannot satisfy the multi-task demands of interactivity without being integrated. Meanwhile, with the rapid development of foundation models, we apply task-oriented semantic communications to address these tasks. Therefore, we propose a novel framework, called Real-Time Video Conference with Foundation Model (RTVCFM), to satisfy the interactivity requirements of multimedia services. First, at the transmitter, we perform causal understanding and spatiotemporal decoupling on interactive videos with the Video Time-Aware Large Language Model (VTimeLLM), Iterated Integrated Attributions (IIA), and Segment Anything Model 2 (SAM2) to accomplish video semantic segmentation. Second, during transmission, we propose a two-stage semantic transmission optimization driven by Channel State Information (CSI), which also accommodates the asymmetric weighting of semantic information in real-time video, so that the video is transmitted at a low bit rate with high semantic fidelity. Third, at the receiver, RTVCFM performs multidimensional fusion over the full semantic segmentation using the Diffusion Model for Foreground Background Fusion (DMFBF) and then reconstructs the video streams. Finally, simulation results demonstrate that RTVCFM achieves a compression ratio as high as 95.6% while guaranteeing high semantic similarity, 98.73% in Multi-Scale Structural Similarity Index Measure (MS-SSIM) and 98.35% in Structural Similarity (SSIM), showing that the reconstructed video closely matches the original.
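
As a side note on the reported metrics: compression ratio and SSIM are standard quantities, and a minimal Python sketch of how such an evaluation could be reproduced is given below, assuming original and reconstructed frames are available as arrays (scikit-image's single-scale SSIM is used; MS-SSIM would need an additional library, and all values here are stand-ins, not the paper's data).

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def compression_ratio(original_bytes: int, transmitted_bytes: int) -> float:
    """Fraction of the original payload saved by the semantic transmission."""
    return 1.0 - transmitted_bytes / original_bytes

def frame_ssim(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Single-scale SSIM between one original and one reconstructed RGB frame."""
    return ssim(original, reconstructed, channel_axis=-1, data_range=255)

# Stand-in frames for illustration only.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
print(frame_ssim(frame, frame))        # 1.0 for identical frames
print(compression_ratio(10_000, 440))  # 0.956, i.e. 95.6% of the payload saved
```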

Citations: 0
UGAE: Unified Geometry and Attribute Enhancement for G-PCC Compressed Point Clouds.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3654348
Pan Zhao, Hui Yuan, Chongzhen Tian, Tian Guo, Raouf Hamzaoui, Zhigeng Pan

Lossy compression of point clouds reduces storage and transmission costs; however, it inevitably leads to irreversible distortion in geometry structure and attribute information. To address these issues, we propose a unified geometry and attribute enhancement (UGAE) framework, which consists of three core components: post-geometry enhancement (PoGE), pre-attribute enhancement (PAE), and post-attribute enhancement (PoAE). In PoGE, a Transformer-based sparse convolutional U-Net is used to reconstruct the geometry structure with high precision by predicting voxel occupancy probabilities. Building on the refined geometry structure, PAE introduces an innovative enhanced geometry-guided recoloring strategy, which uses a detail-aware K-Nearest Neighbors (DA-KNN) method to achieve accurate recoloring and effectively preserve high-frequency details before attribute compression. Finally, at the decoder side, PoAE uses an attribute residual prediction network with a weighted mean squared error (W-MSE) loss to enhance the quality of high-frequency regions while maintaining the fidelity of low-frequency regions. UGAE significantly outperformed existing methods on three benchmark datasets: 8iVFB, Owlii, and MVUB. Compared to the latest G-PCC test model (TMC13v29), in terms of total bitrate setting, UGAE achieved an average BD-PSNR gain of 9.98 dB and -90.54% BD-bitrate for geometry under the D1 metric, as well as a 3.34 dB BD-PSNR improvement with -55.53% BD-bitrate for attributes. Additionally, it improved perceptual quality significantly. Our source code will be released on GitHub at: https://github.com/yuanhui0325/UGAE.
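
BD-PSNR and BD-bitrate are Bjøntegaard deltas computed from the rate-distortion points of two codecs. A minimal numpy sketch of the classic cubic-fit BD-bitrate calculation follows, under the assumption of four (bitrate, PSNR) pairs per codec; the numbers are placeholders, not the paper's measurements.

```python
import numpy as np

def bd_bitrate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta bitrate (%): average rate change at equal PSNR."""
    log_ra, log_rt = np.log10(rate_anchor), np.log10(rate_test)
    # Cubic fits of log-rate as a function of PSNR for both codecs.
    pa = np.polyfit(psnr_anchor, log_ra, 3)
    pt = np.polyfit(psnr_test, log_rt, 3)
    # Integrate each fit over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    int_t = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0  # negative means the test codec saves rate

# Placeholder rate-distortion points (kbps, dB) for an anchor and a test codec.
print(bd_bitrate([100, 200, 400, 800], [30.0, 33.0, 36.0, 39.0],
                 [60, 120, 240, 480], [30.5, 33.5, 36.5, 39.5]))
```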

Citations: 0
Anatomy-Aware MR-Imaging-Only Radiotherapy.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3658010
Hao Yang, Yue Sun, Hui Xie, Lina Zhao, Chi Kin Lam, Qiang Zhao, Xiangyu Xiong, Kunyan Cai, Behdad Dashtbozorg, Chenggang Yan, Tao Tan

The synthesis of computed tomography images can supplement electron density information and eliminate MR-CT image registration errors. Consequently, an increasing number of MR-to-CT image translation approaches are being proposed for MR-only radiotherapy planning. However, because of substantial anatomical differences between regions, traditional approaches often require a separate model to be developed and deployed for each region. In this paper, we propose a unified model driven by prompts that dynamically adapts to different anatomical regions and generates CT images with high structural consistency. Specifically, it utilizes a region-specific attention mechanism, including a region-aware vector and a dynamic gating factor, to achieve MRI-to-CT image translation for multiple anatomical regions. Qualitative and quantitative results on three datasets of anatomical parts demonstrate that our model generates clearer and more anatomically detailed CT images than other state-of-the-art translation models. The results of the dosimetric analysis also indicate that our proposed model generates images with dose distributions more closely aligned with those of the real CT images. Thus, the proposed model demonstrates promising potential for enabling MR-only radiotherapy across multiple anatomical regions. We have released the source code for our RSAM model. The repository is publicly accessible at: https://github.com/yhyumi123/RSAM.
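
The region-specific attention can be pictured as a learned region embedding that modulates shared features through a sigmoid gate. The PyTorch sketch below illustrates that reading; the module, names, and shapes are hypothetical and are not taken from the released RSAM code.

```python
import torch
import torch.nn as nn

class RegionGatedConditioning(nn.Module):
    """Modulate shared features with a region-aware vector and a dynamic gate."""

    def __init__(self, num_regions: int, channels: int):
        super().__init__()
        self.region_embed = nn.Embedding(num_regions, channels)  # region-aware vector
        self.gate = nn.Linear(channels, channels)                # dynamic gating factor

    def forward(self, feats: torch.Tensor, region_id: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) shared features; region_id: (B,) anatomical-region prompt.
        e = self.region_embed(region_id)                    # (B, C)
        g = torch.sigmoid(self.gate(e))[:, :, None, None]   # (B, C, 1, 1), values in [0, 1]
        return feats * g + e[:, :, None, None]              # gated, region-shifted features

feats = torch.randn(2, 64, 32, 32)
out = RegionGatedConditioning(num_regions=3, channels=64)(feats, torch.tensor([0, 2]))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```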

Citations: 0
Learning Retinex Prior for Compressive Hyperspectral Image Reconstruction.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3659746
Mengzu Liu, Junwei Xu, Weisheng Dong, Le Dong, Guangming Shi

Image reconstruction in coded aperture snapshot spectral compressive imaging (CASSI) aims to recover high-fidelity hyperspectral images (HSIs) from compressed 2D measurements. While deep unfolding networks have shown promising performance, the degradation induced by the CASSI imaging model often introduces global illumination discrepancies in the reconstructions, creating artifacts similar to those in low-light images. To address these challenges, we propose a novel Retinex Prior-Driven Unfolding Network (RPDUN), which unfolds an optimization that incorporates the Retinex prior as a regularization term into a multi-stage network. This design provides global illumination adjustment for compressed measurements, effectively compensating for spatial-spectral degradation according to the physical modulation and capturing intrinsic spectral characteristics. To the best of our knowledge, this is the first application of the Retinex prior to hyperspectral image reconstruction. Furthermore, to mitigate noise in the reflectance domain, which can be amplified during decomposition, we introduce an Adaptive Token Selection Transformer (ATST). This module adaptively filters out weakly correlated tokens before the self-attention computation, effectively reducing noise and artifacts within the recovered reflectance map. Extensive experiments on both simulated and real-world datasets demonstrate that RPDUN achieves new state-of-the-art performance, significantly improving reconstruction quality while maintaining computational efficiency. The code is available at https://github.com/ZUGE0312/RPDUN.
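
Adaptive token selection can be read as ranking tokens by their correlation to a reference and dropping weakly correlated ones before self-attention. A minimal PyTorch sketch under that assumption follows; the cosine-to-mean criterion and the keep ratio are illustrative choices, not the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def select_tokens(tokens: torch.Tensor, keep_ratio: float = 0.75) -> torch.Tensor:
    """Keep the tokens most correlated with the mean token before self-attention.

    tokens: (B, N, C) token sequence; returns (B, K, C) with K = round(keep_ratio * N).
    """
    b, n, c = tokens.shape
    ref = tokens.mean(dim=1, keepdim=True)             # (B, 1, C) reference token
    scores = F.cosine_similarity(tokens, ref, dim=-1)  # (B, N) correlation scores
    k = max(1, int(round(keep_ratio * n)))
    idx = scores.topk(k, dim=1).indices                # indices of the retained tokens
    return torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, c))

x = torch.randn(2, 64, 32)
print(select_tokens(x).shape)  # torch.Size([2, 48, 32])
```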

Citations: 0
DVG-Diffusion: Dual-View-Guided Diffusion Model for CT Reconstruction From X-Rays.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3655171
Xing Xie, Jiawei Liu, Huijie Fan, Zhi Han, Yandong Tang, Liangqiong Qu

Directly reconstructing a 3D CT volume from few-view 2D X-rays with an end-to-end deep learning network is a challenging task, as X-ray images are merely projection views of the 3D CT volume. In this work, we facilitate the complex mapping from 2D X-ray images to 3D CT by incorporating new view synthesis, and we reduce the learning difficulty through view-guided feature alignment. Specifically, we propose a dual-view guided diffusion model (DVG-Diffusion), which couples a real input X-ray view with a synthesized new X-ray view to jointly guide CT reconstruction. First, a novel view parameter-guided encoder captures features from the X-rays that are spatially aligned with the CT. Next, we concatenate the extracted dual-view features as conditions for the latent diffusion model to learn and refine the CT latent representation. Finally, the CT latent representation is decoded into a CT volume in pixel space. By combining view parameter-guided encoding with dual-view guided CT reconstruction, our DVG-Diffusion achieves an effective balance between high fidelity and perceptual quality in CT reconstruction. Experimental results demonstrate that our method outperforms state-of-the-art methods. Based on the experiments, comprehensive analysis and discussion of views and reconstruction are also presented. The model and code are available at https://github.com/xiexing0916/DVG-Diffusion.
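
The dual-view conditioning described above amounts to concatenating real-view and synthesized-view features before they condition the latent denoiser. A minimal PyTorch sketch of that step is given below; the layer names and 3D feature shapes are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DualViewCondition(nn.Module):
    """Fuse features from the real X-ray view and the synthesized view into one condition."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv3d(2 * channels, channels, kernel_size=1)  # channel-wise fusion

    def forward(self, real_view: torch.Tensor, synth_view: torch.Tensor) -> torch.Tensor:
        # Both inputs: (B, C, D, H, W) features already aligned with the CT volume.
        cond = torch.cat([real_view, synth_view], dim=1)  # (B, 2C, D, H, W)
        return self.fuse(cond)                            # (B, C, D, H, W) condition tensor

real = torch.randn(1, 16, 8, 32, 32)
synth = torch.randn(1, 16, 8, 32, 32)
print(DualViewCondition(16)(real, synth).shape)  # torch.Size([1, 16, 8, 32, 32])
```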

Citations: 0
LaCon: Late-Constraint Controllable Visual Generation.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3654412
Chang Liu, Rui Li, Kaidong Zhang, Yunwei Lan, Xin Luo, Dong Liu

Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To offer more controllability over the generation process of diffusion models, previous studies normally adopt extra modules that integrate condition signals by manipulating the intermediate features of the noise predictors, and these modules often fail on conditions not seen during training. Although subsequent studies attempt to handle multi-condition control, they are mostly resource-consuming to implement, so more generalizable and efficient solutions are still needed for controllable visual generation. In this paper, we present a late-constraint controllable visual generation method, namely LaCon, which generalizes across various modalities and granularities for each single-condition control. LaCon establishes an alignment between the external condition and specific diffusion timesteps, and it guides diffusion models to produce conditional results based on this alignment. Experimental results on prevailing benchmark datasets illustrate the promising performance and generalization capability of LaCon under various conditions and settings. Ablation studies analyze the different components of LaCon, illustrating its great potential to offer flexible condition controls for different backbones.
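
Late-constraint control in general can be pictured as steering a pretrained sampler at inference time with the gradient of a condition-alignment score rather than retraining the noise predictor. The sketch below illustrates that generic idea only; the alignment function, the update rule, and the scale are placeholders and do not reproduce LaCon's actual mechanism.

```python
import torch

def guided_step(x_t, denoise_step, alignment_loss, scale: float = 0.1):
    """One sampling step nudged toward an external condition at inference time.

    denoise_step:   callable mapping the current latent x_t to the next latent.
    alignment_loss: callable scoring how poorly a latent matches the condition.
    """
    x_t = x_t.detach().requires_grad_(True)
    loss = alignment_loss(x_t)                 # condition-latent misalignment
    grad = torch.autograd.grad(loss, x_t)[0]   # direction that worsens the alignment
    x_prev = denoise_step(x_t)                 # ordinary (unconditional) update
    return x_prev - scale * grad               # steer the update toward the condition

# Stand-in callables for illustration only.
x = torch.randn(1, 4, 32, 32)
out = guided_step(x, denoise_step=lambda z: 0.9 * z,
                  alignment_loss=lambda z: (z ** 2).mean())
print(out.shape)  # torch.Size([1, 4, 32, 32])
```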

Citations: 0
Deep Learning-Based Joint Geometry and Attribute Up-Sampling for Large-Scale Colored Point Clouds.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3657214
Yun Zhang, Feifan Chen, Na Li, Zhiwei Guo, Xu Wang, Fen Miao, Sam Kwong

A colored point cloud comprising geometry and attribute components is one of the mainstream representations enabling realistic and immersive 3D applications. To generate large-scale and denser colored point clouds, we propose a deep learning-based Joint Geometry and Attribute Up-sampling (JGAU) method, which learns to model both geometry and attribute patterns and leverages the spatial attribute correlation. First, we establish and release a large-scale dataset for colored point cloud up-sampling, named SYSU-PCUD, which contains 121 large-scale colored point clouds with diverse geometry and attribute complexities in six categories and four sampling rates. Second, to improve the quality of up-sampled point clouds, we propose a deep learning-based JGAU framework that up-samples the geometry and attributes jointly. It consists of a geometry up-sampling network and an attribute up-sampling network, where the latter leverages the up-sampled auxiliary geometry to model neighborhood correlations of the attributes. Third, we propose two coarse attribute up-sampling methods, Geometric Distance Weighted Attribute Interpolation (GDWAI) and Deep Learning-based Attribute Interpolation (DLAI), to generate coarsely up-sampled attributes for each point. Then, we propose an attribute enhancement module to refine the up-sampled attributes and generate high-quality point clouds by further exploiting intrinsic attribute and geometry patterns. Extensive experiments show that the Peak Signal-to-Noise Ratio (PSNR) achieved by the proposed JGAU is 33.90 dB, 32.10 dB, 31.10 dB, and 30.39 dB at up-sampling rates of 4×, 8×, 12×, and 16×, respectively. Compared to state-of-the-art schemes, JGAU achieves significant average PSNR gains of 2.32 dB, 2.47 dB, 2.28 dB, and 2.11 dB at the four up-sampling rates, respectively. The code is released at https://github.com/SYSU-Video/JGAU.
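
Geometric Distance Weighted Attribute Interpolation, read literally, assigns each up-sampled point a color from its nearest neighbors in the sparse cloud, weighted by inverse geometric distance. A minimal sketch with scipy's KD-tree under that reading follows; the neighbor count and epsilon are illustrative parameters.

```python
import numpy as np
from scipy.spatial import cKDTree

def gdwai(sparse_xyz, sparse_rgb, dense_xyz, k: int = 3, eps: float = 1e-8):
    """Inverse-distance weighted color interpolation for up-sampled points.

    sparse_xyz: (N, 3) coordinates of the input cloud.
    sparse_rgb: (N, 3) attributes (colors) of the input cloud.
    dense_xyz:  (M, 3) coordinates of the up-sampled geometry.
    """
    dist, idx = cKDTree(sparse_xyz).query(dense_xyz, k=k)  # (M, k) nearest neighbors
    w = 1.0 / (dist + eps)                                 # closer neighbors weigh more
    w /= w.sum(axis=1, keepdims=True)                      # normalize weights per point
    return (sparse_rgb[idx] * w[..., None]).sum(axis=1)    # (M, 3) coarse attributes

rng = np.random.default_rng(0)
xyz, rgb = rng.random((100, 3)), rng.random((100, 3))
print(gdwai(xyz, rgb, rng.random((400, 3))).shape)  # (400, 3)
```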

Citations: 0
Double Nonconvex Tensor Robust Kernel Principal Component Analysis and Its Visual Applications.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3659302
Liang Wu, Jianjun Wang, Wei-Shi Zheng, Guangming Shi

Tensor robust principal component analysis (TRPCA), as a popular linear low-rank method, has been widely applied to various visual tasks. The mathematical formulation of its low-rank prior is derived from the linear latent variable model. However, for nonlinear tensor data with rich information, the nonlinear structures may break the low-rankness assumption and lead to large approximation errors for TRPCA. Motivated by the latent low-dimensionality of nonlinear tensors, this paper first establishes the general paradigm of the nonlinear-tensor-plus-sparse-tensor decomposition problem, called tensor robust kernel principal component analysis (TRKPCA). To efficiently tackle the TRKPCA problem, two novel nonconvex regularizers, the kernelized tensor Schatten-p norm (KTSPN) and a generalized nonconvex regularization, are designed: the former, with tighter theoretical support, adequately captures nonlinear features (i.e., implicit low-rankness), while the latter ensures sparser structural coding, guaranteeing more robust separation results. Then, by integrating their strengths, we propose a double nonconvex TRKPCA (DNTRKPCA) method. Finally, we develop an efficient optimization framework based on the alternating direction method of multipliers (ADMM) to implement the proposed nonconvex kernel method. Experimental results on synthetic data and several real databases show the higher competitiveness of our method compared with other state-of-the-art regularization methods. The code has been released on our ResearchGate homepage: https://www.researchgate.net/publication/397181729_DNTRKPCA_code.
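
The Schatten-p norm underlying KTSPN is defined on singular values; the numpy sketch below fixes that notation for the plain matrix case only (the kernelized tensor extension and the generalized nonconvex regularizer are not reproduced here).

```python
import numpy as np

def schatten_p(x: np.ndarray, p: float) -> float:
    """||X||_{S_p}^p = sum_i sigma_i(X)^p; for 0 < p < 1 a nonconvex surrogate of rank."""
    sigma = np.linalg.svd(x, compute_uv=False)
    return float((sigma ** p).sum())

a = np.diag([3.0, 2.0, 1.0])
print(schatten_p(a, p=1.0))  # 6.0: the nuclear norm (sum of singular values)
print(schatten_p(a, p=0.5))  # ~4.15: approaches the rank (3) as p shrinks toward 0
```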

Citations: 0
Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene Completion.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3660576
Zhiwen Yang, Yuxin Peng

Camera-based 3D semantic scene completion (SSC) offers a cost-effective solution for assessing the geometric occupancy and semantic label of each voxel in the surrounding 3D scene from image inputs, providing a voxel-level scene-perception foundation for perception-prediction-planning autonomous driving systems. Although existing methods have made significant progress, their optimization relies solely on supervision from voxel labels and faces the challenge of voxel sparsity, since a large portion of voxels in autonomous driving scenarios are empty, which limits both optimization efficiency and model performance. To address this issue, we propose a Multi-Resolution Alignment (MRA) approach that mitigates voxel sparsity in camera-based 3D semantic scene completion by exploiting scene- and instance-level alignment across multi-resolution 3D features as auxiliary supervision. Specifically, we first propose the Multi-resolution View Transformer module, which projects 2D image features into multi-resolution 3D features and aligns them at the scene level by fusing discriminative seed features. Furthermore, we design the Cubic Semantic Anisotropy module to identify the instance-level semantic significance of each voxel, accounting for the semantic differences between a voxel and its neighboring voxels within a cubic area. Finally, we devise a Critical Distribution Alignment module, which selects critical voxels as instance-level anchors under the guidance of cubic semantic anisotropy and applies a circulated loss as auxiliary supervision on the consistency of critical feature distributions across different resolutions. Extensive experiments on the SemanticKITTI and SSCBench-KITTI-360 datasets demonstrate that our MRA approach significantly outperforms existing state-of-the-art methods, showcasing its effectiveness in mitigating the impact of sparse voxel labels. The code is available at https://github.com/PKU-ICST-MIPL/MRA_TIP.
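
Cubic semantic anisotropy measures how strongly a voxel's semantic label disagrees with its neighbors inside a cubic window. One plausible formulation, the fraction of differing labels in a 3x3x3 neighborhood, is sketched below; this exact definition is an assumption, not the paper's formula.

```python
import numpy as np

def cubic_semantic_anisotropy(labels: np.ndarray, radius: int = 1) -> np.ndarray:
    """Per-voxel fraction of neighbors in a cubic window carrying a different label.

    labels: (D, H, W) integer semantic map; returns a float array of the same shape.
    """
    d, h, w = labels.shape
    pad = np.pad(labels, radius, mode="edge")
    diff = np.zeros(labels.shape, dtype=np.float64)
    count = 0
    for dz in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                if dz == dy == dx == 0:
                    continue  # skip the center voxel itself
                shifted = pad[radius + dz: radius + dz + d,
                              radius + dy: radius + dy + h,
                              radius + dx: radius + dx + w]
                diff += (shifted != labels)
                count += 1
    return diff / count

vol = np.zeros((4, 4, 4), dtype=np.int64)
vol[2:] = 1                          # two semantic regions split along the first axis
aniso = cubic_semantic_anisotropy(vol)
print(aniso.max(), aniso.min())      # boundary voxels score higher than interior ones
```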

Citations: 0
Causally-Aware Unsupervised Feature Selection Learning.
IF 13.7 Pub Date : 2026-01-01 DOI: 10.1109/TIP.2026.3654354
Zongxin Shen, Yanyong Huang, Dongjie Wang, Minbo Ma, Fengmao Lv, Tianrui Li

Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph. To address these issues, a novel UFS method, called Causally-Aware UnSupErvised Feature Selection learning (CAUSE-FS), is proposed. CAUSE-FS introduces a novel causal regularizer that reweights samples to balance the confounding distribution of each treatment feature. This regularizer is subsequently integrated into a generalized unsupervised spectral regression model to mitigate spurious associations between features and clustering labels, thus achieving causal feature selection. Furthermore, CAUSE-FS employs causality-guided hierarchical clustering to partition features with varying causal contributions into multiple granularities. By integrating similarity graphs learned adaptively at different granularities, CAUSE-FS increases the importance of causal features when constructing the fused similarity graph to capture the reliable local structure of data. Extensive experimental results demonstrate the superiority of CAUSE-FS over state-of-the-art methods, with its interpretability further validated through feature visualization.
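
The causal regularizer's sample reweighting is in the spirit of confounder balancing. As an illustrative analogue only (not the paper's regularizer), inverse-propensity weights for one binarized treatment feature can be computed as follows, assuming scikit-learn is available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def balancing_weights(confounders: np.ndarray, treatment: np.ndarray) -> np.ndarray:
    """Inverse-propensity weights balancing confounders across a binary treatment feature.

    confounders: (n_samples, n_features) remaining features.
    treatment:   (n_samples,) 0/1 indicator from one binarized treatment feature.
    """
    propensity = LogisticRegression(max_iter=1000).fit(confounders, treatment)
    p = propensity.predict_proba(confounders)[:, 1]  # P(treatment = 1 | confounders)
    w = treatment / np.clip(p, 1e-3, 1.0) + (1 - treatment) / np.clip(1.0 - p, 1e-3, 1.0)
    return w / w.mean()                              # normalized per-sample weights

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
x = z + rng.normal(scale=0.5, size=(200, 5))  # five mutually correlated features
t = (x[:, 0] > 0).astype(int)                 # one feature binarized as the "treatment"
print(balancing_weights(x[:, 1:], t)[:5])     # weights for balancing the other features
```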

Citations: 0