
Latest publications from IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society

IAP: Improving Continual Learning of Vision-Language Models via Instance-Aware Prompting
IF 13.7 Pub Date: 2026-01-12 DOI: 10.1109/TIP.2025.3650045
Hao Fu;Hanbin Zhao;Jiahua Dong;Henghui Ding;Chao Zhang;Hui Qian
Recent pre-trained vision-language models (PT-VLMs) often face a Multi-Domain Task Incremental Learning (MTIL) scenario in practice, where several classes and domains of multi-modal tasks arrive incrementally. Without access to previously seen tasks and unseen tasks, memory-constrained MTIL suffers from forward and backward forgetting. To alleviate the above challenges, parameter-efficient fine-tuning (PEFT) techniques, such as prompt tuning, are employed to adapt the PT-VLM to the diverse incrementally learned tasks. To achieve effective new task adaptation, existing methods only consider the effect of PEFT strategy selection, but neglect the influence of PEFT parameter setting (e.g., prompting). In this paper, we tackle the challenge of optimizing prompt designs for diverse tasks in MTIL and propose an Instance-Aware Prompting (IAP) framework. Specifically, our Instance-Aware Gated Prompting (IA-GP) strategy enhances adaptation to new tasks while mitigating forgetting by adaptively assigning prompts across transformer layers at the instance level. Our Instance-Aware Class-Distribution-Driven Prompting (IA-CDDP) improves the task adaptation process by determining an accurate task-label-related confidence score for each instance. Experimental evaluations across 11 datasets, using three performance metrics, demonstrate the effectiveness of our proposed method. The source code is available at https://github.com/FerdinandZJU/IAP
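
The instance-level gated prompting idea can be pictured with a short sketch. The snippet below is an illustrative approximation, not the authors' released code: it assumes a ViT-style token sequence whose first token acts as a [CLS] summary, and the module name `GatedPromptLayer`, the prompt length, and the sigmoid gate are hypothetical choices.

```python
# Illustrative sketch (assumption, not the IAP implementation): per-instance gated
# prompt injection at one transformer layer, in the spirit of IA-GP.
import torch
import torch.nn as nn

class GatedPromptLayer(nn.Module):
    def __init__(self, dim: int, prompt_len: int = 4):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        self.gate = nn.Linear(dim, 1)  # instance-level gate driven by the [CLS] token

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); token 0 is assumed to be [CLS]
        g = torch.sigmoid(self.gate(x[:, 0]))              # (batch, 1), one gate per instance
        p = self.prompts.unsqueeze(0).expand(x.size(0), -1, -1)
        p = g.unsqueeze(-1) * p                             # softly enable/disable the prompts
        return torch.cat([p, x], dim=1)                     # prepend gated prompts to the tokens

layer = GatedPromptLayer(dim=512)
tokens = torch.randn(2, 77, 512)
print(layer(tokens).shape)  # torch.Size([2, 81, 512])
```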
Citations: 0
Reflectance Prediction-Based Knowledge Distillation for Robust 3D Object Detection in Compressed Point Clouds.
IF 13.7 Pub Date: 2026-01-01 DOI: 10.1109/TIP.2025.3648203
Hao Jing, Anhong Wang, Yifan Zhang, Donghan Bu, Junhui Hou

In intelligent transportation systems, low-bitrate transmission via lossy point cloud compression is vital for facilitating real-time collaborative perception among connected agents, such as vehicles and infrastructures, under restricted bandwidth. In existing compression transmission systems, the sender lossily compresses point coordinates and reflectance to generate a transmission code stream, which incurs a transmission burden from reflectance encoding and suffers limited detection robustness due to information loss. To address these issues, this paper proposes a 3D object detection framework with reflectance prediction-based knowledge distillation (RPKD). We compress point coordinates while discarding reflectance during low-bitrate transmission, and feed the decoded non-reflectance compressed point clouds into a student detector. The discarded reflectance is then reconstructed by a geometry-based reflectance prediction (RP) module within the student detector for precise detection. A teacher detector with the same structure as the student detector is designed for performing reflectance knowledge distillation (RKD) and detection knowledge distillation (DKD) from raw to compressed point clouds. Our cross-source distillation training strategy (CDTS) equips the student detector with robustness to low-quality compressed data while preserving the accuracy benefits of raw data through transferred distillation knowledge. Experimental results on the KITTI and DAIR-V2X-V datasets demonstrate that our method can boost detection accuracy for compressed point clouds across multiple code rates. We will release the code publicly at https://github.com/HaoJing-SX/RPKD.
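
As a rough illustration of the two ingredients named above, the sketch below pairs a geometry-only reflectance predictor with a simple teacher-to-student distillation loss. It is an assumed toy, not the RPKD implementation; the MLP layout, the `distill_loss` weighting, and the tensor shapes are placeholders.

```python
# Illustrative sketch (assumption): predict discarded reflectance from coordinates and
# distill teacher (raw point cloud) features into a student (compressed point cloud) detector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReflectancePredictor(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # reflectance assumed normalized to [0, 1]
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return self.mlp(xyz)                      # (N, 1) predicted reflectance

def distill_loss(student_feat, teacher_feat, pred_refl, gt_refl, alpha=0.5):
    # feature-level distillation plus reflectance reconstruction (toy weighting)
    return F.mse_loss(student_feat, teacher_feat.detach()) + alpha * F.l1_loss(pred_refl, gt_refl)

xyz = torch.randn(1024, 3)
print(ReflectancePredictor()(xyz).shape)  # torch.Size([1024, 1])
```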

Citations: 0
Procedure-Aware Hierarchical Alignment for Open Surgery Video-Language Pretraining.
IF 13.7 Pub Date: 2026-01-01 DOI: 10.1109/TIP.2026.3659752
Boqiang Xu, Jinlin Wu, Jian Liang, Zhenan Sun, Hongbin Liu, Jiebo Luo, Zhen Lei

Recent advances in surgical robotics and computer vision have greatly improved intelligent systems' autonomy and perception in the operating room (OR), especially in endoscopic and minimally invasive surgeries. However, for open surgery, which is still the predominant form of surgical intervention worldwide, there has been relatively limited exploration due to its inherent complexity and the lack of large-scale, diverse datasets. To close this gap, we present OpenSurgery, by far the largest video-text pretraining and evaluation dataset for open surgery understanding. OpenSurgery consists of two subsets: OpenSurgery-Pretrain and OpenSurgery-EVAL. OpenSurgery-Pretrain consists of 843 publicly available open surgery videos for pretraining, spanning 102 hours and encompassing over 20 distinct surgical types. OpenSurgery-EVAL is a benchmark dataset for evaluating model performance in open surgery understanding, comprising 280 training and 120 test videos, totaling 49 hours. Each video in OpenSurgery is meticulously annotated by expert surgeons at three hierarchical levels of video, operation, and frame to ensure both high quality and strong clinical applicability. Next, we propose the Hierarchical Surgical Knowledge Pretraining (HierSKP) framework to facilitate large-scale multimodal representation learning for open surgery understanding. HierSKP leverages a granularity-aware contrastive learning strategy and enhances procedural comprehension by constructing hard negative samples and incorporating a Dynamic Time Warping (DTW)-based loss to capture fine-grained temporal alignment of visual semantics. Extensive experiments show that HierSKP achieves state-of-the-art performance on OpenSurgery-EVAL across multiple tasks, including operation recognition, temporal action localization, and zero-shot cross-modal retrieval. This demonstrates its strong generalizability for further advances in open surgery understanding.
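
The DTW-based temporal alignment term can be made concrete with a small sketch. The function below computes a plain DTW alignment cost (no soft-min smoothing) between frame and text-token embeddings under a cosine-distance ground cost; it is only an assumed stand-in for the loss described above.

```python
# Illustrative sketch (assumption): DTW alignment cost between video-frame and text
# embeddings, a simplified stand-in for the DTW-based loss in HierSKP.
import torch
import torch.nn.functional as F

def dtw_cost(video_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    # video_emb: (T, D), text_emb: (L, D); ground cost = 1 - cosine similarity
    cost = 1.0 - F.normalize(video_emb, dim=-1) @ F.normalize(text_emb, dim=-1).T
    T, L = cost.shape
    acc = torch.full((T + 1, L + 1), float("inf"))
    acc[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, L + 1):
            # classic DTW recursion: extend the cheapest of the three predecessors
            acc[i, j] = cost[i - 1, j - 1] + torch.min(
                torch.stack([acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]]))
    return acc[T, L] / (T + L)  # length-normalized alignment cost

print(dtw_cost(torch.randn(16, 256), torch.randn(8, 256)))
```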

Citations: 0
Deep LoRA-Unfolding Networks for Image Restoration.
IF 13.7 Pub Date: 2026-01-01 DOI: 10.1109/TIP.2026.3661406
Xiangming Wang, Haijin Zeng, Benteng Sun, Jiezhang Cao, Kai Zhang, Qiangqiang Shen, Yongyong Chen

Deep unfolding networks (DUNs), combining conventional iterative optimization algorithms and deep neural networks into a multi-stage framework, have achieved remarkable accomplishments in Image Restoration (IR), such as spectral imaging reconstruction, compressive sensing and super-resolution. A DUN unfolds the iterative optimization steps into a stack of sequentially linked blocks. Each block consists of a Gradient Descent Module (GDM) and a Proximal Mapping Module (PMM), which, from a Bayesian perspective, is equivalent to a denoiser operating on Gaussian noise with a known level. However, existing DUNs suffer from two critical limitations: 1) their PMMs share identical architectures and denoising objectives across stages, ignoring the need for stage-specific adaptation to varying noise levels; and 2) their chain of structurally repetitive blocks results in severe parameter redundancy and high memory consumption, hindering deployment in large-scale or resource-constrained scenarios. To address these challenges, we introduce generalized Deep Low-rank Adaptation (LoRA) Unfolding Networks for image restoration, named LoRun, harmonizing denoising objectives and adapting to different denoising levels between stages with compressed memory usage for more efficient DUNs. LoRun introduces a novel paradigm where a single pretrained base denoiser is shared across all stages, while lightweight, stage-specific LoRA adapters are injected into the PMMs to dynamically modulate denoising behavior according to the noise level at each unfolding step. This design decouples the core restoration capability from task-specific adaptation, enabling precise control over denoising intensity without duplicating full network parameters and achieving up to $N\times$ parameter reduction for an $N$-stage DUN with on-par or better performance. Extensive experiments conducted on three IR tasks validate the efficiency of our method.
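
One way to picture the shared-base-plus-adapter idea is a linear layer whose low-rank update is indexed by the unfolding stage. The sketch below is an assumed toy, not LoRun itself; the class name `StageLoRALinear`, the rank, and the frozen base are illustrative choices.

```python
# Illustrative sketch (assumption): a frozen base layer shared by all unfolding stages,
# plus lightweight stage-specific LoRA adapters selected by the stage index.
import torch
import torch.nn as nn

class StageLoRALinear(nn.Module):
    def __init__(self, dim: int, num_stages: int, rank: int = 4):
        super().__init__()
        self.base = nn.Linear(dim, dim)            # shared pretrained base (kept frozen)
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.down = nn.Parameter(torch.randn(num_stages, dim, rank) * 0.01)
        self.up = nn.Parameter(torch.zeros(num_stages, rank, dim))  # zero-init: update starts at zero

    def forward(self, x: torch.Tensor, stage: int) -> torch.Tensor:
        # x: (batch, dim); only the low-rank update depends on the stage
        return self.base(x) + x @ self.down[stage] @ self.up[stage]

layer = StageLoRALinear(dim=64, num_stages=8)
print(layer(torch.randn(2, 64), stage=3).shape)  # torch.Size([2, 64])
```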

Citations: 0
Boosting HDR Image Reconstruction via Semantic Knowledge Transfer.
IF 13.7 Pub Date: 2026-01-01 DOI: 10.1109/TIP.2026.3652360
Tao Hu, Longyao Wu, Wei Dong, Peng Wu, Jinqiu Sun, Xiaogang Xu, Qingsen Yan, Yanning Zhang

Recovering High Dynamic Range (HDR) images from multiple Standard Dynamic Range (SDR) images becomes challenging when the SDR images exhibit noticeable degradation and missing content. Leveraging scene-specific semantic priors offers a promising solution for restoring heavily degraded regions. However, these priors are typically extracted from sRGB SDR images, and this domain/format gap poses a significant challenge when applying them to HDR imaging. To address this issue, we propose a general framework that transfers semantic knowledge derived from the SDR domain via self-distillation to boost existing HDR reconstruction. Specifically, the proposed framework first introduces the Semantic Priors Guided Reconstruction Model (SPGRM), which leverages SDR image semantic knowledge to address ill-posed problems in the initial HDR reconstruction results. Subsequently, we leverage a self-distillation mechanism that constrains the color and content information with semantic knowledge, aligning the external outputs between the baseline and SPGRM. Furthermore, to transfer the semantic knowledge of the internal features, we utilize a Semantic Knowledge Alignment Module (SKAM) to fill the missing semantic contents with the complementary masks. Extensive experiments demonstrate that our framework significantly boosts HDR imaging quality for existing methods without altering the network architecture.
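
A minimal sketch of the self-distillation objective, under assumptions: the loss below aligns a baseline network's HDR output with a semantics-guided teacher (standing in for SPGRM) and adds a mask-weighted feature term in the spirit of SKAM. Tensor shapes, the L1/MSE choices, and the weight `w_feat` are placeholders.

```python
# Illustrative sketch (assumption): output-level and mask-weighted feature-level
# alignment between a baseline HDR network and a semantics-guided teacher.
import torch
import torch.nn.functional as F

def self_distill_loss(student_hdr, teacher_hdr, student_feat, teacher_feat, sem_mask, w_feat=0.1):
    # student_hdr/teacher_hdr: (B, 3, H, W); feats: (B, C, H', W'); sem_mask: (B, 1, H', W') in [0, 1]
    out_term = F.l1_loss(student_hdr, teacher_hdr.detach())
    feat_term = F.mse_loss(sem_mask * student_feat, sem_mask * teacher_feat.detach())
    return out_term + w_feat * feat_term

s, t = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
sf, tf = torch.rand(1, 32, 16, 16), torch.rand(1, 32, 16, 16)
mask = (torch.rand(1, 1, 16, 16) > 0.5).float()
print(self_distill_loss(s, t, sf, tf, mask))
```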

Citations: 0
UGAE: Unified Geometry and Attribute Enhancement for G-PCC Compressed Point Clouds.
IF 13.7 Pub Date: 2026-01-01 DOI: 10.1109/TIP.2026.3654348
Pan Zhao, Hui Yuan, Chongzhen Tian, Tian Guo, Raouf Hamzaoui, Zhigeng Pan

Lossy compression of point clouds reduces storage and transmission costs; however, it inevitably leads to irreversible distortion in geometry structure and attribute information. To address these issues, we propose a unified geometry and attribute enhancement (UGAE) framework, which consists of three core components: post-geometry enhancement (PoGE), pre-attribute enhancement (PAE), and post-attribute enhancement (PoAE). In PoGE, a Transformer-based sparse convolutional U-Net is used to reconstruct the geometry structure with high precision by predicting voxel occupancy probabilities. Building on the refined geometry structure, PAE introduces an innovative enhanced geometry-guided recoloring strategy, which uses a detail-aware K-Nearest Neighbors (DA-KNN) method to achieve accurate recoloring and effectively preserve high-frequency details before attribute compression. Finally, at the decoder side, PoAE uses an attribute residual prediction network with a weighted mean squared error (W-MSE) loss to enhance the quality of high-frequency regions while maintaining the fidelity of low-frequency regions. UGAE significantly outperformed existing methods on three benchmark datasets: 8iVFB, Owlii, and MVUB. Compared to the latest G-PCC test model (TMC13v29), in terms of total bitrate setting, UGAE achieved an average BD-PSNR gain of 9.98 dB and -90.54% BD-bitrate for geometry under the D1 metric, as well as a 3.34 dB BD-PSNR improvement with -55.53% BD-bitrate for attributes. Additionally, it improved perceptual quality significantly. Our source code will be released on GitHub at: https://github.com/yuanhui0325/UGAE.
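
To make the weighted-MSE idea concrete, the sketch below up-weights points flagged as high-frequency when computing the attribute reconstruction error. The thresholded-residual criterion and the weight value are assumptions for illustration, not the paper's definition of W-MSE.

```python
# Illustrative sketch (assumption): an MSE that up-weights "high-frequency" points,
# loosely in the spirit of the W-MSE loss used by the PoAE stage.
import torch

def weighted_mse(pred, target, hf_weight=4.0, thresh=0.05):
    # pred/target: (N, 3) point attributes in [0, 1]; here a point counts as
    # high-frequency if its mean residual magnitude exceeds a threshold (toy criterion)
    residual = target - pred
    hf_mask = (residual.abs().mean(dim=-1, keepdim=True) > thresh).float()
    w = 1.0 + (hf_weight - 1.0) * hf_mask        # weight 1 for smooth points, hf_weight otherwise
    return (w * residual.pow(2)).mean()

print(weighted_mse(torch.rand(2048, 3), torch.rand(2048, 3)))
```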

Citations: 0
DVG-Diffusion: Dual-View-Guided Diffusion Model for CT Reconstruction From X-Rays.
IF 13.7 Pub Date: 2026-01-01 DOI: 10.1109/TIP.2026.3655171
Xing Xie, Jiawei Liu, Huijie Fan, Zhi Han, Yandong Tang, Liangqiong Qu

Directly reconstructing 3D CT volume from few-view 2D X-rays using an end-to-end deep learning network is a challenging task, as X-ray images are merely projection views of the 3D CT volume. In this work, we facilitate complex 2D X-ray image to 3D CT mapping by incorporating new view synthesis, and reduce the learning difficulty through view-guided feature alignment. Specifically, we propose a dual-view guided diffusion model (DVG-Diffusion), which couples a real input X-ray view and a synthesized new X-ray view to jointly guide CT reconstruction. First, a novel view parameter-guided encoder captures features from X-rays that are spatially aligned with CT. Next, we concatenate the extracted dual-view features as conditions for the latent diffusion model to learn and refine the CT latent representation. Finally, the CT latent representation is decoded into a CT volume in pixel space. By incorporating view parameter guided encoding and dual-view guided CT reconstruction, our DVG-Diffusion can achieve an effective balance between high fidelity and perceptual quality for CT reconstruction. Experimental results demonstrate that our method outperforms state-of-the-art methods. We also present a comprehensive analysis and discussion of views and reconstruction based on the experiments. The model and code are available at https://github.com/xiexing0916/DVG-Diffusion.
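
A small sketch of the dual-view conditioning step, under assumptions: two spatially aligned X-ray feature maps (one from the real view, one from the synthesized view) are concatenated and projected into a single conditioning tensor for the latent diffusion model. The module name `DualViewCondition` and the channel sizes are illustrative.

```python
# Illustrative sketch (assumption): fuse real and synthesized X-ray view features
# into one conditioning tensor for a latent diffusion model.
import torch
import torch.nn as nn

class DualViewCondition(nn.Module):
    def __init__(self, in_ch: int = 64, cond_ch: int = 128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * in_ch, cond_ch, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(cond_ch, cond_ch, kernel_size=3, padding=1),
        )

    def forward(self, feat_real: torch.Tensor, feat_synth: torch.Tensor) -> torch.Tensor:
        # both inputs: (B, C, H, W), assumed spatially aligned with the CT latent
        return self.fuse(torch.cat([feat_real, feat_synth], dim=1))

cond = DualViewCondition()(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(cond.shape)  # torch.Size([1, 128, 32, 32])
```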

Citations: 0
LaCon: Late-Constraint Controllable Visual Generation.
IF 13.7 Pub Date: 2026-01-01 DOI: 10.1109/TIP.2026.3654412
Chang Liu, Rui Li, Kaidong Zhang, Yunwei Lan, Xin Luo, Dong Liu

Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To offer more controllability for the generation process of diffusion models, previous studies normally adopt extra modules to integrate condition signals by manipulating the intermediate features of the noise predictors, but they often fail on conditions not seen during training. Although subsequent studies are motivated to handle multi-condition control, they are mostly resource-consuming to implement, and more generalizable and efficient solutions are still needed for controllable visual generation. In this paper, we present a late-constraint controllable visual generation method, namely LaCon, which enables generalization across various modalities and granularities for each single-condition control. LaCon establishes an alignment between the external condition and specific diffusion timesteps, and guides diffusion models to produce conditional results based on this established alignment. Experimental results on prevailing benchmark datasets illustrate the promising performance and generalization capability of LaCon under various conditions and settings. Ablation studies analyze different components in LaCon, illustrating its great potential to offer flexible condition controls for different backbones.
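
The late-constraint idea of steering sampling with an external alignment model can be sketched as a classifier-guidance-style correction of the noise prediction. The snippet below is an assumed toy, not LaCon's conditioning module; `toy_align` and the guidance scale are placeholders.

```python
# Illustrative sketch (assumption): nudge each denoising step toward an external
# condition by following the gradient of a learned alignment score.
import torch

def guided_step(x_t, eps_pred, align_fn, cond, scale=1.0):
    # x_t: current latent; eps_pred: the diffusion model's noise prediction;
    # align_fn(x, cond) -> scalar score, higher meaning better condition match
    x_t = x_t.detach().requires_grad_(True)
    score = align_fn(x_t, cond)
    grad = torch.autograd.grad(score, x_t)[0]
    return eps_pred - scale * grad          # shift the noise estimate toward the condition

def toy_align(x, cond):                     # stand-in for a learned alignment network
    return -(x - cond).pow(2).mean()

x = torch.randn(1, 4, 16, 16)
eps = torch.randn_like(x)
print(guided_step(x, eps, toy_align, torch.zeros_like(x)).shape)  # torch.Size([1, 4, 16, 16])
```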

Citations: 0
Deep Learning-Based Joint Geometry and Attribute Up-Sampling for Large-Scale Colored Point Clouds.
IF 13.7 Pub Date: 2026-01-01 DOI: 10.1109/TIP.2026.3657214
Yun Zhang, Feifan Chen, Na Li, Zhiwei Guo, Xu Wang, Fen Miao, Sam Kwong

Colored point cloud comprising geometry and attribute components is one of the mainstream representations enabling realistic and immersive 3D applications. To generate large-scale and denser colored point clouds, we propose a deep learning-based Joint Geometry and Attribute Up-sampling (JGAU) method, which learns to model both geometry and attribute patterns and leverages the spatial attribute correlation. Firstly, we establish and release a large-scale dataset for colored point cloud up-sampling, named SYSU-PCUD, which has 121 large-scale colored point clouds with diverse geometry and attribute complexities in six categories and four sampling rates. Secondly, to improve the quality of up-sampled point clouds, we propose a deep learning-based JGAU framework to up-sample the geometry and attribute jointly. It consists of a geometry up-sampling network and an attribute up-sampling network, where the latter leverages the up-sampled auxiliary geometry to model neighborhood correlations of the attributes. Thirdly, we propose two coarse attribute up-sampling methods, Geometric Distance Weighted Attribute Interpolation (GDWAI) and Deep Learning-based Attribute Interpolation (DLAI), to generate coarsely up-sampled attributes for each point. Then, we propose an attribute enhancement module to refine the up-sampled attributes and generate high quality point clouds by further exploiting intrinsic attribute and geometry patterns. Extensive experiments show that the Peak Signal-to-Noise Ratio (PSNR) values achieved by the proposed JGAU are 33.90 dB, 32.10 dB, 31.10 dB, and 30.39 dB when the up-sampling rates are $4\times$, $8\times$, $12\times$, and $16\times$, respectively. Compared to the state-of-the-art schemes, the JGAU achieves significant average PSNR gains of 2.32 dB, 2.47 dB, 2.28 dB, and 2.11 dB at the four up-sampling rates, respectively. The code is released at https://github.com/SYSU-Video/JGAU.
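
The GDWAI step can be illustrated with a few lines of inverse-distance-weighted k-nearest-neighbor interpolation from coarse points to up-sampled points. This is an assumed sketch of the general technique, not the paper's exact formulation.

```python
# Illustrative sketch (assumption): inverse-distance-weighted attribute interpolation
# over the k nearest coarse points, in the spirit of GDWAI.
import torch

def gdwai(coarse_xyz, coarse_attr, dense_xyz, k=3, eps=1e-8):
    # coarse_xyz: (M, 3), coarse_attr: (M, 3), dense_xyz: (N, 3) -> dense attributes (N, 3)
    d = torch.cdist(dense_xyz, coarse_xyz)            # (N, M) pairwise distances
    dist, idx = d.topk(k, dim=1, largest=False)       # k nearest coarse neighbors per dense point
    w = 1.0 / (dist + eps)
    w = w / w.sum(dim=1, keepdim=True)                # normalized inverse-distance weights
    return (w.unsqueeze(-1) * coarse_attr[idx]).sum(dim=1)

attr = gdwai(torch.rand(512, 3), torch.rand(512, 3), torch.rand(2048, 3))
print(attr.shape)  # torch.Size([2048, 3])
```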

Citations: 0
Causally-Aware Unsupervised Feature Selection Learning.
IF 13.7 Pub Date: 2026-01-01 DOI: 10.1109/TIP.2026.3654354
Zongxin Shen, Yanyong Huang, Dongjie Wang, Minbo Ma, Fengmao Lv, Tianrui Li

Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph. To address these issues, a novel UFS method, called Causally-Aware UnSupErvised Feature Selection learning (CAUSE-FS), is proposed. CAUSE-FS introduces a novel causal regularizer that reweights samples to balance the confounding distribution of each treatment feature. This regularizer is subsequently integrated into a generalized unsupervised spectral regression model to mitigate spurious associations between features and clustering labels, thus achieving causal feature selection. Furthermore, CAUSE-FS employs causality-guided hierarchical clustering to partition features with varying causal contributions into multiple granularities. By integrating similarity graphs learned adaptively at different granularities, CAUSE-FS increases the importance of causal features when constructing the fused similarity graph to capture the reliable local structure of data. Extensive experimental results demonstrate the superiority of CAUSE-FS over state-of-the-art methods, with its interpretability further validated through feature visualization.
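
The confounder-balancing idea behind the causal regularizer can be sketched as learning sample weights that drive the weighted covariance between a chosen treatment feature and the remaining features toward zero. The optimization below is a generic toy of that idea, not the CAUSE-FS regularizer itself; the feature matrix, step count, and learning rate are placeholders.

```python
# Illustrative sketch (assumption): learn sample weights that decorrelate one
# "treatment" feature from the remaining (confounding) features.
import torch

def balance_weights(X, t_idx, steps=200, lr=0.05):
    # X: (n, d) feature matrix; t_idx: index of the treatment feature
    n, d = X.shape
    logit_w = torch.zeros(n, requires_grad=True)
    opt = torch.optim.Adam([logit_w], lr=lr)
    t = X[:, t_idx:t_idx + 1]
    Z = torch.cat([X[:, :t_idx], X[:, t_idx + 1:]], dim=1)
    for _ in range(steps):
        w = torch.softmax(logit_w, dim=0).unsqueeze(1)   # sample weights, summing to 1
        t_c = t - (w * t).sum(0)                         # weighted centering
        Z_c = Z - (w * Z).sum(0)
        loss = ((w * t_c * Z_c).sum(0) ** 2).sum()       # push weighted covariance toward zero
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.softmax(logit_w.detach(), dim=0)

w = balance_weights(torch.randn(128, 10), t_idx=0)
print(w.shape, float(w.sum()))  # torch.Size([128]) ~1.0
```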

Citations: 0