
Latest publications in IEEE Transactions on Image Processing

CycleDiff: Cycle Diffusion Models for Unpaired Image-to-image Translation
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-30 | DOI: 10.1109/tip.2026.3657240
Shilong Zou, Yuhang Huang, Renjiao Yi, Chenyang Zhu, Kai Xu
Citations: 0
Broadcast-Gated Attention with Identity Adaptive Integration for Efficient Image Super-Resolution.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-29 | DOI: 10.1109/tip.2026.3657640
Qian Wang, Yanyu Mao, Ruilong Guo, Mengyang Wang, Jing Wei, Han Pan
Efficient image super-resolution (SR) models are essential for achieving high-quality image reconstruction with reduced computational complexity, particularly in resource-constrained environments. In this paper, we introduce a novel self-attention mechanism, Broadcast-Gated Attention with Identity Adaptive Integration (BGAI). Then, based on this mechanism, we design a lightweight super-resolution network that achieves state-of-the-art performance with minimal computational cost. By observing the sparsity and convergence properties of self-attention, BGAI optimizes computational resource utilization through the effective broadcasting of meaningful features across attention heads and network layers. A key innovation in BGAI is the Broadcast-Gated Multi-head Self-Attention (BGMSA) mechanism, which employs a dedicated head to capture and integrate long-range dependencies, broadcasting this broader contextual information to local attention heads. This design enhances long-range interaction modeling while minimizing redundant computations. Additionally, the Identity Attention Adaptive Integration (IAAI) mechanism facilitates efficient feature propagation by leveraging the continuity in dependencies across layers, with a focus on dynamic variations to improve representational efficiency and accelerate convergence. Comprehensive experiments on standard benchmarks demonstrate that BGAI achieves high-fidelity super-resolution while reducing the number of parameters and FLOPs by up to 35% compared with existing lightweight methods. These results establish BGAI as a robust and scalable solution for resource-efficient SR, with significant potential for deployment in real-world, high-resolution image processing applications. The code and trained models are publicly available at https://github.com/bbbolt/BGAI.
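The broadcast idea in BGMSA can be pictured with a toy NumPy sketch: one dedicated head computes global context, which is broadcast to the local heads through a gate. This is a minimal illustration only; the random matrices stand in for learned per-head projections, and the sigmoid gating form is an assumption, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention on (tokens, dim) arrays
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def broadcast_gated_attention(x, n_heads=4, seed=0):
    """Toy sketch: a dedicated head computes global context, which is
    broadcast to the remaining local heads through a sigmoid gate.
    Random matrices stand in for the learned per-head projections."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    dh = d // n_heads
    proj = lambda: x @ rng.standard_normal((d, dh))
    global_ctx = attention(proj(), proj(), proj())   # the dedicated head
    heads = []
    for _ in range(n_heads - 1):                     # local heads
        local = attention(proj(), proj(), proj())
        # gate decides how much global context each token receives
        gate = 1.0 / (1.0 + np.exp(-(local * global_ctx).sum(-1, keepdims=True)))
        heads.append(local + gate * global_ctx)
    heads.append(global_ctx)
    return np.concatenate(heads, axis=-1)            # (tokens, d)

out = broadcast_gated_attention(np.full((6, 8), 0.1))
```

The point of the structure is that the expensive long-range interaction is computed once and shared, rather than redundantly in every head.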
Citations: 0
SACMark: Spatial-Angle Consistency Watermarking Network for Light Field Image Copyright Protection.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-29 | DOI: 10.1109/tip.2026.3657635
Junfeng Guo, Hui Wang, Shouxin Liu, Yushu Zhang, Zhongyun Hua, Seok-Tae Kim, Xiaowei Li
Light Field (LF) images provide rich visual representations of 3D scenes by capturing both spatial and angular information of light rays. However, their high dimensions present substantial challenges for conventional 2D image watermarking techniques in effectively ensuring copyright protection. In this work, we propose a deep learning-based Spatial-Angular Consistency waterMarking (SACMark) network, designed to address the unique challenges of watermark embedding and extraction in LF images. SACMark employs a spatial-angular feature extraction module to capture the multidimensional information of LF images and introduces consistency matching and fusion strategies to enhance feature utilization. The network adopts an encoder-noise-decoder architecture, optimized through adversarial training to improve the imperceptibility and robustness of the watermark. Experimental results demonstrate that SACMark maintains high visual quality across various embedding capacities and has minimal impact on depth estimation. Compared to traditional LF watermarking approaches and existing deep learning-based methods for 2D images, SACMark demonstrates improved resilience to noise while preserving essential LF characteristics. These findings suggest that SACMark holds promise for practical applications and may contribute to future developments in secure and adaptive LF image protection.
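The encoder-noise-decoder pattern can be illustrated with a classical spread-spectrum correlation watermark standing in for SACMark's learned spatial-angular network. Everything here is an illustrative assumption: the key `42` plays the role of a shared secret, the embedding strength and noise level are arbitrary, and a flat 2D image replaces the 4D light field.

```python
import numpy as np

def carriers(shape, n_bits, key=42):
    # pseudo-random unit-norm carrier patterns derived from a shared key
    g = np.random.default_rng(key)
    c = g.standard_normal((n_bits, *shape))
    return c / np.linalg.norm(c.reshape(n_bits, -1), axis=1)[:, None, None]

def embed(image, bits, strength=0.05):
    # encoder: superimpose one signed carrier per watermark bit
    signs = 2 * np.asarray(bits) - 1            # {0,1} -> {-1,+1}
    mark = np.tensordot(signs, carriers(image.shape, len(bits)), axes=1)
    return image + strength * mark

def extract(image, n_bits):
    # decoder: correlate against each carrier; the sign recovers the bit
    corr = (carriers(image.shape, n_bits) * image).sum(axis=(1, 2))
    return (corr > 0).astype(int).tolist()

bits = [1, 0, 1, 1]
marked = embed(np.zeros((32, 32)), bits)
# "noise layer": a small distortion between encoder and decoder
noisy = marked + 0.001 * np.random.default_rng(7).standard_normal((32, 32))
recovered = extract(noisy, 4)                   # -> [1, 0, 1, 1]
```

The learned network replaces the fixed carriers with features adapted to the cover content, which is what buys imperceptibility at higher capacities.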
Citations: 0
Deep G-PCC Geometry Preprocessing via Joint Optimization with a Differentiable Codec Surrogate for Enhanced Compression Efficiency.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3655187
Wanhao Ma, Wei Zhang, Shuai Wan, Fuzheng Yang
Geometry-based point cloud compression (G-PCC), an international standard designed by MPEG, provides a generic framework for compressing diverse types of point clouds while ensuring interoperability across applications and devices. However, G-PCC underperforms compared to recent deep learning-based PCC methods despite its lower computational power consumption. To enhance the efficiency of G-PCC without sacrificing its interoperability or computational flexibility, we propose the first compression-oriented point cloud voxelization network jointly optimized with a differentiable G-PCC surrogate model. The surrogate model mimics the rate-distortion behavior of the non-differentiable G-PCC codec, enabling end-to-end gradient propagation. The versatile voxelization network adaptively transforms input point clouds using learning-based voxelization and effectively manipulates point clouds via global scaling, fine-grained pruning, and point-level editing for rate-distortion trade-off. During inference, only the lightweight voxelization network is prepended to the G-PCC encoder, requiring no modifications to the decoder, thus introducing no computational overhead for end users. Extensive experiments demonstrate a 38.84% average BD-rate reduction over G-PCC. By bridging classical codecs with deep learning, this work offers a practical pathway to enhance legacy compression standards while preserving their backward compatibility, making it ideal for real-world deployment.
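Why a differentiable surrogate is needed at all can be shown in miniature. The paper's surrogate is a learned network mimicking G-PCC's rate-distortion behavior; the sketch below uses the simpler straight-through-estimator idea on a rounding step (a stand-in for voxelization/coding) purely to illustrate how gradients can be routed past a non-differentiable codec to the preprocessing stage.

```python
import numpy as np

def voxelize(points, step=0.25):
    # non-differentiable, codec-like rounding: true gradient is zero a.e.
    return np.round(points / step) * step

def surrogate_loss_grad(points, target, step=0.25):
    """Run the real (non-differentiable) voxelizer in the forward pass,
    but back-propagate as if it were the identity (straight-through),
    so the preprocessing network upstream still receives a signal."""
    recon = voxelize(points, step)
    loss = float(((recon - target) ** 2).mean())
    grad = 2 * (recon - target) / recon.size    # identity backward pass
    return loss, grad

loss, grad = surrogate_loss_grad(np.array([0.12, 0.47, 0.88]), np.zeros(3))
# the exact derivative of `voxelize` is 0 almost everywhere,
# yet the surrogate gradient is non-zero and points toward the target
```

A learned surrogate goes further than this sketch by also approximating the codec's bit-rate term, so the rate-distortion trade-off itself becomes trainable end to end.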
Citations: 0
Domain-Complementary Prior with Fine-Grained Feedback for Scene Text Image Super-Resolution.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3657246
Shen Zhang, Yang Li, Pengwen Dai, Xiaozhou Zhou, Guotao Xie
Enhancing the resolution of scene text images is a critical preprocessing step that can substantially improve the accuracy of downstream text recognition in low-quality images. Existing methods primarily rely on auxiliary text features to guide the super-resolution process. However, these features often lack rich low-level information, making them insufficient for faithfully reconstructing both the global structure and fine-grained details of text. Moreover, previous methods often learn suboptimal feature representations from the original low-quality landmark images, which cannot provide precise guidance for super-resolution. In this study, we propose a Fine-Grained Feedback Domain-Complementary Network (FDNet) for scene text image super-resolution. Specifically, we first employ a fine-grained feedback mechanism to selectively refine landmark images, thereby enhancing feature representations. Then, we introduce a novel domain-trace prior interaction generator, which integrates domain-specific traces with a text prior, comprehensively complementing the clear edges and structural coverage of the text. Finally, motivated by the limitations of existing datasets, which often exhibit limited scene scales and a lack of challenging scenarios, we introduce a new dataset, MDRText. The proposed MDRText features multi-scale and diverse characteristics and is designed to support challenging text image recognition and super-resolution tasks. Extensive experiments on the MDRText and TextZoom datasets demonstrate that our method achieves superior performance in scene text image super-resolution and further improves the accuracy of subsequent recognition tasks.
Citations: 0
StealthMark: Harmless and Stealthy Ownership Verification for Medical Segmentation via Uncertainty-Guided Backdoors
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3655563
Qinkai Yu, Chong Zhang, Gaojie Jin, Tianjin Huang, Wei Zhou, Wenhui Li, Xiaobo Jin, Bo Huang, Yitian Zhao, Guang Yang, Gregory Y.H. Lip, Yalin Zheng, Aline Villavicencio, Yanda Meng
Citations: 0
AttriPrompt: Class Attribute-aware Prompt Tuning for Vision-Language Model.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3657216
Yuling Su, Xueliang Liu, Zhen Huang, Yunwei Zhao, Richang Hong, Meng Wang
Prompt tuning has proven to be an effective alternative for fine-tuning pre-trained vision-language models (VLMs) on downstream tasks. Among existing approaches, class-shared prompts learn a unified prompt shared across all classes, while sample-specific prompts generate distinct prompts tailored to each individual sample. However, both approaches often struggle to adequately capture the unique characteristics of underrepresented classes, particularly in imbalanced scenarios where data for tail classes is scarce. To alleviate this issue, we propose an attribute-aware prompt tuning framework that promotes a more balanced understanding of imbalanced tasks by explicitly modeling critical class-level attributes. The key intuition is that, from the perspective of a class, essential attributes tend to be relatively consistent across classes, regardless of sample sizes. Specifically, we build an attribute pool to learn potential semantic attributes of classes based on VLMs. For each input sample, we generate a unique attribute-aware prompt by selecting relevant class attributes from this pool through a matching mechanism. This design enables the model to capture essential class semantics and generate informative prompts, even for classes with limited data. Additionally, we introduce a ProAdapter module to facilitate the transfer of foundational knowledge from VLMs while enhancing generalization to underrepresented classes in imbalanced settings. Extensive experiments on standard and imbalanced few-shot tasks demonstrate that our model achieves superior performance, especially in tail classes.
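The pool-and-match step can be sketched in a few lines of NumPy. This is a hypothetical reading of the mechanism: cosine similarity as the matching score, top-k selection, and mean pooling are all assumptions standing in for whatever AttriPrompt actually learns; random vectors replace CLIP features and the learnable pool.

```python
import numpy as np

rng = np.random.default_rng(0)

def attribute_prompt(image_feat, pool, k=3):
    """Hypothetical matching step: score every pooled attribute
    embedding against the image feature by cosine similarity and
    average the top-k matches into a sample-specific prompt."""
    a = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    f = image_feat / np.linalg.norm(image_feat)
    top = np.argsort(a @ f)[-k:]         # indices of best-matching attributes
    return pool[top].mean(axis=0), top

pool = rng.standard_normal((16, 8))      # 16 learnable attribute vectors (dim 8)
feat = rng.standard_normal(8)            # stand-in for a CLIP image feature
prompt, chosen = attribute_prompt(feat, pool)
```

Because the pool is shared across all samples, a tail-class image can still select well-trained attribute vectors that head classes helped shape, which is the intended balancing effect.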
Citations: 0
Domain-aware Adversarial Domain Augmentation Network for Hyperspectral Image Classification.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3657203
Yi Huang, Jiangtao Peng, Weiwei Sun, Na Chen, Zhijing Ye, Qian Du
Classifying hyperspectral remote sensing images across different scenes has recently emerged as a significant challenge. When only historical labeled images (source domain, SD) are available, it is crucial to leverage these images effectively to train a model with strong generalization ability that can be directly applied to classify unseen samples (target domain, TD). To address these challenges, this paper proposes a novel single-domain generalization (SDG) network, termed the domain-aware adversarial domain augmentation network (DADAnet), for cross-scene hyperspectral image classification (HSIC). DADAnet involves two stages: adversarial domain augmentation (ADA) and task-specific training. ADA employs a progressive adversarial generation strategy to construct an augmented domain (AD). To enhance variability in both spatial and spectral dimensions, a domain-aware spatial-spectral mask (DSSM) encoder is constructed to increase the diversity of the generated adversarial samples. Furthermore, a two-level contrastive loss (TCC) is designed and incorporated into the ADA to ensure both the diversity and effectiveness of AD samples. Finally, DADAnet performs supervised learning jointly on the SD and AD during the task-specific training stage. Experimental results on two public hyperspectral image datasets and a new Hangzhouwan (HZW) dataset demonstrate that the proposed DADAnet outperforms existing domain adaptation (DA) and domain generalization (DG) methods, achieving overall accuracies of 80.69%, 63.75%, and 87.61% on three datasets, respectively.
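The core of adversarial domain augmentation is perturbing a source sample along the loss gradient to synthesize a harder, domain-shifted variant. A minimal FGSM-style sketch makes the step explicit; a one-layer logistic model replaces the real network (an assumption for clarity) so the gradient can be written by hand.

```python
import numpy as np

def fgsm_augment(x, w, y, eps=0.1):
    """FGSM-style sketch of adversarial sample generation: nudge a
    source-domain sample x along the sign of the loss gradient.
    A linear logistic model stands in for the actual classifier."""
    p = 1.0 / (1.0 + np.exp(-x @ w))   # predicted probability of class 1
    grad_x = (p - y) * w               # d(BCE)/dx for logistic regression
    return x + eps * np.sign(grad_x)   # every coordinate shifted by eps

x = np.array([0.5, -0.2, 0.1])         # a source-domain spectrum (toy)
w = np.array([1.0, -1.0, 0.5])         # frozen classifier weights (toy)
x_adv = fgsm_augment(x, w, y=1.0)
```

DADAnet's progressive strategy iterates this kind of step under learned spatial-spectral masks, so the augmented domain drifts away from the source while the labels stay valid.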
Citations: 0
A Few-Shot Class Incremental Learning Method Using Graph Neural Networks.
IF 10.6 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1109/tip.2026.3657170
Yuqian Ma,Youfa Liu,Bo Du
Few-shot class incremental learning (FSCIL) aims to continuously learn new classes from limited training samples while retaining previously acquired knowledge. Existing approaches are not fully capable of balancing stability and plasticity in dynamic scenarios. To overcome this limitation, we introduce a novel FSCIL framework that leverages graph neural networks (GNNs) to model interdependencies between different categories and enhance cross-modal alignment. Our framework incorporates three key components: (1) a Graph Isomorphism Network (GIN) to propagate contextual relationships among prompts; (2) a Hamiltonian Graph Network with Energy Conservation (HGN-EC) to stabilize training dynamics via energy conservation constraints; and (3) an Adversarially Constrained Graph Autoencoder (ACGA) to enforce latent space consistency. By integrating these components with a parameter-efficient CLIP backbone, our method dynamically adapts graph structures to model semantic correlations between textual and visual modalities. Additionally, contrastive learning with energy-based regularization is employed to mitigate catastrophic forgetting and improve generalization. Comprehensive experiments on benchmark datasets validate the framework's incremental accuracy and stability compared to state-of-the-art baselines. This work advances FSCIL by unifying graph-based relational reasoning with physics-inspired optimization, offering a scalable and interpretable framework.
{"title":"A Few-Shot Class Incremental Learning Method Using Graph Neural Networks.","authors":"Yuqian Ma,Youfa Liu,Bo Du","doi":"10.1109/tip.2026.3657170","DOIUrl":"https://doi.org/10.1109/tip.2026.3657170","url":null,"abstract":"Few-shot class incremental learning (FSCIL) aims to continuously learn new classes from limited training samples while retaining previously acquired knowledge. Existing approaches are not fully capable of balancing stability and plasticity in dynamic scenarios. To overcome this limitation, we introduce a novel FSCIL framework that leverages graph neural networks (GNNs) to model interdependencies between different categories and enhance cross-modal alignment. Our framework incorporates three key components: (1) a Graph Isomorphism Network (GIN) to propagate contextual relationships among prompts; (2) a Hamiltonian Graph Network with Energy Conservation (HGN-EC) to stabilize training dynamics via energy conservation constraints; and (3) an Adversarially Constrained Graph Autoencoder (ACGA) to enforce latent space consistency. By integrating these components with a parameter-efficient CLIP backbone, our method dynamically adapts graph structures to model semantic correlations between textual and visual modalities. Additionally, contrastive learning with energy-based regularization is employed to mitigate catastrophic forgetting and improve generalization. Comprehensive experiments on benchmark datasets validate the framework's incremental accuracy and stability compared to state-of-the-art baselines. 
This work advances FSCIL by unifying graph-based relational reasoning with physics-inspired optimization, offering a scalable and interpretable framework.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"52 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146069925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
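The GIN named in the abstract above follows the standard Graph Isomorphism Network update of Xu et al. (2019): sum-aggregate neighbor features, add (1 + ε) times the node's own feature, and pass the result through a shared MLP. A minimal NumPy sketch of one such layer (not the paper's code; `W1` and `W2` are hypothetical MLP weights):

```python
import numpy as np

def gin_layer(H, A, W1, W2, eps=0.0):
    """One GIN update: h_v <- MLP((1 + eps) * h_v + sum_{u in N(v)} h_u).
    H: (n, d) node features; A: (n, n) adjacency without self-loops;
    W1, W2: weights of a two-layer MLP with ReLU, shared across nodes."""
    agg = (1.0 + eps) * H + A @ H          # injective sum aggregation
    return np.maximum(agg @ W1, 0.0) @ W2  # row-wise MLP
```

The sum aggregator makes the layer permutation-equivariant: relabeling the nodes simply permutes the output rows in the same way, which is what lets it model category interdependencies independently of node ordering.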
BP-NeRF: End-to-End Neural Radiance Fields for Sparse Images without Camera Pose in Complex Scenes. BP-NeRF:复杂场景中无相机姿态的稀疏图像的端到端神经辐射场。
IF 10.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-28 DOI: 10.1109/tip.2026.3657188
Yaru Qiu,Guoxia Wu,Yuanyuan Sun
Synthesizing high-quality novel views of complex scenes from sparse image sequences, especially sequences without camera poses, is a challenging task. The key to enhancing accuracy in such scenarios lies in sufficient prior knowledge and accurate camera-motion constraints. Therefore, we propose an end-to-end novel view synthesis network named BP-NeRF. It uses sequences of sparse images captured in complex indoor and outdoor scenes to estimate camera motion trajectories and generate novel view images. First, to address inaccurate depth-map prediction caused by insufficient overlapping features in sparse images, we designed the RDP-Net module, which generates depth maps for sparse image sequences and calculates their depth accuracy, providing the network with a reliable depth prior. Second, to enhance the accuracy of camera pose estimation, we construct a loss function based on the geometric consistency of 2D and 3D feature variations between frames, improving the accuracy and robustness of the network's estimations.
{"title":"BP-NeRF: End-to-End Neural Radiance Fields for Sparse Images without Camera Pose in Complex Scenes.","authors":"Yaru Qiu,Guoxia Wu,Yuanyuan Sun","doi":"10.1109/tip.2026.3657188","DOIUrl":"https://doi.org/10.1109/tip.2026.3657188","url":null,"abstract":"Synthesizing novel perspectives of complex scenes in high quality using sparse image sequences, especially for those without camera poses, is a challenging task. The key to enhancing accuracy in such scenarios lies in sufficient prior knowledge and accurate camera motion constraints. Therefore, we propose an end-to-end novel view synthesis network named BP-NeRF. It is capable of using sequences of sparse images captured in indoor and outdoor complex scenes to estimate camera motion trajectories and generate novel view images. Firstly, to address the issue of inaccurate prediction of depth map caused by insufficient overlapping features in sparse images, we designed the RDP-Net module to generate depth maps for sparse image sequences and calculate the depth accuracy of these maps, providing the network with a reliable depth prior. Secondly, to enhance the accuracy of camera pose estimation, we construct a loss function based on the geometric consistency of 2D and 3D feature variations between frames, improving the accuracy and robustness of the network's estimations. 
We conducted experimental evaluations on the LLFF and Tanks datasets, and the results show that, compared to the current mainstream methods, BP-NeRF can generate more accurate novel views without camera poses.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"31 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146069922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
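A 2D-3D geometric-consistency loss of the kind described above is ultimately built on reprojection: lift a pixel into 3D through its depth, apply the relative camera pose, and project into the other frame; the pose is then penalized by the distance between the reprojected point and its matched feature. A minimal pinhole-model sketch of that building block (`reproject` and its arguments are illustrative, not BP-NeRF's API):

```python
import numpy as np

def reproject(uv, depth, K, R, t):
    """Back-project pixel uv (frame A) through its depth, transform by
    the relative pose (R, t), and project into frame B's image plane.
    K: (3, 3) pinhole intrinsics; R: (3, 3) rotation; t: (3,) translation."""
    x = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0]) * depth  # 3D point in A
    x_b = R @ x + t                                               # rigid transform
    p = K @ x_b                                                   # project into B
    return p[:2] / p[2]                                           # perspective divide
```

With an identity pose the pixel maps back onto itself, and a pure x-translation shifts it horizontally by f·tx/Z, which gives a quick sanity check on the sign conventions before using the residual as a pose loss.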