
Latest Publications: IEEE Transactions on Image Processing

Underdetermined Blind Source Separation via Weighted Simplex Shrinkage Regularization and Quantum Deep Image Prior.
IF 10.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-03-19 · DOI: 10.1109/tip.2026.3673957
Chia-Hsiang Lin, Si-Sheng Young
As most optical satellites remotely acquire multispectral images (MSIs) with limited spatial resolution, multispectral unmixing (MU) has become a critical signal processing technology for recovering the pure material spectra needed for high-precision classification and identification. Unlike the widely investigated hyperspectral unmixing (HU) problem, MU is much more challenging because it corresponds to an underdetermined blind source separation (BSS) problem, where the number of sources exceeds the number of available multispectral bands. In this article, we transform MU into its overdetermined counterpart (i.e., HU) by inventing a radically new quantum deep image prior (QDIP), which relies on a virtual band-splitting task conducted on the observed MSI to generate a virtual hyperspectral image (HSI). We then perform HU on the virtual HSI to obtain the virtual hyperspectral sources. Though HU is overdetermined, it remains ill-posed; we therefore exploit the convex geometry of the HSI pixels to design a weighted simplex shrinkage (WSS) regularizer that mitigates the ill-posedness. Finally, the virtual hyperspectral sources are spectrally downsampled to obtain the desired multispectral sources. The proposed geometry/quantum-empowered MU (GQ-μ) algorithm also effectively recovers the spatial abundance distribution map for each source, with the geometric WSS regularization adaptively and automatically controlled by the sparsity pattern of the abundance tensor. Simulation and real-world data experiments demonstrate the practicality of our unsupervised GQ-μ algorithm for the challenging MU task. An ablation study demonstrates the strength of QDIP, which classical DIP does not achieve, and validates the mechanics-inspired WSS geometry regularizer. The associated code will be available at https://github.com/IHCLab/GQ-mu.
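To make the convex-geometry idea above concrete, here is a minimal numpy sketch of shrinking abundance vectors toward the probability simplex on which they live. The per-pixel weighting rule is a placeholder assumption for illustration only, not the paper's actual WSS design; see the GQ-mu repository for the real implementation.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (Duchi et al.).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def weighted_simplex_shrinkage(A, weights, step=0.5):
    # A: (num_pixels, num_sources) abundance matrix. Each row moves a
    # fraction `step * w` of the way toward its simplex projection.
    out = np.empty_like(A)
    for i, (a, w) in enumerate(zip(A, weights)):
        p = project_simplex(a)
        out[i] = a + step * w * (p - a)
    return out

A = np.random.rand(4, 5)  # toy abundances for 4 pixels, 5 sources
# Hypothetical weighting: sparser rows shrink harder (assumption, not WSS).
w = 1.0 / (1.0 + np.count_nonzero(A > 0.1, axis=1))
A_reg = weighted_simplex_shrinkage(A, w)
# Sanity check: step=1 with unit weights lands exactly on the simplex.
print(np.allclose(weighted_simplex_shrinkage(A, np.ones(4), step=1.0).sum(axis=1), 1.0))
```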
Citations: 0
DCL: Dynamic Causal Learning for Cross-modality Cardiac Image Segmentation.
IF 10.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-03-19 · DOI: 10.1109/tip.2026.3673293
Saidi Guo, Xinlong Liu, Qixin Lin, Weijie Cai, Guohua Zhao, Mingyi Wu, Qiujie Lv, Laurence T Yang
Accurate cross-modality cardiac image segmentation is essential for effectively diagnosing and treating heart disease. Different imaging modalities help to determine suitable pre-procedure planning. However, most methods face the difficulty of spatial-temporal confounding, where the anatomy element and modality element of cardiac images are intertwined across both spatial and temporal dimensions. This confounding arises from the imaging and structural diversity of cardiac images, and it hinders knowledge transfer between cardiac images of different modalities. In this paper, we propose a novel dynamic causal learning (DCL) method to resolve spatial-temporal confounding. DCL explores multi-dimensional causal intervention, considering not only the causal relationship between images and labels, but also causality along the temporal and spatial dimensions. It integrates historical optimal interventions and facilitates the transfer of this knowledge across temporal contexts. In addition, DCL utilizes a diffusion mechanism to further ensure that the extracted anatomy element remains causally invariant, improving model performance across multiple imaging modalities. Extensive experiments on cross-modality cardiac images (MR, CT, and US) demonstrate the effectiveness of DCL (mean Dice = 0.951), outperforming other advanced segmentation methods. DCL is freely accessible at https://github.com/asdww0721ww/DCL.
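For reference, the mean Dice figure quoted above (0.951) is conventionally computed as the per-class Dice coefficient averaged over foreground classes. A minimal sketch, using toy masks rather than the paper's data:

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    # Dice coefficient between two binary masks.
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def mean_dice(pred_labels, gt_labels, classes):
    # Per-class Dice averaged over the listed foreground classes, the
    # usual way a segmentation "mean Dice" is reported.
    return np.mean([dice(pred_labels == c, gt_labels == c) for c in classes])

pred = np.random.randint(0, 4, size=(64, 64))  # toy 4-class segmentation
gt = np.random.randint(0, 4, size=(64, 64))
print(mean_dice(pred, gt, classes=[1, 2, 3]))
```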
Citations: 0
Attention Redundancy Reduction for Image Super-Resolution
IF 10.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-03-17 · DOI: 10.1109/tip.2026.3671624
Yican Liu, Jiacheng Li, Yuhao Jiang, Delu Zeng, Zhou Wang
{"title":"Attention Redundancy Reduction for Image Super-Resolution","authors":"Yican Liu, Jiacheng Li, Yuhao Jiang, Delu Zeng, Zhou Wang","doi":"10.1109/tip.2026.3671624","DOIUrl":"https://doi.org/10.1109/tip.2026.3671624","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"189 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147470941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Heterogeneous Federated Dynamic Graph HyperNetwork for Image Classification.
IF 10.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-03-16 · DOI: 10.1109/tip.2026.3672375
Liu Yang, Kegen Chen, Qilong Wang, Zhengyi Xu, Shiqiao Gu, Qinghua Hu
Federated learning (FL) enables privacy-preserving collaboration among distributed clients, but practical deployments often face heterogeneous models and non-IID data, leading to degraded communication and personalization. In addition, real-world FL systems frequently encounter newly joined clients that require rapid adaptation and abnormal clients that may upload corrupted updates, further exacerbating instability and hindering global convergence. To address these challenges in image classification, we propose HFedDGHN, a Heterogeneous Federated Dynamic Graph HyperNetwork that jointly models inter-client relations and personalized parameter generation. Specifically, a graph structure learner adaptively captures client correlations to construct a dynamic collaboration graph, while a graph-convolutional hypernetwork generates model parameters for heterogeneous architectures, enabling implicit knowledge transfer without sharing local data or weights. Moreover, the framework naturally supports meta-learning-based generalization, allowing efficient adaptation to newly joined clients. Furthermore, the dynamic graph enhances robustness by isolating abnormal clients, as they tend to be excluded from most neighborhoods during adaptive graph construction. Extensive experiments across multiple benchmarks demonstrate that HFedDGHN achieves superior accuracy compared to state-of-the-art personalized and heterogeneous FL methods, while naturally improving robustness and scalability in real-world deployments.
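A minimal numpy sketch of the general pattern described above: a hypernetwork emits per-client parameters after one graph-convolution step over a learned client graph. All dimensions, the similarity-based adjacency, and the single linear head are illustrative assumptions, not HFedDGHN's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, d, n_params = 5, 16, 8 * 4  # 5 clients; one 8x4 weight matrix each

E = rng.normal(size=(n_clients, d))           # learnable client embeddings
W_gcn = rng.normal(size=(d, d)) * 0.1         # graph-convolution weight
W_out = rng.normal(size=(d, n_params)) * 0.1  # hypernetwork head

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Dynamic collaboration graph: row-normalized similarity of embeddings
# stands in for the learned graph structure.
A = softmax(E @ E.T / np.sqrt(d))

# One graph-convolution step mixes information across correlated clients;
# a linear head then emits each client's personalized parameters.
H = np.maximum(A @ E @ W_gcn, 0.0)
thetas = (H @ W_out).reshape(n_clients, 8, 4)
print(thetas.shape)  # (5, 8, 4): one generated weight matrix per client
```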
Citations: 0
Identity-Compensated Style Distillation for Visible-Infrared Person Re-Identification.
IF 10.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-03-16 · DOI: 10.1109/tip.2026.3672369
Yongguo Ling, Zihao Hu, Nan Pu, Zhun Zhong, Xudong Jiang
Visible-Infrared Person Re-Identification (VI-ReID) that matches pedestrian images across visible and infrared modalities suffers from substantial modality discrepancies and intra-class variations. While existing methods typically address the modality gap via style alignment, they often lose identity-relevant semantics and overlook fine-grained inter-class nuances, such as body part contours and structural cues around the head, shoulders, or feet. To tackle these challenges, we propose an Identity-Compensated Style Distillation (ICSD) network that enforces cross-modality style consistency and enhances the discriminative power of modality-invariant features. Specifically, ICSD comprises two core components: (1) a Style Knowledge Distillation (SKD) module, which integrates Style Discrepancy Reduction (SDR) and Identity Knowledge Compensation (IKC) to align modality styles while preserving identity-relevant semantics; (2) an Identity Discrimination Amplification (IDA) module, which captures and enhances subtle inter-class differences by refining identity-specific cues, thereby facilitating more accurate discrimination between different pedestrians. Extensive experiments on three public benchmarks (SYSU-MM01, RegDB, and LLCM) demonstrate that ICSD consistently outperforms state-of-the-art methods, validating the effectiveness and complementarity of its components.
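The generic form of a knowledge-distillation objective, which a style-distillation module like SKD builds on, is a temperature-scaled KL divergence between teacher and student predictions. A minimal numpy sketch of that generic loss; ICSD's exact SDR/IKC formulations are not reproduced here.

```python
import numpy as np

def softmax(x, t=1.0):
    # Temperature-scaled softmax over a 1-D logit vector.
    z = x / t
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, t=4.0):
    # KL(teacher || student) at temperature t, scaled by t^2 so gradient
    # magnitudes stay comparable across temperatures (Hinton et al.).
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return (t ** 2) * np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)))

teacher = np.array([2.0, 0.5, -1.0])  # toy logits
student = np.array([1.5, 0.7, -0.8])
print(distill_loss(student, teacher))
```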
Citations: 0
MotionPrior: Exploring Efficient Learning of Motion Concepts for Few-shot Video Generation.
IF 10.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-03-16 · DOI: 10.1109/tip.2026.3672374
Yaosi Hu, Chang Wen Chen
Diffusion-based text-to-image generation has achieved remarkable progress and realistic content generation performance, greatly promoting the development of text-to-video generation. Although equipped with powerful image diffusion models, video generation modeling still requires massive labeled data and a high training resource cost. Recently, work has focused on cost-effective video generation in a one-shot or few-shot manner based on the image diffusion model, with minimal demand for video data and computing resources. However, these video generation models only support the generation of one single motion pattern/concept. This raises an important question: can we improve generation freedom with a light training burden? In this paper, we explore a cost-effective video generation scheme for adaptive motion concepts by learning motion priors from a small set of video data. Specifically, we construct a learnable bank for motion concepts and propose the Dual-Semantic-guided Motion Attention module to locate the corresponding motion elements from the bank under the guidance of textual and visual semantics. The extracted motion elements are inserted into video latents via a lightweight motion injection layer, which integrates motion semantics effectively with far fewer parameters than the conventional temporal attention layer. In addition, we introduce a temporal-aware noise prior and an inter-frame consistency constraint to strengthen the learning of temporal dependency and improve video smoothness. Extensive experiments validate that the proposed method can learn motion priors adaptively from a small set of training videos to generate smooth videos that involve either single or multiple motion concepts. The results demonstrate that the proposed scheme achieves superior performance compared to existing few-shot video generation methods and even some large-scale video generation models. More information and results are available at https://youncy-hu.github.io/motionprior/.
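Retrieving motion elements from a learnable bank, as described above, amounts to cross-attention with semantic tokens as queries. A minimal numpy sketch, collapsing the paper's dual textual/visual guidance into a single query set for brevity; shapes and the untied query/key projections are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_slots, n_tokens = 32, 10, 6

bank = rng.normal(size=(n_slots, d))      # learnable motion-concept bank
queries = rng.normal(size=(n_tokens, d))  # stand-in for text/visual semantic tokens

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Scaled dot-product attention: each semantic token retrieves a weighted
# mix of motion elements from the bank.
attn = softmax(queries @ bank.T / np.sqrt(d))
motion_elements = attn @ bank             # (n_tokens, d), to be injected into latents
print(motion_elements.shape)
```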
Citations: 0
UD-Gaussian: Uncertainty-Driven Gaussian Modeling for Occluded Person Re-identification.
IF 10.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-03-16 · DOI: 10.1109/tip.2026.3672380
Yanping Li, Yizhang Liu, Hongyun Zhang, Cairong Zhao, Zhihua Wei, Duoqian Miao
Occluded person re-identification aims to address the identification challenges posed by pedestrians obscured by other individuals or objects. Existing methods often rely on incorporating pose or semantic information to improve model performance under occlusion. However, such information often depends on external models with inevitable cross-domain gaps, whose stability is limited in complex occlusion environments and which are prone to false results. In this paper, we propose a Transformer-based uncertainty-driven Gaussian model, termed UD-Gaussian. First, to enrich the detailed features of pedestrian images, a high-frequency enhancement module is introduced. The high-frequency components of the pedestrian image are extracted by the Discrete Haar Wavelet Transform, and Top-K high-frequency patches are selected to construct a graph Laplacian matrix for high-frequency graph attention, whose output is fused with features learned from self-attention to enhance the high-frequency feature representation. Since the occlusion-induced uncertainty in pedestrian feature learning makes it challenging to obtain reliable and stable pedestrian features, we propose a probability distribution learning module. This module establishes a memory bank to build Gaussian distributions for each pedestrian identity, and entropy is introduced as a loss function to encourage the model to generate more deterministic and relatively independent probability distributions, thereby enhancing the model's discriminative ability across different pedestrian identities. The high-frequency enhancement module provides a solid foundation for the probability distribution learning module, alleviating uncertainty caused by the pedestrian images themselves. Experimental results on occluded and holistic person re-identification datasets demonstrate the superiority of the proposed method.
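The high-frequency enhancement step can be illustrated with a one-level 2D Haar transform followed by ranking patches by high-frequency energy. A minimal numpy sketch; the subsequent graph-Laplacian attention is omitted, and the patch scoring rule is an assumption rather than the paper's exact criterion.

```python
import numpy as np

def haar_dwt2(img):
    # One-level 2D Haar wavelet transform (image sides must be even).
    a = (img[0::2, :] + img[1::2, :]) / 2.0  # rows: low-pass
    d = (img[0::2, :] - img[1::2, :]) / 2.0  # rows: high-pass
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def topk_hf_patches(img, patch=8, k=4):
    # Rank non-overlapping patches by their high-frequency energy.
    _, lh, hl, hh = haar_dwt2(img)
    energy = lh ** 2 + hl ** 2 + hh ** 2  # half-resolution energy map
    ph = patch // 2                       # patch size in subband coordinates
    h, w = energy.shape
    scores = {}
    for i in range(0, h - ph + 1, ph):
        for j in range(0, w - ph + 1, ph):
            scores[(2 * i, 2 * j)] = energy[i:i + ph, j:j + ph].sum()
    return sorted(scores, key=scores.get, reverse=True)[:k]

img = np.random.rand(64, 64)
print(topk_hf_patches(img))  # top-left corners of the K most detailed patches
```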
Citations: 0
An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification
IF 10.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-03-12 · DOI: 10.1109/tip.2026.3671651
Xin Cai, Hailong Zhang, Chenchen Wang, Wentao Liu, Jinwei Gu, Tianfan Xue
{"title":"An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification","authors":"Xin Cai, Hailong Zhang, Chenchen Wang, Wentao Liu, Jinwei Gu, Tianfan Xue","doi":"10.1109/tip.2026.3671651","DOIUrl":"https://doi.org/10.1109/tip.2026.3671651","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"45 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147439784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Downstream Task Inspired Underwater Image Enhancement: A Perception-Aware Study from Dataset Construction to Network Design
IF 10.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-03-12 · DOI: 10.1109/tip.2026.3671595
Bosen Lin, Feng Gao, Yanwei Yu, Junyu Dong, Qian Du
{"title":"Downstream Task Inspired Underwater Image Enhancement: A Perception-Aware Study from Dataset Construction to Network Design","authors":"Bosen Lin, Feng Gao, Yanwei Yu, Junyu Dong, Qian Du","doi":"10.1109/tip.2026.3671595","DOIUrl":"https://doi.org/10.1109/tip.2026.3671595","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"80 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147439805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
IAMAgent: Towards An Interactive and Adaptive Multi-Agent System for Image Restoration.
IF 10.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-03-12 · DOI: 10.1109/tip.2026.3671594
Yanyan Wei, Yilin Zhang, Huan Zheng, Jiahuan Ren, Xiaogang Xu, Zenglin Shi, Zhao Zhang, Meng Wang
Existing image restoration and enhancement (IRE) methods suffer from three fundamental limitations: 1) they present a high technical barrier, requiring expert knowledge and lacking intuitive natural language control; 2) they are inflexible and poorly adaptable, as models are typically designed for single, specific degradations and fail on complex or mixed real-world scenarios; 3) they lack interactivity and ignore subjectivity, operating as "black-box" tools that cannot incorporate human feedback or understand nuanced user intentions. To overcome these challenges, we pioneer a novel paradigm: a Multi-Agent System (MAS) for interactive and adaptive image restoration. We design and implement a prototype system, Interactive and Adaptive Multi-Agent System (IAMAgent), which orchestrates a team of specialized agents to collaboratively solve complex IRE tasks. At its core, a Manager Agent, driven by a Large Language Model, interprets user commands, devises strategies, and allocates sub-tasks. It directs a Perception Agent for degradation diagnosis, a suite of specialized Execution Agents that encapsulate various low-level vision models, and a Critique Agent for automated quality assessment. This collaborative framework enables an innovative, language-driven, and human-in-the-loop optimization process. Our work is the first to introduce the MAS paradigm to the IRE domain, transforming it from a collection of static tools into a dynamic, user-centric, and intelligent system. We demonstrate that IAMAgent not only significantly enhances restoration performance and adaptability but also bridges the critical gap between high-level human intention and low-level vision tasks.
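The orchestration pattern described above can be sketched as a simple control loop. Everything below is a toy stand-in (a rule-based manager, dict "images", a fixed quality threshold) rather than IAMAgent's LLM-driven implementation; the class and method names are hypothetical.

```python
class PerceptionAgent:
    def diagnose(self, image):
        # Stand-in degradation diagnosis; a real agent would run a classifier.
        return "noise" if image.get("noisy") else "blur"

class ExecutionAgent:
    def __init__(self, task):
        self.task = task
    def restore(self, image):
        # Pretend to fix the degradation this agent specializes in.
        return dict(image, **{self.task: False, "restored_by": self.task})

class CritiqueAgent:
    def score(self, image):
        # Stand-in no-reference quality assessment.
        return 0.9 if image.get("restored_by") else 0.2

class ManagerAgent:
    # In IAMAgent the manager is LLM-driven; a fixed rule stands in here.
    def __init__(self):
        self.executors = {"noise": ExecutionAgent("noisy"),
                          "blur": ExecutionAgent("blurred")}
        self.perception, self.critique = PerceptionAgent(), CritiqueAgent()

    def handle(self, user_request, image, max_rounds=3):
        for _ in range(max_rounds):  # human-in-the-loop feedback would go here
            degradation = self.perception.diagnose(image)
            image = self.executors[degradation].restore(image)
            if self.critique.score(image) > 0.8:
                break
        return image

print(ManagerAgent().handle("denoise this photo", {"noisy": True}))
```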
Citations: 0