
Latest Publications: IEEE Transactions on Image Processing (A Publication of the IEEE Signal Processing Society)

E2MPL: An Enduring and Efficient Meta Prompt Learning Framework for Few-Shot Unsupervised Domain Adaptation
IF 13.7 Pub Date: 2025-12-23 DOI: 10.1109/TIP.2025.3645560
Wanqi Yang;Haoran Wang;Wei Wang;Lei Wang;Ge Song;Ming Yang;Yang Gao
Few-shot unsupervised domain adaptation (FS-UDA) leverages a limited amount of labeled data from a source domain to enable accurate classification in an unlabeled target domain. Despite recent advancements, current FS-UDA approaches still face a major challenge: models often demonstrate instability when adapted to new FS-UDA tasks and require considerable training time. To address these challenges, we put forward a novel framework called Enduring and Efficient Meta-Prompt Learning (E2MPL) for FS-UDA. Within this framework, we utilize the pre-trained CLIP model as the backbone for feature learning. First, we design domain-shared prompts, consisting of virtual tokens, which primarily capture meta-knowledge from a wide range of meta-tasks to mitigate domain gaps. Second, we develop a task prompt learning network that adaptively learns task-specific prompts with the goal of achieving fast and stable task generalization. Third, we formulate the meta-prompt learning process as a bilevel optimization problem, consisting of an (outer) meta-prompt learner and an (inner) task-specific classifier and domain adapter. In addition, the inner objective of each meta-task has a closed-form solution, which enables efficient prompt learning and adaptation to new tasks in a single step. Extensive experimental studies demonstrate the promising performance of our framework on the domain adaptation benchmark dataset DomainNet. Compared with state-of-the-art methods, our approach improves the average accuracy by at least 15 percentage points and reduces the average time by 64.67% in the 5-way 1-shot task; in the 5-way 5-shot task, it achieves at least a 9-percentage-point improvement in average accuracy and reduces the average time by 63.18%. Moreover, our method exhibits more enduring and stable performance than the other methods, i.e., reducing the average IQR value by over 40.80% and 25.35% in the 5-way 1-shot and 5-shot tasks, respectively.
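To make the single-step adaptation concrete, the following is a minimal sketch of a closed-form inner solve, assuming a ridge-regression-style task-specific classifier; the paper's actual inner objective (classifier plus domain adapter) is not reproduced here, and all names, shapes, and the regularization weight are illustrative.

```python
# A minimal sketch (not the paper's code): a ridge-regression-style classifier
# solved in closed form from support features, illustrating how an inner
# objective with a closed-form solution enables single-step adaptation.
import torch

def ridge_classifier(features: torch.Tensor, labels: torch.Tensor,
                     num_classes: int, lam: float = 1.0) -> torch.Tensor:
    """Solve W = argmin ||XW - Y||^2 + lam ||W||^2 via the normal equations."""
    X = features                                                  # (n, d) support features
    Y = torch.nn.functional.one_hot(labels, num_classes).float()  # (n, c) targets
    A = X.T @ X + lam * torch.eye(X.shape[1])                     # (d, d)
    return torch.linalg.solve(A, X.T @ Y)                         # (d, c) classifier

# Usage: one-step classification of target-domain queries, no iterative fine-tuning.
support_x = torch.randn(25, 512)                                  # 5-way 5-shot features
support_y = torch.arange(5).repeat_interleave(5)                  # class labels 0..4
W = ridge_classifier(support_x, support_y, num_classes=5)
query_logits = torch.randn(75, 512) @ W                           # (75, 5) predictions
```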
Citations: 0
TSFormer: Efficient Ultra-High-Definition Image Restoration via Trusted Min-p
IF 13.7 Pub Date: 2025-12-23 DOI: 10.1109/TIP.2025.3645583
Zhuoran Zheng;Pu Wang;Liubing Hu;Xin Su
Ultra-high-definition (UHD) image restoration is vital for applications demanding exceptional visual fidelity, yet existing methods often face a trade-off between restoration quality and efficiency, limiting their practical deployment. In this paper, we propose TSFormer, an all-in-one framework that integrates Trusted learning with Sparsification to boost both generalization capability and computational efficiency in UHD image restoration. The key to sparsification is that only a small amount of token movement is allowed within the model. To efficiently filter tokens, we use Min-$p$ with random matrix theory to quantify the uncertainty of tokens (lower trustworthiness), thereby improving the robustness of the model. Our model can process a 4K (3840×2160) image in real time (40 fps) with 3.38M parameters. Extensive experiments demonstrate that TSFormer achieves state-of-the-art restoration quality while enhancing generalization and reducing computational demands. In addition, our token filtering method can be applied to other image restoration models to effectively accelerate inference and maintain performance.
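As a rough illustration of the Min-$p$ idea, the sketch below keeps only tokens whose trust score reaches a fraction $p$ of the best-scoring token; the random-matrix-theory uncertainty estimate from the paper is replaced by placeholder scores, and all shapes and the value of $p$ are assumptions.

```python
# A minimal sketch, assuming per-token "trust" scores are available: Min-p style
# filtering keeps only tokens whose trust reaches a fraction p of the highest
# trust score. Scores here are random placeholders.
import torch

def min_p_token_filter(tokens: torch.Tensor, trust: torch.Tensor, p: float = 0.3):
    """tokens: (n, d) token features; trust: (n,) trust scores in [0, 1]."""
    keep = trust >= p * trust.max()          # dynamic, scale-free threshold
    return tokens[keep], keep

tokens = torch.randn(4096, 64)               # tokens from a UHD feature map
trust = torch.rand(4096)                     # placeholder trustworthiness scores
kept, mask = min_p_token_filter(tokens, trust)
print(f"kept {kept.shape[0]} of {tokens.shape[0]} tokens")
```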
Citations: 0
Unlocking Cross-Domain Synergies for Domain Adaptive Semantic Segmentation
IF 13.7 Pub Date: 2025-12-23 DOI: 10.1109/TIP.2025.3645599
Qin Xu;Qihang Wu;Bo Jiang;Jiahui Wang;Yuan Chen;Jinhui Tang
Unsupervised domain adaptation semantic segmentation (UDASS) aims to perform dense prediction on an unlabeled target domain by training the model on a labeled source domain. In this field, self-training approaches have demonstrated strong competitiveness and advantages. However, existing methods often rely on additional training data (such as reference datasets or depth maps) to rectify unreliable pseudo-labels, ignoring the cross-domain interaction between the target and source domains. To address this issue, in this paper, we propose a novel method for unsupervised domain adaptation semantic segmentation, termed Unlocking Cross-Domain Synergies (UCDS). Specifically, in the UCDS network, we design a new Dynamic Self-Correction (DSC) module that effectively transfers source domain knowledge and generates high-confidence pseudo-labels without additional training resources. Unlike existing methods, DSC introduces a Dynamic Noisy Label Detection method for the target domain. To correct the noisy pseudo-labels, we design a Dual Bank mechanism that explores the reliable and unreliable predictions of the source domain and conducts cross-domain synergy through Weighted Reassignment Self-Correction and Negative Correction Prevention strategies. To enhance the discriminative ability of features and amplify the dissimilarity of different categories, we propose Discrepancy-based Contrastive Learning (DCL). DCL selects positive and negative samples in the source and target domains based on the semantic discrepancies among different categories, effectively avoiding the numerous false negative samples found in existing methods. Extensive experimental results on three commonly used datasets demonstrate the superiority of the proposed UCDS in comparison with state-of-the-art methods. The project and code are available at https://github.com/wqh011128/UCDS.
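A minimal sketch of what dual-bank-style pseudo-label correction could look like, assuming per-class prototypes stored in a reliable bank and a hypothetical confidence threshold for detecting noisy pixels; the actual DSC module and its Weighted Reassignment strategy are more elaborate, so this is illustrative only.

```python
# Low-confidence ("noisy") pixels are reassigned to the class of the most similar
# prototype in a reliable bank. Shapes, the threshold tau, and the class count are
# assumptions for illustration.
import torch
import torch.nn.functional as F

def correct_pseudo_labels(feat, pseudo, conf, reliable_bank, tau=0.9):
    """feat: (n, d) pixel features; pseudo: (n,) labels; conf: (n,) confidences;
    reliable_bank: (c, d) per-class prototypes built from confident predictions."""
    noisy = conf < tau                                        # noisy-label detection
    sims = F.cosine_similarity(feat[noisy].unsqueeze(1),      # (m, c) similarities
                               reliable_bank.unsqueeze(0), dim=-1)
    corrected = pseudo.clone()
    corrected[noisy] = sims.argmax(dim=-1)                    # reassign to closest class
    return corrected

feats = torch.randn(1000, 128)                                # flattened pixel features
bank = torch.randn(19, 128)                                   # 19 semantic classes
new_labels = correct_pseudo_labels(feats, torch.randint(0, 19, (1000,)),
                                   torch.rand(1000), bank)
```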
Citations: 0
Bi-Grid Reconstruction for Image Anomaly Detection
IF 13.7 Pub Date: 2025-12-22 DOI: 10.1109/TIP.2025.3644787
Aimin Feng;Huichuan Huang;Guangyu Wei;Wenlong Sun
In the domain of image anomaly detection, significant progress has been made in unsupervised and self-supervised methods with datasets containing only normal samples. Although these methods perform well in general industrial anomaly detection scenarios, they often struggle with over- or under-detection when faced with fine-grained anomalies in products. In this paper, we propose GRAD: Bi-Grid Reconstruction for Image Anomaly Detection, which utilizes two continuous grids to detect anomalies from both normal and abnormal perspectives. In this work: 1) Grids serve as feature repositories to assist in the reconstruction task, achieving stronger generalization compared to discrete storage, while also helping to avoid the Identical Shortcut (IS) problem common in general reconstruction methods. 2) An additional grid storing abnormal features is introduced alongside the normal grid storing normal features, which refines the boundaries of normal features, thereby enhancing GRAD’s detection performance for fine-grained defects. 3) The Feature Block Pasting (FBP) module is designed to synthesize a variety of anomalies at the feature level, enabling the rapid deployment of the abnormal grid. Additionally, benefiting from the powerful representation capabilities of grids, GRAD is suitable for a unified task setting, requiring only a single model to be trained for multiple classes. GRAD has been comprehensively tested on classic industrial datasets including MVTecAD, VisA, and the newest GoodsAD dataset, showing significant improvement over current state-of-the-art methods.
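The Feature Block Pasting idea can be sketched as below: an anomalous training signal is synthesised by pasting a random block from a donor feature map into a normal one. Block size, donor choice, and shapes are assumptions for illustration, not the module's exact design.

```python
# A hypothetical sketch of Feature Block Pasting: a pseudo-abnormal feature map is
# synthesised at the feature level by copying a random block from a donor map.
import torch

def feature_block_paste(normal: torch.Tensor, donor: torch.Tensor,
                        block: int = 8) -> torch.Tensor:
    """normal, donor: (c, h, w) feature maps; returns a pseudo-abnormal map."""
    _, h, w = normal.shape
    y = torch.randint(0, h - block + 1, (1,)).item()          # random paste location
    x = torch.randint(0, w - block + 1, (1,)).item()
    out = normal.clone()
    out[:, y:y + block, x:x + block] = donor[:, y:y + block, x:x + block]
    return out

pseudo_abnormal = feature_block_paste(torch.randn(256, 32, 32), torch.randn(256, 32, 32))
```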
Citations: 0
Boosting Faithful Multi-Modal LLMs via Complementary Visual Grounding
IF 13.7 Pub Date: 2025-12-22 DOI: 10.1109/TIP.2025.3644140
Zheren Fu;Zhendong Mao;Lei Zhang;Yongdong Zhang
Multimodal Large Language Models (MLLMs) exhibit impressive performance across vision-language tasks, but still face the hallucination challenges, where generated texts are factually inconsistent with visual input. Existing mitigation methods focus on surface symptoms of hallucination and heavily rely on post-hoc corrections, extensive data curation, or costly inference schemes. In this work, we identify two key factors of MLLM hallucination: Insufficient Visual Context, where ambiguous visual contexts lead to language speculation, and Progressive Textual Drift, where model attention strays from visual inputs in longer responses. To address these problems, we propose a novel Complementary Visual Grounding (CVG) framework. CVG exploits the intrinsic architecture of MLLMs, without requiring any external tools, models, or additional data. CVG first disentangles visual context into two complementary branches based on query relevance, then maintains steadfast visual grounding during the auto-regressive generation. Finally, it contrasts the output distributions of two branches to produce a faithful response. Extensive experiments on various hallucination and general benchmarks demonstrate that CVG achieves state-of-the-art performances across MLLM architectures and scales.
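A minimal sketch of how contrasting the two branches' output distributions might work at decoding time, assuming next-token logits from a query-relevant branch and a complementary branch are available; the combination rule and the value of alpha are illustrative assumptions, not the paper's exact formulation.

```python
# Contrastive combination of two branches' next-token distributions.
import torch

def contrastive_next_token(logits_relevant: torch.Tensor,
                           logits_complement: torch.Tensor,
                           alpha: float = 1.0) -> torch.Tensor:
    """Both inputs: (vocab_size,) next-token logits from the two branches."""
    log_p_rel = logits_relevant.log_softmax(dim=-1)
    log_p_com = logits_complement.log_softmax(dim=-1)
    # Favour tokens grounded in the query-relevant visual context and penalise
    # tokens the complementary branch would emit regardless of that grounding.
    return ((1 + alpha) * log_p_rel - alpha * log_p_com).argmax(dim=-1)

next_token = contrastive_next_token(torch.randn(32000), torch.randn(32000))
```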
Citations: 0
Bayesian Multifractal Image Segmentation
IF 13.7 Pub Date: 2025-12-22 DOI: 10.1109/TIP.2025.3644793
Kareth M. León-López;Abderrahim Halimi;Jean-Yves Tourneret;Herwig Wendt
Multifractal analysis (MFA) provides a framework for the global characterization of image textures by describing the spatial fluctuations of their local regularity based on the multifractal spectrum. Several works have shown the interest of using MFA for the description of homogeneous textures in images. Nevertheless, natural images can be composed of several textures and, in turn, of multifractal properties associated with those textures. This paper introduces an unsupervised Bayesian multifractal segmentation method to model and segment multifractal textures by jointly estimating the multifractal parameters and labels of images at the pixel level. For this, a computationally and statistically efficient multifractal parameter estimation model for wavelet leaders is first developed, defining different multifractality parameters for different regions of an image. Then, a multiscale Potts Markov random field is introduced as a prior to model the inherent spatial and scale correlations (referred to as cross-scale correlations) between the labels of the wavelet leaders. A Gibbs sampling methodology is finally used to draw samples from the posterior distribution of the unknown model parameters. Numerical experiments are conducted on synthetic multifractal images to evaluate the performance of the proposed segmentation approach. The proposed method achieves superior performance compared to traditional unsupervised segmentation techniques as well as modern deep learning-based approaches, showing its effectiveness for multifractal image segmentation.
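For intuition, here is a minimal single-scale Gibbs sweep over a Potts label field, assuming per-pixel, per-class log-likelihoods are already available; the paper's multiscale (cross-scale) coupling and multifractal likelihood are omitted, so this is only a sketch of the sampling step.

```python
# One Gibbs sweep: each label is resampled from its conditional, combining a
# per-class log-likelihood with a Potts prior over the 4-neighbourhood.
import numpy as np

def gibbs_sweep(labels, loglik, beta=1.0, rng=None):
    """labels: (H, W) integer field; loglik: (H, W, K) log-likelihoods."""
    rng = rng or np.random.default_rng()
    H, W, K = loglik.shape
    for i in range(H):
        for j in range(W):
            logp = loglik[i, j].copy()
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < H and 0 <= nj < W:                 # Potts neighbour bonus
                    logp += beta * (np.arange(K) == labels[ni, nj])
            p = np.exp(logp - logp.max())
            labels[i, j] = rng.choice(K, p=p / p.sum())
    return labels

labels = gibbs_sweep(np.zeros((64, 64), dtype=int), np.random.randn(64, 64, 3))
```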
Citations: 0
SGNet: Style-Guided Network With Temporal Compensation for Unpaired Low-Light Colonoscopy Video Enhancement
IF 13.7 Pub Date: 2025-12-22 DOI: 10.1109/TIP.2025.3644172
Guanghui Yue;Lixin Zhang;Wanqing Liu;Jingfeng Du;Tianwei Zhou;Hanhe Lin;Qiuping Jiang;Wenqi Ren
A low-light colonoscopy video enhancement method is needed as poor illumination in colonoscopy can hinder accurate disease diagnosis and adversely affect surgical procedures. Existing low-light video enhancement methods usually apply a frame-by-frame enhancement strategy without considering the temporal correlation between them, which often causes a flickering problem. In addition, most methods are designed for endoscopic devices with fixed imaging styles and cannot be easily adapted to different devices. In this paper, we propose a Style-Guided Network (SGNet) for unpaired Low-Light Colonoscopy Video Enhancement (LLCVE). Given that collecting content-consistent paired videos is difficult, SGNet adopts a CycleGAN-based framework to convert low-light videos to normal-light videos, in which a Temporal Compensation (TC) module and a Style Guidance (SG) module are proposed to alleviate the flickering problem and achieve flexible style transfer, respectively. The TC module compensates for a low-light frame by learning the correlated feature of its adjacent frames, thereby improving the temporal smoothness of the enhanced video. The SG module encodes the text of the imaging style and adaptively explores its intrinsic relationships with video features to obtain style representations, which are then used to guide the subsequent enhancement process. Extensive experiments on a curated database show that SGNet achieves promising performance on the LLCVE task, outperforming state-of-the-art methods in both quantitative metrics and visual quality.
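A rough sketch of the temporal-compensation idea, assuming feature maps of the previous, current, and next frames: neighbouring features are fused into the current frame with a simple cosine-similarity gate. The real TC module learns this fusion, so the gate and shapes here are assumptions.

```python
# Similarity-gated fusion of adjacent-frame features into the current frame.
import torch
import torch.nn.functional as F

def temporal_compensate(curr, prev, nxt):
    """curr, prev, nxt: (c, h, w) feature maps of adjacent frames."""
    acc, weight = curr.clone(), torch.ones(curr.shape[1:])
    for neighbour in (prev, nxt):
        gate = F.cosine_similarity(curr, neighbour, dim=0).clamp(min=0)  # (h, w)
        acc = acc + gate.unsqueeze(0) * neighbour      # add correlated content only
        weight = weight + gate
    return acc / weight.unsqueeze(0)                   # normalised weighted average

prev_f, curr_f, next_f = (torch.randn(64, 128, 128) for _ in range(3))
compensated = temporal_compensate(curr_f, prev_f, next_f)
```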
Citations: 0
A3-TTA: Adaptive Anchor Alignment Test-Time Adaptation for Image Segmentation
IF 13.7 Pub Date: 2025-12-22 DOI: 10.1109/TIP.2025.3644789
Jianghao Wu;Xiangde Luo;Yubo Zhou;Lianming Wu;Guotai Wang;Shaoting Zhang
Test-Time Adaptation (TTA) offers a practical solution for deploying image segmentation models under domain shift without accessing source data or retraining. Among existing TTA strategies, pseudo-label-based methods have shown promising performance. However, they often rely on perturbation-ensemble heuristics (e.g., dropout sampling, test-time augmentation, Gaussian noise), which lack distributional grounding and yield unstable training signals. This can trigger error accumulation and catastrophic forgetting during adaptation. To address this, we propose A3-TTA, a TTA framework that constructs reliable pseudo-labels through anchor-guided supervision. Specifically, we identify well-predicted target domain images using a class compact density metric, under the assumption that confident predictions imply distributional proximity to the source domain. These anchors serve as stable references to guide pseudo-label generation, which is further regularized via semantic consistency and boundary-aware entropy minimization. Additionally, we introduce a self-adaptive exponential moving average strategy to mitigate label noise and stabilize model update during adaptation. Evaluated on both multi-domain medical images (heart structure and prostate segmentation) and natural images, A3-TTA significantly improves average Dice scores by 10.40 to 17.68 percentage points compared to the source model, outperforming several state-of-the-art TTA methods under different segmentation model architectures. A3-TTA also excels in continual TTA, maintaining high performance across sequential target domains with strong anti-forgetting ability. The code will be made publicly available at https://github.com/HiLab-git/A3-TTA
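A minimal sketch of a self-adaptive exponential-moving-average update, assuming a scalar confidence signal (e.g., mean pseudo-label confidence) drives the momentum: the noisier the current batch looks, the closer the momentum moves to 1 and the less the running model changes. The names and the exact adaptation rule are assumptions, not the paper's formula.

```python
# Confidence-driven EMA momentum for stabilising test-time updates.
import torch

@torch.no_grad()
def self_adaptive_ema(ema_model: torch.nn.Module, model: torch.nn.Module,
                      confidence: float, base_momentum: float = 0.99) -> float:
    # confidence in [0, 1]: 1 -> use base momentum, 0 -> freeze the EMA copy.
    momentum = base_momentum + (1.0 - base_momentum) * (1.0 - confidence)
    for e_param, param in zip(ema_model.parameters(), model.parameters()):
        e_param.mul_(momentum).add_(param, alpha=1.0 - momentum)
    return momentum
```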
Citations: 0
PoseMoE: Mixture-of-Experts Network for Monocular 3D Human Pose Estimation
IF 13.7 Pub Date: 2025-12-22 DOI: 10.1109/TIP.2025.3644785
Mengyuan Liu;Jiajie Liu;Jinyan Zhang;Wenhao Li;Junsong Yuan
The lifting-based methods have dominated monocular 3D human pose estimation by leveraging detected 2D poses as intermediate representations. The 2D component of the final 3D human pose benefits from the detected 2D poses, whereas its depth counterpart must be estimated from scratch. The lifting-based methods encode the detected 2D pose and unknown depth in an entangled feature space, explicitly introducing depth uncertainty to the detected 2D pose, thereby limiting overall estimation accuracy. This work reveals that the depth representation is pivotal for the estimation process. Specifically, when depth is in an initial, completely unknown state, jointly encoding depth features with 2D pose features is detrimental to the estimation process. In contrast, when depth is initially refined to a more dependable state via network-based estimation, encoding it together with 2D pose information is beneficial. To address this limitation, we present a Mixture-of-Experts network for monocular 3D pose estimation named PoseMoE. Our approach introduces: 1) A mixture-of-experts network where specialized expert modules refine the well-detected 2D pose features and learn the depth features. This mixture-of-experts design disentangles the feature encoding process for 2D pose and depth, therefore reducing the explicit influence of uncertain depth features on 2D pose features. 2) A cross-expert knowledge aggregation module is proposed to aggregate cross-expert spatio-temporal contextual information. This step enhances features through bidirectional mapping between 2D pose and depth. Extensive experiments show that our proposed PoseMoE outperforms the conventional lifting-based methods on three widely used datasets: Human3.6M, MPI-INF-3DHP, and 3DPW.
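A minimal sketch of the disentangled two-expert idea described above: one branch refines the detected 2D-pose features, the other learns depth features, and a simple cross-attention step stands in for cross-expert knowledge aggregation. Layer sizes and the aggregation design are illustrative assumptions, not the paper's architecture.

```python
# Two-expert lifting sketch with separate 2D-pose and depth branches.
import torch
import torch.nn as nn

class TwoExpertLifter(nn.Module):
    def __init__(self, num_joints: int = 17, dim: int = 256):
        super().__init__()
        self.pose_expert = nn.Sequential(nn.Linear(num_joints * 2, dim), nn.GELU(),
                                         nn.Linear(dim, dim))   # refines 2D pose features
        self.depth_expert = nn.Sequential(nn.Linear(num_joints * 2, dim), nn.GELU(),
                                          nn.Linear(dim, dim))  # learns depth features
        self.cross = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head_xy = nn.Linear(dim, num_joints * 2)
        self.head_z = nn.Linear(dim, num_joints)

    def forward(self, pose2d):                                  # pose2d: (b, num_joints*2)
        f_pose = self.pose_expert(pose2d)
        f_depth = self.depth_expert(pose2d)
        # Cross-expert aggregation: depth features attend to the refined 2D features.
        f_depth, _ = self.cross(f_depth.unsqueeze(1), f_pose.unsqueeze(1),
                                f_pose.unsqueeze(1))
        return self.head_xy(f_pose), self.head_z(f_depth.squeeze(1))

xy, z = TwoExpertLifter()(torch.randn(8, 34))                   # batch of 8 detected poses
```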
Citations: 0
Degradation-Aware Prompted Transformer for Unified Medical Image Restoration
IF 13.7 Pub Date: 2025-12-22 DOI: 10.1109/TIP.2025.3644795
Jinbao Wei;Gang Yang;Zhijie Wang;Shimin Tao;Aiping Liu;Xun Chen
Medical image restoration (MedIR) aims to recover high-quality images from degraded inputs, yet faces unique challenges from physics-driven degradations and multi-modal task interference. While existing all-in-one methods handle natural image degradations well, they struggle with medical scenarios due to limited degradation perception and suboptimal multi-task optimization. In response, we introduce DaPT, a Degradation-aware Prompted Transformer, which integrates dynamic prompt learning and modular expert mining for unified MedIR. First, DaPT introduces spatially compact prompts with optimal transport regularization, amplifying inter-prompt differences to capture diverse degradation patterns. Second, a mixture of experts dynamically routes inputs to specialized modules via prompt guidance, resolving task conflicts while reducing computational overhead. The synergy of prompt learning and expert mining further enables robust restoration across multi-modal medical data, offering a practical solution for clinical imaging. Extensive experiments across multiple modalities (MRI, CT, PET) and diverse degradations, covering both in-distribution and out-of-distribution scenarios, demonstrate that DaPT consistently outperforms state-of-the-art methods and generalizes reliably to unseen settings, underscoring its robustness, effectiveness, and clinical practicality. The source code will be released at https://github.com/weijinbao1998/DaPT
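As a sketch of the optimal-transport-regularised prompt assignment, the code below runs a few Sinkhorn-style normalisation steps on prompt-feature similarities so that prompt usage stays balanced and prompts stay distinct; the cost definition, temperature, and iteration count are assumptions rather than the paper's exact regulariser.

```python
# Sinkhorn-style balanced soft assignment of input tokens to degradation prompts.
import torch

def sinkhorn_assign(features: torch.Tensor, prompts: torch.Tensor,
                    eps: float = 0.05, iters: int = 3) -> torch.Tensor:
    """features: (n, d); prompts: (k, d); returns an (n, k) soft assignment."""
    Q = (features @ prompts.T / eps).softmax(dim=-1)
    for _ in range(iters):                        # alternate column/row scaling
        Q = Q / Q.sum(dim=0, keepdim=True)        # balance usage across prompts
        Q = Q / Q.sum(dim=1, keepdim=True)        # each input sums to one
    return Q

assignment = sinkhorn_assign(torch.randn(128, 256), torch.randn(8, 256))
```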
Citations: 0