
Knowledge-Based Systems: Latest Articles

DKC: Data-driven and knowledge-guided causal discovery with application to healthcare data
IF 7.6 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-22 · DOI: 10.1016/j.knosys.2026.115384
Uzma Hasan, Md Osman Gani
Efficient causal discovery is essential for constructing reliable causal graphs that provide actionable insights in domains where randomized experiments are infeasible. This study introduces DKC, a novel causal discovery algorithm that utilizes both observational data and prior knowledge to enable reliable learning of causal graphs that support decision-making in complex domains such as healthcare. Traditional causal discovery methods often rely exclusively on observational data, which reduces their effectiveness when datasets are noisy, limited in size, or involve intricate causal relationships. Moreover, existing approaches seldom incorporate prior knowledge in a flexible manner, limiting their applicability in real-world scenarios. DKC addresses these challenges by efficiently incorporating causal priors into the discovery process through a tailored scoring criterion that supports both hard and soft constraints. The framework operates in three stages: (i) estimating a topological ordering of variables, (ii) ranking candidate edges according to likelihood, and (iii) performing a constrained causal search using the proposed score to balance model fit, complexity, and prior knowledge. We establish theoretical guarantees demonstrating that the score is statistically consistent, converging to the true causal structure as sample size grows. Extensive experiments on synthetic datasets of varying scales, as well as real-world healthcare data, confirm that DKC outperforms state-of-the-art baselines in terms of structural accuracy and robustness. By harmonizing data-driven insights with prior knowledge, DKC provides a trustworthy foundation for causal inference across diverse fields. Its application to a clinical problem highlights its potential to guide critical decision-making, while its general framework ensures broad utility in any domain requiring reliable, knowledge-informed causal reasoning.
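The third stage hinges on a score that trades off model fit, complexity, and prior agreement. The sketch below illustrates the shape of such a criterion; the function name, the BIC-style complexity term, and the soft-constraint weighting are illustrative assumptions, not the paper's actual definition.

```python
import math

def dkc_score(log_likelihood, n_edges, n_samples,
              prior_matches, prior_conflicts,
              soft_weight=1.0, hard_conflict=False):
    """Hypothetical knowledge-guided score: fit minus complexity,
    adjusted by agreement with soft priors. Hard-constraint violations
    are rejected outright."""
    if hard_conflict:
        return float("-inf")  # a forbidden edge can never enter the graph
    bic_penalty = 0.5 * n_edges * math.log(n_samples)   # BIC-style complexity term
    prior_term = soft_weight * (prior_matches - prior_conflicts)  # soft constraints
    return log_likelihood - bic_penalty + prior_term
```

A hard-constraint violation drives the score to negative infinity, so expert-forbidden edges are excluded entirely, while soft priors merely tilt the constrained search toward structures that agree with domain knowledge.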
Knowledge-Based Systems, Volume 337, Article 115384.
Citations: 0
FedCLIP-Distill: Heterogeneous federated cross-modal knowledge distillation for multi-domain visual recognition
IF 7.6 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-22 · DOI: 10.1016/j.knosys.2026.115383
Yuankun Xia, Hui Wang, Yufeng Zhou
Federated learning (FL) for multi-domain visual recognition confronts significant challenges due to heterogeneous data distributions and domain shifts, which severely impair the semantic generalization capability of existing methods. To address these challenges, we propose FedCLIP-Distill, a novel framework that employs dual-domain knowledge distillation (KD) and contrastive relational distillation (CRD) to leverage the powerful visual-language alignment of CLIP in heterogeneous FL environments. Our approach employs a centralized CLIP teacher model to distill robust visual-textual semantics into lightweight client-side student models, thereby enabling effective local domain adaptation. We provide a theoretical convergence analysis proving that our distillation mechanism effectively mitigates domain gaps and facilitates robust convergence under non-IID settings. Extensive experiments on the Office-Caltech10 and DomainNet benchmarks show that FedCLIP-Distill outperforms other methods: it achieves an average cross-domain accuracy of 98.5% on Office-Caltech10 and 80.50% on DomainNet, and under varying degrees of heterogeneity (e.g., Dirichlet α = 0.5) it exceeds FedCLIP by 9.52%, demonstrating significant improvements in accuracy and generalization in heterogeneous scenarios. The source code is available at https://github.com/Yuankun-Xia/FedCLIP-Distill.
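The teacher-to-student transfer at the heart of the framework can be illustrated with the standard temperature-scaled distillation objective. This is a generic KD sketch; FedCLIP-Distill's dual-domain and contrastive relational terms are additional components not shown here.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened probability distribution over logits."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions; the T^2 factor
    keeps gradient magnitudes comparable across temperatures (standard KD)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (temperature ** 2) * kl
```

When student logits match the teacher's, the loss vanishes; any mismatch yields a positive penalty that pulls the lightweight client model toward the CLIP teacher's soft predictions.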
Knowledge-Based Systems, Volume 337, Article 115383.
Citations: 0
Enhancing Heterogeneous Graph Learning with Semantic-Aware Meta-Path Diffusion and Dual Optimization
IF 7.6 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-22 · DOI: 10.1016/j.knosys.2026.115385
Guanghua Ding, Rui Tang, Xian Mo
Heterogeneous graph learning aims to extract semantic and structural information from multiple node types, edges, and meta-paths, learning low-dimensional embeddings that preserve core characteristics to support downstream tasks. To address the core challenges of insufficient semantic mining and weak learning synergy in heterogeneous graph learning, this paper proposes a heterogeneous graph learning method integrating Semantic-aware Meta-path perturbation and Collaborative Dual-learning optimization (SMCD). First, the method constructs auxiliary meta-paths based on the original meta-paths, and then designs two augmentation schemes to generate augmented views. For semantic-level augmentation, it performs edge perturbation based on semantic similarity and enhances the semantics of core meta-paths with those of auxiliary meta-paths via a diffusion model. For task-level augmentation, it utilizes a diffusion model and semantic weights to select the top-k semantically relevant nodes for each node in the core meta-path graph, reconstructing the meta-path graph structure. Then, a two-stage attention aggregation graph encoder is adopted to output the final node embeddings. Finally, a self-supervised and supervised (i.e., dual-learning) collaborative optimization strategy that flexibly adapts to label distribution is used to optimize the objective; this not only balances the discriminability and generality of representations but also adapts to scenarios with different degrees of label scarcity. Experimental results on three public datasets illustrate that our proposed method achieves remarkable advantages in both node classification and node clustering tasks. Our datasets and source code are available.
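The task-level augmentation step, picking the top-k semantically relevant nodes per node, reduces to a similarity ranking. A minimal sketch using plain cosine similarity follows; SMCD additionally weights similarities with diffusion-model scores, which is omitted here, and the helper names are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity of two dense embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def topk_semantic_neighbors(node_emb, all_embs, k):
    """Return indices of the k nodes most semantically similar to node_emb,
    the candidate edge set used to reconstruct the meta-path graph."""
    order = sorted(range(len(all_embs)),
                   key=lambda i: cosine(node_emb, all_embs[i]), reverse=True)
    return order[:k]
```

Restricting each node's reconstructed neighborhood to its top-k semantic matches keeps the augmented view close in meaning to the original graph while still perturbing its structure.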
Knowledge-Based Systems, Volume 337, Article 115385.
Citations: 0
TransXV2S-NET: A novel hybrid deep learning architecture with dual-contextual graph attention for multi-class skin lesion classification
IF 7.6 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-22 · DOI: 10.1016/j.knosys.2026.115407
Adnan Saeed, Khurram Shehzad, Muhammad Ghulam Abbas Malik, Saim Ahmed, Ahmad Taher Azar
Accurate early-stage diagnosis of skin lesions remains challenging for dermatologists due to visual complexity and subtle inter-class differences. Traditional computer-assisted diagnostic tools struggle to capture detailed patterns and contextual relationships, especially under varying imaging conditions. In this study, we introduce TransXV2S-Net, a new multi-branch hybrid deep-learning model designed for automated skin lesion classification. The branches, comprising an EfficientNetV2S, a Swin Transformer, and a modified Xception architecture, separately extract features from skin lesions at different stages and learn complex combinations between them. A novel Dual-Contextual Graph Attention Network (DCGAN) makes the network focus on discriminative parts of skin lesions, enhancing feature learning through dual-path attention mechanisms and graph-based operations that effectively capture both local textural details and global contextual patterns. The Gray World Standard Deviation (GWSD) preprocessing algorithm improves lesion visibility and removes imaging artifacts. Benchmarking against an 8-class skin cancer dataset confirmed the model's efficacy, yielding 95.26% accuracy, 94.30% recall, and an AUC-ROC of 99.62%. Further validation on the HAM10000 dataset demonstrates exceptional performance with 95% accuracy, confirming the model's robustness and generalization capability.
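The preprocessing stage builds on gray-world color constancy. Below is a minimal sketch of the gray-world core only; the paper's GWSD variant also incorporates channel standard deviations, which this illustration omits, and `pixels` is assumed to be a flat list of RGB triples.

```python
def gray_world_correct(pixels):
    """Classic gray-world correction: scale each color channel so its mean
    matches the image's overall gray level, reducing color casts from
    varying illumination. Sketch only; not the paper's full GWSD algorithm."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]  # per-channel means
    gray = sum(means) / 3.0                                    # target gray level
    gains = [gray / m if m else 1.0 for m in means]            # per-channel gains
    return [[p[c] * gains[c] for c in range(3)] for p in pixels]
```

After correction, all three channel means coincide at the gray level, which helps standardize lesion images captured under different lighting before they reach the classifier.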
Knowledge-Based Systems, Volume 337, Article 115407.
Citations: 0
OACI: Object-aware contextual integration for image captioning
IF 7.6 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-22 · DOI: 10.1016/j.knosys.2026.115374
Shuhan Xu, Mengya Han, Wei Yu, Zheng He, Xin Zhou, Yong Luo
Image captioning is a fundamental task in visual understanding, aiming to generate textual descriptions for given images. Current image captioning methods are gradually shifting towards a fully end-to-end paradigm, which leverages pre-trained vision models to process images directly and generate captions, eliminating the need for separate object detectors. These methods typically rely on global features, neglecting the precise perception of local ones. The lack of fine-grained focus on the object may result in suboptimal prototype features contaminated by surrounding noise, and thus negatively affect the generation of object-related captions. To address this issue, we propose a novel method termed object-aware context integration (OACI), which captures the salient prototypes of individual objects and understands their relationships by leveraging the global context of the entire scene. Specifically, we propose an object-aware prototype learning (OAPL) module that focuses on regions containing objects to enhance object perception and selects the most confident regions for learning object prototypes. Moreover, a class affinity constraint (CAC) is designed to facilitate the learning of these prototypes. To understand the relationships between objects, we further propose an object-context integration (OCI) module that integrates global context with local object prototypes, enhancing the understanding of image content and improving the generated image captions. We conduct extensive experiments on the popular MSCOCO, Flickr8k and Flickr30k datasets, and the results demonstrate that integrating global context with local object details significantly improves the quality of generated captions, validating the effectiveness of the proposed OACI method.
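The prototype-learning idea of averaging only the most confident object regions can be sketched as follows. The helper is hypothetical; OACI's OAPL module additionally trains these prototypes under the class affinity constraint, which is not reproduced here.

```python
def object_prototype(region_feats, confidences, top_k=2):
    """Form a noise-robust object prototype by averaging the feature vectors
    of the top_k most confident regions, so low-confidence background
    regions cannot contaminate the prototype. Illustrative sketch."""
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    chosen = order[:top_k]
    dim = len(region_feats[0])
    return [sum(region_feats[i][d] for i in chosen) / len(chosen)
            for d in range(dim)]
```

A noisy region with low confidence (e.g., a background patch with an outlier feature vector) is simply excluded from the average, which is the intuition behind selecting only confident regions for prototype learning.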
Knowledge-Based Systems, Volume 337, Article 115374.
Citations: 0
Cross-domain time-frequency Mamba: A more effective model for long-term time series forecasting
IF 7.6 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-21 · DOI: 10.1016/j.knosys.2026.115341
Yuhang Duan, Lin Lin, Jinyuan Liu, Qing Zhang, Xin Fan
Long-term time series forecasting (LTSF) is crucial in domains such as smart energy systems and the industrial Internet of Things. Existing methods face intertwined challenges in LTSF. Single-domain modeling often fails to capture local fluctuations and global trends, resulting in incomplete temporal representations. While attention-based models effectively capture long-range dependencies, their quadratic computational complexity limits their efficiency and scalability. Moreover, cross-scale conflicts frequently occur in long-term forecasting: short-term patterns may interfere with long-term trends, thereby degrading prediction accuracy. To address these issues, we propose cross-domain time-frequency Mamba (CDTF-Mamba), which synergistically models time series in both the time and frequency domains. CDTF-Mamba’s time-domain pyramid Mamba component disentangles multiscale patterns, while its frequency-domain decomposition Mamba component stabilizes state evolution and mitigates nonstationarity. We perform extensive experiments on 13 widely used benchmark datasets. Experimental results demonstrate that CDTF-Mamba achieves superior accuracy while maintaining high efficiency and strong scalability compared with state-of-the-art methods.
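The separation of local fluctuations from global trends can be illustrated with the simplest possible decomposition, a centered moving average. This is a stand-in only: CDTF-Mamba's actual decomposition is multiscale and operates in both the time and frequency domains.

```python
def trend_residual(series, window=3):
    """Split a series into a smooth trend (global pattern) and a residual
    (local fluctuation) via a centered moving average with edge padding.
    By construction, trend + residual reconstructs the input exactly."""
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    trend = [sum(padded[i:i + window]) / window for i in range(len(series))]
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual
```

Modeling the two components separately is what lets an architecture avoid the cross-scale conflict the abstract describes, since short-term noise in the residual no longer distorts the long-term trend estimate.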
Knowledge-Based Systems, Volume 337, Article 115341.
Citations: 0
AdaptTrack: Perception field adaptation with contrastive attention for robust visual tracking
IF 7.6 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-21 · DOI: 10.1016/j.knosys.2026.115369
Yongjun Wang, Xiaohui Hao
While transformer-based methods have advanced visual object tracking, existing approaches often struggle with complex scenarios due to their reliance on fixed perception fields, limited discriminative capabilities, and insufficient predictive modeling. Current solutions utilizing attention mechanisms and feature learning techniques have made progress but face inherent limitations in adapting to dynamic scenes and maintaining robust target discrimination. We propose AdaptTrack, an innovative Transformer-based tracking framework that systematically addresses three critical limitations in existing approaches: suboptimal perception field adaptation for capturing target-specific information, insufficient target-background discrimination in cluttered environments, and inadequate predictive modeling during challenging scenarios. The framework introduces three key technical components: (1) an Adaptive Perception Field Guidance Network that dynamically optimizes feature extraction through scene-aware field configuration, (2) a Contrastive-Guided Contextual Attention mechanism that enhances discrimination through structured contrast learning, and (3) a Predictive State Transition Network that improves robustness via probabilistic state modeling. Through these innovations, our approach effectively addresses the limitations of current methods through dynamic field adaptation, explicit contrast modeling, and robust state prediction. Extensive evaluations demonstrate state-of-the-art performance on seven benchmarks (77.3% AO on GOT-10k, 73.3% AUC on LaSOT, 85.4% AUC on TrackingNet) while maintaining real-time efficiency at 32.6 FPS.
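As a toy illustration of why predictive state modeling helps when appearance cues fail (e.g., during occlusion), consider constant-velocity extrapolation of the target position. The learned Predictive State Transition Network is probabilistic and far richer than this; the function below is purely a motivating sketch.

```python
def predict_state(positions):
    """Constant-velocity motion prior: extrapolate the next (x, y) target
    position from the last two observed positions. When the appearance
    model loses the target, a motion prediction like this keeps the
    search window near the plausible location."""
    (x0, y0), (x1, y1) = positions[-2], positions[-1]
    return (2 * x1 - x0, 2 * y1 - y0)
```

Even this crude prior constrains where the tracker looks next; a probabilistic state model generalizes it by maintaining uncertainty over position and velocity rather than a single point estimate.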
Knowledge-Based Systems, Volume 337, Article 115369.
引用次数: 0
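To make the "probabilistic state modeling" idea in the abstract above concrete, here is a minimal, hypothetical sketch of a predictive state-transition step: a constant-velocity motion prior blended with a noisy measurement. The function names, weights, and numbers are invented for illustration; this is not the AdaptTrack implementation, which learns its state model inside a Transformer.

```python
# Toy predict/update cycle for a 2-D target center. Invented example,
# not the authors' Predictive State Transition Network.

def predict_state(pos, vel, dt=1.0):
    """Propagate a 2-D box center under a constant-velocity prior."""
    return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)

def update_state(pred, measured, confidence):
    """Blend prediction and measurement; low confidence leans on the prior."""
    w = confidence  # in [0, 1]; 1.0 trusts the detector fully
    return (w * measured[0] + (1 - w) * pred[0],
            w * measured[1] + (1 - w) * pred[1])

pred = predict_state((10.0, 20.0), (2.0, -1.0))        # -> (12.0, 19.0)
est = update_state(pred, (13.0, 19.5), confidence=0.5)  # -> (12.5, 19.25)
print(pred, est)
```

When the detector's confidence drops (e.g. under occlusion), the estimate falls back toward the motion prior, which is the intuition behind prediction-based robustness in challenging scenarios.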
Multiscale scattering forests: A domain-generalizing approach for fault diagnosis under data constraints
IF 7.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-21 DOI: 10.1016/j.knosys.2026.115389
Zhuyun Chen, Hongqi Lin, Youpeng Gao, Jingke He, Zehao Li, Weihua Li, Qiang Liu
Currently, deep learning-based intelligent fault diagnosis techniques have been widely used in the manufacturing industry. However, due to various constraints, fault data for rotating machinery is often limited. Moreover, in real industrial environments, operating conditions of rotating machinery vary based on task requirements, leading to significant data variability across different operating conditions. This variability presents a major challenge for few-shot fault diagnosis, especially in scenarios requiring domain generalization across diverse operating conditions. To address this challenge, this paper proposes multiscale scattering forests (MSF): a domain-generalizing approach for fault diagnosis under data constraints. Firstly, a multiscale wavelet scattering predefined layer is designed to extract robust invariant features from input samples; the resulting scattering coefficients are concatenated and used as augmented versions of the original samples. Then, a deep stacked ensemble forest with skip connections is designed to handle the transformed multiscale samples, allowing earlier information to jump over layers and improving the model’s feature representation capabilities. Finally, a similarity-metric-based weighting strategy is developed to combine the diagnostic results of the individual forests, integrating the weighted models into an ensemble framework to enhance domain generalization performance under various operating conditions. The MSF model is comprehensively evaluated using a computer numerical control (CNC) machine tool spindle bearing dataset in an industrial environment. Experimental results demonstrate that the proposed approach not only exhibits strong diagnostic and generalization performance in few-shot scenarios without the support of additional source domains but also outperforms other state-of-the-art few-shot fault diagnosis methods.
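As a rough illustration of the similarity-weighted ensemble idea described in this abstract, here is a minimal sketch in which each forest's vote is weighted by how close a test sample lies to that forest's training centroid. The centroids, labels, and inverse-distance weighting are invented for illustration; the paper's actual metric and forest models differ.

```python
import math

# Similarity-weighted voting across an ensemble. Hypothetical example,
# not the authors' MSF implementation.

def similarity(sample, centroid):
    """Inverse-distance similarity between a sample and a forest's centroid."""
    return 1.0 / (1.0 + math.dist(sample, centroid))

def weighted_vote(sample, forests):
    """forests: list of (centroid, predict_fn). Returns the winning label."""
    scores = {}
    for centroid, predict in forests:
        w = similarity(sample, centroid)
        label = predict(sample)
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

forests = [
    ((0.0, 0.0), lambda s: "healthy"),
    ((5.0, 5.0), lambda s: "inner-race fault"),
]
print(weighted_vote((0.5, 0.5), forests))  # "healthy" (nearer centroid wins)
```

Forests trained on conditions similar to the test sample thus dominate the vote, which is one simple way to bias an ensemble toward the most relevant operating condition.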
Cited: 0
Omniscient bottom-up double-stream symmetric network for image captioning
IF 7.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-21 DOI: 10.1016/j.knosys.2026.115366
Jianchao Li, Wei Zhou, Kai Wang, Haifeng Hu
Transformer-based image captioning models have achieved promising performance through various effective learning schemes. We contend that a truly comprehensive learning schema, defined as omniscient learning, encompasses two components: 1) a hierarchical knowledge base with low redundancy as input, and 2) a bottom-up layer-wise network as architecture. Previous captioning models, however, primarily focus on network design and neglect the construction of the knowledge base. In this paper, our hierarchical knowledge base is constituted by personalized knowledge of real-time features and contextual knowledge of consensus. Simultaneously, we devise a bottom-up double-stream symmetric network (BuNet) to progressively learn layered features. Specifically, the hierarchical knowledge base includes single-image region and grid features from the local domain and contextual knowledge tokens from the broad domain. Correspondingly, BuNet is divided into a local-domain self-learning (LDS) stage and a broad-domain consensus-learning (BDC) stage. We also explore noise-decoupling strategies to illustrate the extraction of contextual knowledge tokens. Furthermore, the knowledge disparity between region and grid reveals that the purely “symmetric network” of BuNet cannot effectively capture additional spatial relationships present in the region stream. Consequently, we design relative spatial encoding in the LDS stage of BuNet to learn regional spatial knowledge. In addition, we employ a lightweight backbone to reduce computational complexity while providing a simple paradigm for omniscient learning. Our method is extensively tested on MS-COCO and Flickr30K, where it achieves better performance than some captioning models.
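To illustrate the "bottom-up layer-wise" flow with skip connections mentioned in this abstract, here is a toy sketch in which each layer's output is combined with its input so that earlier information can jump over layers. The elementwise-scaling "layer" is an invented stand-in; BuNet's actual layers are Transformer blocks operating on feature tokens.

```python
# Toy bottom-up stack with additive skip connections. Hypothetical
# illustration only, not the BuNet architecture.

def layer(x, scale):
    """A toy feature transform (elementwise scaling)."""
    return [v * scale for v in x]

def bottom_up(x, scales):
    out = x
    for s in scales:
        transformed = layer(out, s)
        # skip connection: add the previous representation back in,
        # so earlier-stage information survives into deeper layers
        out = [a + b for a, b in zip(transformed, out)]
    return out

print(bottom_up([1.0, 2.0], [0.5, 0.5]))  # [2.25, 4.5]
```

Without the skip term, each layer would overwrite its input; with it, every stage retains a direct contribution from the stage below, which is the property the abstract credits with improved feature representation.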
Cited: 0
UECNet: A unified framework for exposure correction utilizing region-level prompts
IF 7.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-20 DOI: 10.1016/j.knosys.2026.115365
Shucheng Xia, Kan Chang, Yuqing Li, Mingyang Ling, Xuxin Tai, Yehua Ling, Yujian Yuan, Zan Gao
In real-world scenarios, complex illumination often causes improper exposure in images. Most existing correction methods assume uniform exposure degradation across the entire image, leading to suboptimal performance when multiple exposure degradations coexist in a single image. To address this limitation, we propose UECNet, a Unified Exposure Correction Network guided by region-level prompts. Specifically, we first derive five degradation-specific text prompts through prompt tuning. These prompts are fed into our Exposure Prompts Generation (EPG) module, which generates spatially adaptive, region-level descriptors to characterize local exposure properties. To effectively integrate these region-specific descriptors into the exposure correction pipeline, we design a Prompt-guided Token Mixer (PTM) module. The PTM enables global interactive modeling between high-dimensional visual features and region-level prompts, thereby dynamically steering the correction process. UECNet is built by incorporating EPG and PTM into a U-shaped Transformer backbone. Furthermore, we introduce SICE-DE (SICE-based Diverse Exposure), a new benchmark dataset reorganized from the well-known SICE dataset, to facilitate effective training and comprehensive evaluation. SICE-DE covers six distinct exposure conditions, including challenging severe over/underexposure and non-uniform exposure. Extensive experiments demonstrate that the proposed UECNet consistently outperforms state-of-the-art methods on multiple exposure correction benchmarks. Our code and the SICE-DE dataset will be available at https://github.com/ShuchengXia/UECNet.
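As a rough sketch of the "region-level descriptor" idea in this abstract, the snippet below labels each region by its mean brightness and applies a per-region gamma curve. The thresholds and gamma values are invented for illustration; UECNet itself derives learned, spatially adaptive prompts inside a Transformer rather than hand-set rules.

```python
# Per-region exposure labeling and correction. Hypothetical toy example,
# not the UECNet pipeline.

def describe_region(pixels):
    """Label a region by its mean intensity (values in [0, 1])."""
    mean = sum(pixels) / len(pixels)
    if mean < 0.25:
        return "underexposed"
    if mean > 0.75:
        return "overexposed"
    return "well-exposed"

def correct_region(pixels, label):
    """Brighten dark regions (gamma < 1), darken bright ones (gamma > 1)."""
    gamma = {"underexposed": 0.5, "overexposed": 2.0, "well-exposed": 1.0}[label]
    return [v ** gamma for v in pixels]

region = [0.04, 0.09, 0.16]
label = describe_region(region)  # "underexposed"
print(label, correct_region(region, label))  # brightened toward 0.2 / 0.3 / 0.4
```

Because each region gets its own label and curve, severely underexposed and overexposed areas of the same image can be corrected in opposite directions, which is exactly the non-uniform case that single-global-correction methods handle poorly.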
Cited: 0