Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115359
Laura Menotti, Stefano Marchesin, Gianmaria Silvello
Document-Level Relation Extraction (DocRE) presents significant challenges due to its reliance on cross-sentence context and the long-tail distribution of relation types, where many relations have scarce training examples. In this work, we introduce DOcument-level Relation Extraction optiMizing the long taIl (DOREMI), an iterative framework that enhances underrepresented relations through minimal yet targeted manual annotations. Unlike previous approaches that rely on large-scale noisy data or heuristic denoising, DOREMI actively selects the most informative examples to improve training efficiency and robustness. DOREMI can be applied to any existing DocRE model and is effective at mitigating long-tail biases, offering a scalable solution to improve generalization on rare relations.
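The abstract's core idea, actively selecting the most informative examples for under-represented relations, can be sketched as an uncertainty-times-rarity ranking. This is only an illustration of the principle; the function names, the entropy-based scoring, and the rarity weight are assumptions, not DOREMI's actual selection criterion.

```python
import math
from collections import Counter

def select_informative(predictions, labels_seen, budget=2):
    """Rank unlabeled candidates by predictive entropy, boosted for
    relation types that are rare in the labeled pool (illustrative)."""
    freq = Counter(labels_seen)  # how often each relation is already annotated

    def score(item):
        _, relation, dist = item
        entropy = -sum(p * math.log(p + 1e-12) for p in dist)  # model uncertainty
        rarity = 1.0 / (1 + freq[relation])                    # long-tail boost
        return entropy * rarity

    ranked = sorted(predictions, key=score, reverse=True)
    return [cid for cid, _, _ in ranked[:budget]]

labels_seen = ["born_in", "born_in", "born_in", "ceo_of"]
candidates = [
    ("d1", "born_in", [0.9, 0.1]),    # confident, common relation
    ("d2", "ceo_of",  [0.5, 0.5]),    # uncertain, rare relation
    ("d3", "ceo_of",  [0.95, 0.05]),  # confident, rare relation
]
picked = select_informative(candidates, labels_seen, budget=1)
```

Under this toy scoring, the uncertain example of the rare relation (`d2`) is annotated first, which is the behavior the framework targets.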
Title: DOREMI: Optimizing long tail predictions in document-level relation extraction (Knowledge-Based Systems, vol. 337, Article 115359)
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115390
Wei Zhang , Changhong Jiang , Ming Xia , Lulu Wang , Zhongtian Hu , Jiashi Lin , Ronghan Li
Maintaining personality consistency is essential for improving the performance of empathetic dialogue systems. However, existing approaches to persona-aware empathetic response generation commonly exhibit two fundamental limitations in persona information extraction: (1) an inherent trade-off between the richness of information and contextual consistency, and (2) a unidirectional extraction strategy that considers only one interlocutor in the dialogue history. To address these limitations, this study proposes a method that utilizes Pre-trained Language Models (PLMs) and Large Language Models (LLMs) to extract dense persona information from all historical utterances of each participant in the training set, based on their participant IDs. Building on this, we introduce PDPA, a prompt-driven framework that jointly models user and agent perspectives. Specifically, a novel prompt template with three special tokens is designed to explicitly distinguish persona information from dialogue history during feature extraction. Furthermore, a persona-aware heterogeneous graph is constructed to enhance the aggregation of discourse structure, personality traits, complete dialogue history, and external knowledge. Finally, to ensure the effective use of refined persona information together with essential contextual details during generation, a dialogue decoder equipped with a dynamic pointer network is employed. Experimental evaluations demonstrate that the proposed model consistently outperforms strong baselines on two datasets derived from the EMPATHETICDIALOGUES benchmark. In particular, compared with its backbone BART, PDPA achieves notable improvements in emotion classification accuracy, with an increase of 4.73% when assisted by LLM-generated persona information and 4.36% when assisted by PLM-generated persona information, highlighting the effectiveness of our approach.
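The "prompt template with three special tokens" described above can be sketched as simple segment marking: one token per persona source and one for the dialogue history, so the encoder can distinguish the segments. The token names and separator below are hypothetical; the paper's actual template is not reproduced here.

```python
# Hypothetical special tokens -- not the paper's actual vocabulary.
USR_PERSONA, AGT_PERSONA, CTX = "[USR]", "[AGT]", "[CTX]"

def build_prompt(user_persona, agent_persona, history):
    """Concatenate persona statements and dialogue history, marking each
    segment with its own special token (illustrative sketch)."""
    parts = [
        f"{USR_PERSONA} {' '.join(user_persona)}",
        f"{AGT_PERSONA} {' '.join(agent_persona)}",
        f"{CTX} {' </s> '.join(history)}",  # utterances joined by a separator
    ]
    return " ".join(parts)

prompt = build_prompt(
    ["I love hiking."],
    ["I am a supportive listener."],
    ["I failed my exam today.", "I'm so sorry to hear that."],
)
```

A downstream encoder can then attend differently to tokens following each marker, which is the separation the template is designed to make explicit.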
Title: PDPA: A prompt-based dual persona-aware approach for empathetic response generation (Knowledge-Based Systems, vol. 337, Article 115390)
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115342
Liyu Fang , Wu Wen , Xiaolin Zheng
Federated Learning (FL) with concept drift faces three fundamental challenges. First, existing methods lack a drift-aware client representation that can directly reflect changes in data distributions. Second, clustering with drifting clients often causes collaborative instability by contaminating the structure of client groups. Third, many approaches suffer from a methodological disconnect between drift detection and adaptation.
To address these challenges, we propose FedDCA, a stable and unified framework for federated concept drift adaptation. FedDCA introduces a Label Profile (LP), a compact distributional representation that captures each client’s current data concept and enables principled drift-aware similarity measurement. Based on LPs, FedDCA employs Drift-Aware Anchor Clustering, which performs Variational Wasserstein Clustering exclusively on stable clients to form robust anchor centroids, thereby preserving collaborative stability. Drifting clients are then assigned to the nearest anchor, allowing rapid adaptation without destabilizing the overall system. By unifying drift detection and clustering adaptation within the same Wasserstein metric space, FedDCA provides a consistent and effective response to dynamic environments. Extensive experiments demonstrate that FedDCA significantly outperforms state-of-the-art methods in both accuracy and adaptation speed under various concept drift scenarios.
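The Label Profile and nearest-anchor assignment can be illustrated with label histograms compared under the discrete 1-D Wasserstein distance (the sum of absolute CDF differences over an ordered label support). This is a minimal sketch of the mechanism, assuming a scalar label space; FedDCA's Variational Wasserstein Clustering itself is not implemented here.

```python
import numpy as np

def label_profile(labels, num_classes):
    """A client's Label Profile: its normalized label histogram (illustrative)."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()

def w1_discrete(p, q):
    """1-D Wasserstein distance between histograms on the same ordered support:
    the L1 distance between their CDFs."""
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())

def assign_to_anchor(profile, anchors):
    """Assign a drifting client to the nearest anchor centroid in W1."""
    dists = [w1_discrete(profile, a) for a in anchors]
    return int(np.argmin(dists))

anchors = [
    label_profile(np.array([0, 0, 1]), 3),  # centroid skewed toward class 0
    label_profile(np.array([2, 2, 1]), 3),  # centroid skewed toward class 2
]
drifted = label_profile(np.array([2, 2, 2, 1]), 3)  # client after drift
idx = assign_to_anchor(drifted, anchors)
```

Because detection and assignment use the same metric space, a client whose profile moves toward class 2 is reassigned to the matching anchor without re-clustering the stable clients.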
Title: FedDCA: Stable and unified Wasserstein adaptation to federated concept drift (Knowledge-Based Systems, vol. 337, Article 115342)
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115384
Uzma Hasan, Md Osman Gani
Efficient causal discovery is essential for constructing reliable causal graphs that provide actionable insights in domains where randomized experiments are infeasible. This study introduces DKC, a novel causal discovery algorithm that utilizes both observational data and prior knowledge to enable reliable learning of causal graphs that support decision-making in complex domains such as healthcare. Traditional causal discovery methods often rely exclusively on observational data, which reduces their effectiveness when datasets are noisy, limited in size, or involve intricate causal relationships. Moreover, existing approaches seldom incorporate prior knowledge in a flexible manner, limiting their applicability in real-world scenarios. DKC addresses these challenges by efficiently incorporating causal priors into the discovery process through a tailored scoring criterion that supports both hard and soft constraints. The framework operates in three stages: (i) estimation of a topological ordering of variables, (ii) ranking candidate edges according to likelihood, and (iii) performing a constrained causal search using the proposed score to balance model fit, complexity, and prior knowledge. We establish theoretical guarantees demonstrating that the score is statistically consistent, converging to the true causal structure as sample size grows. Extensive experiments on synthetic datasets of varying scales, as well as real-world healthcare data, confirm that DKC outperforms state-of-the-art baselines in terms of structural accuracy and robustness. By harmonizing data-driven insights with prior knowledge, DKC provides a trustworthy foundation for causal inference across diverse fields. Its application to a clinical problem highlights its potential to guide critical decision-making, while its general framework ensures broad utility in any domain requiring reliable, knowledge-informed causal reasoning.
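A score that "balances model fit, complexity, and prior knowledge" with both hard and soft constraints can be sketched as a BIC-style criterion plus a prior term: hard constraints veto a graph outright, soft constraints subtract a penalty per violated prior. This is a generic illustration under assumed semantics, not DKC's exact scoring criterion.

```python
import math

def knowledge_score(log_lik, num_params, n, graph_edges,
                    required=(), forbidden=(), soft_weight=1.0):
    """BIC-style score with a prior-knowledge term (illustrative sketch).
    Hard constraints: any missing required edge rules the graph out.
    Soft constraints: each present forbidden edge costs `soft_weight`."""
    for e in required:
        if e not in graph_edges:           # hard constraint violated
            return float("-inf")
    bic = log_lik - 0.5 * num_params * math.log(n)  # fit minus complexity
    soft_violations = sum(1 for e in forbidden if e in graph_edges)
    return bic - soft_weight * soft_violations

score = knowledge_score(
    log_lik=-120.0, num_params=4, n=100,
    graph_edges={("smoking", "cancer")},
    required=[("smoking", "cancer")],      # prior: this edge must exist
    forbidden=[("cancer", "age")],         # prior: discourage this edge
)
```

A constrained search would compare candidate graphs by this score, so priors steer the search without fully overriding the data-driven fit term.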
Title: DKC: Data-driven and knowledge-guided causal discovery with application to healthcare data (Knowledge-Based Systems, vol. 337, Article 115384)
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115383
Yuankun Xia, Hui Wang, Yufeng Zhou
Federated learning (FL) for multi-domain visual recognition confronts significant challenges due to heterogeneous data distributions and domain shifts, which severely impair the semantic generalization capability of existing methods. To address these challenges, we propose FedCLIP-Distill, a novel framework that employs dual-domain knowledge distillation (KD) and contrastive relational distillation (CRD) to leverage the powerful visual-language alignment of CLIP in heterogeneous FL environments. Our approach employs a centralized CLIP teacher model to distill robust visual-textual semantics into lightweight client-side student models, thereby enabling effective local domain adaptation. We provide a theoretical convergence analysis proving that our distillation mechanism effectively mitigates domain gaps and facilitates robust convergence under non-IID settings. Extensive experiments on Office-Caltech10 and DomainNet benchmarks show that FedCLIP-Distill outperforms other methods: it achieves an average cross-domain accuracy of 98.5% on Office-Caltech10 and 80.50% on DomainNet, with consistent gains across different degrees of heterogeneity (e.g., 9.52% higher than FedCLIP at Dirichlet α = 0.5), demonstrating significant improvements in accuracy and generalization under heterogeneous scenarios. The source code is available at https://github.com/Yuankun-Xia/FedCLIP-Distill.
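The teacher-to-student distillation the framework relies on is, at its core, a temperature-scaled KL divergence between teacher and student output distributions. The sketch below shows only this generic KD ingredient; how FedCLIP-Distill combines it with its contrastive relational term and federated aggregation is not reproduced here.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Temperature-scaled KL(teacher || student), the standard KD objective.
    The T**2 factor keeps gradient magnitudes comparable across temperatures."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float((T ** 2) * np.sum(p_t * (np.log(p_t) - np.log(p_s))))

teacher = [2.0, 0.1, -1.0]                    # e.g. CLIP teacher logits
loss_aligned = distill_loss(teacher, teacher)  # student matches teacher
loss_shifted = distill_loss([0.0, 2.0, -1.0], teacher)
```

The loss vanishes when the student reproduces the teacher's distribution and grows as the two diverge, which is what drives the client-side students toward the centralized CLIP semantics.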
Title: FedCLIP-Distill: Heterogeneous federated cross-modal knowledge distillation for multi-domain visual recognition (Knowledge-Based Systems, vol. 337, Article 115383)
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115385
Guanghua Ding , Rui Tang , Xian Mo
Heterogeneous graph learning aims to extract semantic and structural information from multiple node types, edges, and meta-paths, learning low-dimensional embeddings that preserve core characteristics to support downstream tasks. To address the core challenges of insufficient semantic mining and weak learning synergy in heterogeneous graph learning, this paper proposes a heterogeneous graph learning method integrating Semantic-aware Meta-path perturbation and Collaborative Dual-learning optimization (SMCD). First, the method constructs auxiliary meta-paths based on the original meta-paths and then designs two augmentation schemes to generate augmented views: for semantic-level augmentation, it performs edge perturbation based on semantic similarity and enhances the semantics of core meta-paths with those of auxiliary meta-paths via a diffusion model; for task-level augmentation, it utilizes a diffusion model and semantic weights to select the top-k semantically relevant nodes for each node in the core meta-path graph, reconstructing the meta-path graph structure. Then, a two-stage attention aggregation graph encoder is adopted to output the final node embeddings. Finally, a self-supervised and supervised (i.e., dual-learning) collaborative optimization strategy that flexibly adapts to the label distribution is used to optimize the objective: this not only balances the discriminability and generality of representations but also adapts to scenarios with different degrees of label scarcity. Experimental results on three public datasets illustrate that our proposed method achieves remarkable advantages in both node classification and node clustering tasks. Our datasets and source code are available.
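The top-k selection step, picking the k most semantically relevant nodes for each node before rebuilding the meta-path graph, can be sketched with cosine similarity over node embeddings. The details (how SMCD's diffusion model and semantic weights produce the similarities) are not modeled here; this only shows the selection itself.

```python
import numpy as np

def topk_neighbors(emb, k=2):
    """For each node, select the k most cosine-similar other nodes.
    A sketch of semantic top-k selection over node embeddings (illustrative)."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize rows
    sim = x @ x.T                                          # cosine similarity
    np.fill_diagonal(sim, -np.inf)                         # exclude self-loops
    return np.argsort(-sim, axis=1)[:, :k]                 # indices, best first

emb = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0]])
nbrs = topk_neighbors(emb, k=1)   # each node's single nearest semantic neighbor
```

The resulting index matrix defines the edges of the reconstructed graph: each node is connected to its k nearest neighbors in embedding space.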
Title: Enhancing Heterogeneous Graph Learning with Semantic-Aware Meta-Path Diffusion and Dual Optimization (Knowledge-Based Systems, vol. 337, Article 115385)
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115407
Adnan Saeed , Khurram Shehzad , Muhammad Ghulam Abbas Malik , Saim Ahmed , Ahmad Taher Azar
Accurate early-stage diagnosis of skin lesions remains challenging for dermatologists due to visual complexity and subtle inter-class differences. Traditional computer-assisted diagnostic tools struggle to capture detailed patterns and contextual relationships, especially under varying imaging conditions. In this study, we introduce TransXV2S-Net, a new multi-branch hybrid deep-learning model designed for automated skin lesion classification. These branches include an EfficientNetV2S, a Swin Transformer, and a modified Xception architecture; they extract features from skin lesions at different stages and learn complex combinations among them. A novel Dual-Contextual Graph Attention Network (DCGAN) then enhances discriminative feature learning through dual-path attention mechanisms and graph-based operations that capture both local textural details and global contextual patterns, directing the network toward the discriminative parts of skin lesions. The Gray World Standard Deviation (GWSD) preprocessing algorithm improves lesion visibility and removes imaging artifacts. Benchmarking against an 8-class skin cancer dataset confirmed the model's efficacy, yielding 95.26% accuracy, 94.30% recall, and an AUC-ROC of 99.62%.
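The GWSD preprocessing step builds on gray-world color constancy, which scales each color channel so its mean matches the global mean, removing a color cast. The sketch below shows only the classic gray-world baseline; how the GWSD variant incorporates the standard deviation is not specified in the abstract, so that part is omitted.

```python
import numpy as np

def gray_world(img):
    """Classic gray-world color constancy (illustrative baseline):
    scale each channel so its mean equals the image's global channel mean."""
    img = img.astype(float)
    channel_means = img.reshape(-1, 3).mean(axis=0)   # per-channel means
    gains = channel_means.mean() / channel_means      # per-channel scale factors
    return np.clip(img * gains, 0, 255)

# A tiny image with a strong red cast.
img = np.zeros((2, 2, 3))
img[..., 0] = 200.0
img[..., 1] = 100.0
img[..., 2] = 60.0
balanced = gray_world(img)
```

After correction the three channel means coincide, which neutralizes illumination-dependent color casts before the lesion is passed to the feature extractors.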
Title: TransXV2S-NET: A novel hybrid deep learning architecture with dual-contextual graph attention for multi-class skin lesion classification (Knowledge-Based Systems, vol. 337, Article 115407)
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115374
Shuhan Xu , Mengya Han , Wei Yu , Zheng He , Xin Zhou , Yong Luo
Image captioning is a fundamental task in visual understanding, aiming to generate textual descriptions for given images. Current image captioning methods are gradually shifting towards a fully end-to-end paradigm, which leverages pre-trained vision models to process images directly and generate captions, eliminating the need for separate object detectors. These methods typically rely on global features, neglecting the precise perception of local ones. The lack of fine-grained focus on the object may result in suboptimal prototype features contaminated by surrounding noise, and thus negatively affect the generation of object-related captions. To address this issue, we propose a novel method termed object-aware context integration (OACI), which captures the salient prototypes of individual objects and understands their relationships by leveraging the global context of the entire scene. Specifically, we propose an object-aware prototype learning (OAPL) module that focuses on regions containing objects to enhance object perception and selects the most confident regions for learning object prototypes. Moreover, a class affinity constraint (CAC) is designed to facilitate the learning of these prototypes. To understand the relationships between objects, we further propose an object-context integration (OCI) module that integrates global context with local object prototypes, enhancing the understanding of image content and improving the generated image captions.
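The idea of "selecting the most confident regions for learning object prototypes" can be sketched as confidence-ranked feature pooling: keep only the top-scoring object regions and average their features, so background noise is excluded from the prototype. This is a sketch of the idea under assumed inputs, not OAPL's actual module.

```python
import numpy as np

def object_prototype(region_feats, confidences, top=2):
    """Average the features of the `top` most confident object regions to
    form a noise-resistant object prototype (illustrative sketch)."""
    order = np.argsort(-np.asarray(confidences))[:top]  # most confident first
    return np.asarray(region_feats)[order].mean(axis=0)

feats = [[1.0, 0.0],   # confident object region
         [0.8, 0.2],   # confident object region
         [0.0, 5.0]]   # low-confidence background noise
conf = [0.9, 0.8, 0.1]
proto = object_prototype(feats, conf, top=2)
```

Because the noisy third region falls below the cutoff, it never contaminates the prototype, which is the failure mode the abstract attributes to purely global features.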
{"title":"OACI: Object-aware contextual integration for image captioning","authors":"Shuhan Xu , Mengya Han , Wei Yu , Zheng He , Xin Zhou , Yong Luo","doi":"10.1016/j.knosys.2026.115374","DOIUrl":"10.1016/j.knosys.2026.115374","url":null,"abstract":"<div><div>Image captioning is a fundamental task in visual understanding, aiming to generate textual descriptions for given images. Current image captioning methods are gradually shifting towards a fully end-to-end paradigm, which leverages pre-trained vision models to process images directly and generate captions, eliminating the need for separating object detectors. These methods typically rely on global features, neglecting the precise perception of local ones. The lack of fine-grained focus on the object may result in suboptimal prototype features contaminated by surrounding noise, and thus negatively affect the generation of object-related captions. To address this issue, we propose a novel method termed object-aware context integration (OACI), which captures the salient prototypes of individual objects and understands their relationships by leveraging the global context of the entire scene. Specifically, we propose an object-aware prototype learning (OAPL) module that focuses on regions containing objects to enhance object perception and selects the most confident regions for learning object prototypes. Moreover, a class affinity constraint (CAC) is designed to facilitate the learning of these prototypes. To understand the relationships between objects, we further propose an object-context integration (OCI) module that integrates global context with local object prototypes, enhancing the understanding of image content and improving the generated image captions. 
We conduct extensive experiments on the popular MSCOCO, Flickr8k and Flickr30k datasets, and the results demonstrate that integrating global context with local object details significantly improves the quality of generated captions, validating the effectiveness of the proposed OACI method.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115374"},"PeriodicalIF":7.6,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
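The prototype-selection and context-integration ideas in the abstract above can be illustrated with a minimal sketch. This is not the paper's OAPL/OCI implementation: the random region features, the confidence scores, and the fixed mixing weight `alpha` are all assumptions made for illustration only.

```python
import numpy as np

def select_prototypes(region_feats, region_conf, k=3):
    """Pick the k most confident regions and average them into an
    object prototype (a toy stand-in for confidence-based selection)."""
    top = np.argsort(region_conf)[-k:]      # indices of the k highest scores
    return region_feats[top].mean(axis=0)   # (d,) prototype vector

def integrate_context(global_feat, prototype, alpha=0.5):
    """Blend the global scene feature with the local object prototype
    (a toy stand-in for object-context integration)."""
    return alpha * global_feat + (1 - alpha) * prototype

rng = np.random.default_rng(0)
regions = rng.normal(size=(10, 8))          # 10 candidate regions, dim 8
conf = rng.uniform(size=10)                 # per-region confidence scores
proto = select_prototypes(regions, conf)
fused = integrate_context(regions.mean(axis=0), proto)
# fused is an (8,) vector mixing scene-level and object-level information
```

A caption decoder would then condition on `fused` instead of the raw global feature; the point of the sketch is only that low-confidence regions never enter the prototype average.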
Pub Date : 2026-01-21DOI: 10.1016/j.knosys.2026.115341
Yuhang Duan, Lin Lin, Jinyuan Liu, Qing Zhang, Xin Fan
Long-term time series forecasting (LTSF) is crucial in domains such as smart energy systems and the industrial Internet of Things. Existing methods face intertwined challenges in LTSF. Single-domain modeling often fails to capture both local fluctuations and global trends, resulting in incomplete temporal representations. While attention-based models effectively capture long-range dependencies, their quadratic computational complexity limits their efficiency and scalability. Moreover, cross-scale conflicts frequently arise in long-term forecasting: short-term patterns may interfere with long-term trends, degrading prediction accuracy. To address these issues, we propose cross-domain time-frequency Mamba (CDTF-Mamba), which synergistically models time series in both the time and frequency domains. CDTF-Mamba’s time-domain pyramid Mamba component disentangles multiscale patterns, while its frequency-domain decomposition Mamba component stabilizes state evolution and mitigates nonstationarity. We perform extensive experiments on 13 widely used benchmark datasets.
{"title":"Cross-domain time-frequency Mamba: A more effective model for long-term time series forecasting","authors":"Yuhang Duan, Lin Lin, Jinyuan Liu, Qing Zhang, Xin Fan","doi":"10.1016/j.knosys.2026.115341","DOIUrl":"10.1016/j.knosys.2026.115341","url":null,"abstract":"<div><div>Long-term time series forecasting (LTSF) is crucial in domains such as smart energy systems and industrial Internet of Things. Existing methods face intertwined challenges in LTSF. Single-domain modeling often fails to capture local fluctuations and global trends, resulting in incomplete temporal representations. While attention-based models effectively capture long-range dependencies, their quadratic computational complexity limits their efficiency and scalability. Moreover, cross-scale conflicts frequently occur in long-term forecasting. Short-term patterns may interfere with long-term trends, thereby degrading prediction accuracy. To address these issues, we propose cross-domain time-frequency Mamba (CDTF-Mamba), which synergistically models time series in both the time and frequency domains. CDTF-Mamba’s time-domain pyramid Mamba component disentangles multiscale patterns, while the frequency-domain decomposition Mamba component stabilizes state evolution and mitigates nonstationarity. We perform extensive experiments on 13 widely used benchmark datasets. 
Experimental results demonstrate that CDTF-Mamba achieves superior accuracy while maintaining high efficiency and strong scalability compared with state-of-the-art methods.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115341"},"PeriodicalIF":7.6,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
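The dual-domain idea above — separating a slowly varying trend from faster fluctuations so each can be modeled on its own terms — can be sketched with a simple FFT-based decomposition. This is a toy stand-in, not the paper's frequency-domain Mamba; the choice of 3 low-frequency bins as the `cutoff` is an arbitrary assumption.

```python
import numpy as np

def freq_decompose(x, cutoff=3):
    """Split a 1-D series into a smooth low-frequency trend and a
    residual via the real FFT (toy frequency-domain decomposition)."""
    spec = np.fft.rfft(x)
    low = spec.copy()
    low[cutoff:] = 0                      # keep only the lowest `cutoff` bins
    trend = np.fft.irfft(low, n=len(x))   # back to the time domain
    return trend, x - trend               # residual carries the fast patterns

t = np.arange(128)
x = 0.05 * t + np.sin(2 * np.pi * t / 16)   # linear trend + seasonal part
trend, resid = freq_decompose(x)
# trend + resid reconstructs x exactly; a forecaster can now treat the
# slow trend and the fast residual as separate, non-conflicting targets
```

Separate heads for `trend` and `resid` is one simple way to avoid the cross-scale interference the abstract describes, since short-term fluctuations no longer distort the trend estimate.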
Pub Date : 2026-01-21DOI: 10.1016/j.knosys.2026.115369
Yongjun Wang, Xiaohui Hao
While Transformer-based methods have advanced visual object tracking, existing approaches often struggle in complex scenarios due to their reliance on fixed perception fields, limited discriminative capability, and insufficient predictive modeling. Current solutions built on attention mechanisms and feature learning have made progress but remain limited in adapting to dynamic scenes and maintaining robust target discrimination. We propose AdaptTrack, a Transformer-based tracking framework that systematically addresses three critical limitations of existing approaches: suboptimal perception-field adaptation for capturing target-specific information, insufficient target-background discrimination in cluttered environments, and inadequate predictive modeling in challenging scenarios. The framework introduces three key components: (1) an Adaptive Perception Field Guidance Network that dynamically optimizes feature extraction through scene-aware field configuration, (2) a Contrastive-Guided Contextual Attention mechanism that enhances discrimination through structured contrast learning, and (3) a Predictive State Transition Network that improves robustness via probabilistic state modeling. Together, these components provide dynamic field adaptation, explicit contrast modeling, and robust state prediction.
{"title":"AdaptTrack: Perception field adaptation with contrastive attention for robust visual tracking","authors":"Yongjun Wang, Xiaohui Hao","doi":"10.1016/j.knosys.2026.115369","DOIUrl":"10.1016/j.knosys.2026.115369","url":null,"abstract":"<div><div>While transformer-based methods have advanced visual object tracking, existing approaches often struggle with complex scenarios due to their reliance on fixed perception fields, limited discriminative capabilities, and insufficient predictive modeling. Current solutions utilizing attention mechanisms and feature learning techniques have made progress but face inherent limitations in adapting to dynamic scenes and maintaining robust target discrimination. We propose AdaptTrack, an innovative Transformer-based tracking framework that systematically addresses three critical limitations in existing approaches: suboptimal perception field adaptation for capturing target-specific information, insufficient target-background discrimination in cluttered environments, and inadequate predictive modeling during challenging scenarios. The framework introduces three key technical components: (1) an Adaptive Perception Field Guidance Network that dynamically optimizes feature extraction through scene-aware field configuration, (2) a Contrastive-Guided Contextual Attention mechanism that enhances discrimination through structured contrast learning, and (3) a Predictive State Transition Network that improves robustness via probabilistic state modeling. Through these innovations, our approach effectively addresses the limitations of current methods via dynamic field adaptation, explicit contrast modeling, and robust state prediction. 
Extensive evaluations demonstrate state-of-the-art performance on seven benchmarks (77.3% AO on GOT-10k, 73.3% AUC on LaSOT, 85.4% AUC on TrackingNet) while maintaining real-time efficiency at 32.6 FPS.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115369"},"PeriodicalIF":7.6,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
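The predictive-modeling idea in the abstract above — keeping a target estimate alive between detections so the tracker survives occlusion or clutter — can be sketched with a constant-velocity state update. This is a toy stand-in for the paper's Predictive State Transition Network; the fixed `gain` is an assumed Kalman-like blending weight, not a learned or probabilistic model.

```python
import numpy as np

def predict_state(pos, vel, dt=1.0):
    """Constant-velocity transition: where the target should be next."""
    return pos + dt * vel

def update_state(pos, vel, meas, gain=0.6):
    """Blend the prediction with a new measurement; when the detector
    fails, the caller can simply keep the prediction instead."""
    pred = predict_state(pos, vel)
    new_pos = pred + gain * (meas - pred)   # correct toward the measurement
    return new_pos, new_pos - pos           # updated position and velocity

# Two noisy measurements of a target drifting right and slightly up
pos, vel = np.array([0.0, 0.0]), np.array([1.0, 0.5])
for meas in (np.array([1.1, 0.4]), np.array([2.0, 1.1])):
    pos, vel = update_state(pos, vel, meas)
```

With `gain` below 1 the state never snaps fully onto a single noisy detection, which is the basic robustness property a predictive module buys during challenging frames.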