Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115342
Liyu Fang , Wu Wen , Xiaolin Zheng
Federated Learning (FL) with concept drift faces three fundamental challenges. First, existing methods lack a drift-aware client representation that can directly reflect changes in data distributions. Second, clustering with drifting clients often causes collaborative instability by contaminating the structure of client groups. Third, many approaches suffer from a methodological disconnect between drift detection and adaptation.
To address these challenges, we propose FedDCA, a stable and unified framework for federated concept drift adaptation. FedDCA introduces a Label Profile (LP), a compact distributional representation that captures each client’s current data concept and enables principled drift-aware similarity measurement. Based on LPs, FedDCA employs Drift-Aware Anchor Clustering, which performs Variational Wasserstein Clustering exclusively on stable clients to form robust anchor centroids, thereby preserving collaborative stability. Drifting clients are then assigned to the nearest anchor, allowing rapid adaptation without destabilizing the overall system. By unifying drift detection and clustering adaptation within the same Wasserstein metric space, FedDCA provides a consistent and effective response to dynamic environments. Extensive experiments demonstrate that FedDCA significantly outperforms state-of-the-art methods in both accuracy and adaptation speed under various concept drift scenarios.
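The detect-then-assign loop described above can be sketched in a few lines. Here a Label Profile is treated as a plain per-class frequency vector, drift is flagged when a client's profile moves more than a threshold (in 1-D Wasserstein distance) since the previous round, and drifting clients are matched to the nearest stable-client anchor. The function names, the `drift_threshold` value, and the 1-D form of the distance are illustrative assumptions, not the paper's exact formulation.

```python
def wasserstein_1d(p, q):
    # W1 between two discrete distributions over the same ordered bins:
    # the sum of absolute differences between their running CDFs.
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        dist += abs(cdf_p - cdf_q)
    return dist

def detect_and_assign(profile, prev_profile, anchors, drift_threshold=0.1):
    """Flag drift when the client's label profile moved more than the
    threshold since the last round; then find the nearest anchor centroid
    in the same Wasserstein metric used for detection."""
    drifted = wasserstein_1d(profile, prev_profile) > drift_threshold
    nearest = min(range(len(anchors)),
                  key=lambda k: wasserstein_1d(profile, anchors[k]))
    return drifted, nearest
```

Using one metric for both detection and assignment is the point of the sketch: the same `wasserstein_1d` call decides whether a client drifted and which anchor it joins.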
Title: FedDCA: Stable and unified Wasserstein adaptation to federated concept drift. Knowledge-Based Systems, vol. 337, Article 115342.
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115384
Uzma Hasan, Md Osman Gani
Efficient causal discovery is essential for constructing reliable causal graphs that provide actionable insights in domains where randomized experiments are infeasible. This study introduces DKC, a novel causal discovery algorithm that utilizes both observational data and prior knowledge to enable reliable learning of causal graphs that support decision-making in complex domains such as healthcare. Traditional causal discovery methods often rely exclusively on observational data, which reduces their effectiveness when datasets are noisy, limited in size, or involve intricate causal relationships. Moreover, existing approaches seldom incorporate prior knowledge in a flexible manner, limiting their applicability in real-world scenarios. DKC addresses these challenges by efficiently incorporating causal priors into the discovery process through a tailored scoring criterion that supports both hard and soft constraints. The framework operates in three stages: (i) estimating a topological ordering of variables, (ii) ranking candidate edges by likelihood, and (iii) performing a constrained causal search using the proposed score to balance model fit, complexity, and prior knowledge. We establish theoretical guarantees demonstrating that the score is statistically consistent, converging to the true causal structure as sample size grows. Extensive experiments on synthetic datasets of varying scales, as well as real-world healthcare data, confirm that DKC outperforms state-of-the-art baselines in terms of structural accuracy and robustness. By harmonizing data-driven insights with prior knowledge, DKC provides a trustworthy foundation for causal inference across diverse fields. Its application to a clinical problem highlights its potential to guide critical decision-making, while its general framework ensures broad utility in any domain requiring reliable, knowledge-informed causal reasoning.
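The idea of a score that balances fit, complexity, and hard/soft priors can be illustrated with a toy BIC-style criterion: hard constraints reject a candidate graph outright, while soft constraints subtract a penalty per violated prior belief. The function name, the specific penalty form, and the `soft_weight` parameter are assumptions made for illustration; the paper's actual scoring criterion differs.

```python
import math

def dkc_style_score(log_likelihood, n_edges, n_samples, edges,
                    required=(), forbidden=(), soft_prior=None,
                    soft_weight=1.0):
    """Toy knowledge-guided score: BIC-style fit/complexity trade-off,
    hard constraints as outright rejection, soft priors as penalties."""
    edge_set = set(edges)
    # Hard constraints: a graph missing a required edge, or containing
    # a forbidden one, is rejected outright.
    if any(e not in edge_set for e in required):
        return float("-inf")
    if any(e in edge_set for e in forbidden):
        return float("-inf")
    # BIC-style trade-off between data fit and model complexity.
    score = log_likelihood - 0.5 * n_edges * math.log(n_samples)
    # Soft constraints: penalize each prior-believed edge that is absent.
    if soft_prior:
        score -= soft_weight * sum(1.0 for e in soft_prior
                                   if e not in edge_set)
    return score
```

A search procedure would evaluate this score over candidate graphs consistent with the estimated topological ordering and keep the highest-scoring one.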
Title: DKC: Data-driven and knowledge-guided causal discovery with application to healthcare data. Knowledge-Based Systems, vol. 337, Article 115384.
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115383
Yuankun Xia, Hui Wang, Yufeng Zhou
Federated learning (FL) for multi-domain visual recognition confronts significant challenges due to heterogeneous data distributions and domain shifts, which severely impair the semantic generalization capability of existing methods. To address these challenges, we propose FedCLIP-Distill, a novel framework that employs dual-domain knowledge distillation (KD) and contrastive relational distillation (CRD) to leverage the powerful visual-language alignment of CLIP in heterogeneous FL environments. Our approach employs a centralized CLIP teacher model to distill robust visual-textual semantics into lightweight client-side student models, thereby enabling effective local domain adaptation. We provide a theoretical convergence analysis proving that our distillation mechanism effectively mitigates domain gaps and facilitates robust convergence under non-IID settings. Extensive experiments on the Office-Caltech10 and DomainNet benchmarks show that FedCLIP-Distill outperforms other methods: it achieves an average cross-domain accuracy of 98.5% on Office-Caltech10 and 80.50% on DomainNet, and it remains strong across different heterogeneity levels (e.g., 9.52% higher than FedCLIP at Dirichlet α = 0.5), demonstrating significant improvements in accuracy and generalization under heterogeneous scenarios. The source code is available at https://github.com/Yuankun-Xia/FedCLIP-Distill.
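The teacher-to-student transfer step can be illustrated with the standard temperature-scaled knowledge-distillation loss: a KL divergence between the teacher's and student's softened output distributions, scaled by T². This is a generic KD sketch in plain Python, not the paper's exact dual-domain or contrastive relational objective.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    z = [l / temperature for l in logits]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In a federated round, each client would add this loss (computed against the frozen CLIP teacher's logits) to its local task loss before sending updates to the server.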
Title: FedCLIP-Distill: Heterogeneous federated cross-modal knowledge distillation for multi-domain visual recognition. Knowledge-Based Systems, vol. 337, Article 115383.
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115385
Guanghua Ding , Rui Tang , Xian Mo
Heterogeneous graph learning aims to extract semantic and structural information from multiple node types, edges, and meta-paths, learning low-dimensional embeddings that preserve core characteristics to support downstream tasks. To address the core challenges of insufficient semantic mining and weak learning synergy in heterogeneous graph learning, this paper proposes a heterogeneous graph learning method integrating Semantic-aware Meta-path perturbation and Collaborative Dual-learning optimization (SMCD). First, the method constructs auxiliary meta-paths based on the original meta-paths and then designs two augmentation schemes to generate augmented views. For semantic-level augmentation, it performs edge perturbation based on semantic similarity and enhances the semantics of core meta-paths with the semantics of auxiliary meta-paths via a diffusion model. For task-level augmentation, it utilizes a diffusion model and semantic weights to select the top-k semantically relevant nodes for each node in the core meta-path graph, reconstructing the meta-path graph structure. Then, a two-stage attention aggregation graph encoder is adopted to output the final node embeddings. Finally, a self-supervised and supervised (i.e., dual-learning) collaborative optimization strategy that flexibly adapts to label distribution is used to optimize the objective; this not only balances the discriminability and generality of representations but also adapts to scenarios with different degrees of label scarcity. Experimental results on three public datasets illustrate that our proposed method achieves remarkable advantages in both node classification and node clustering tasks. Our datasets and source code are available.
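The task-level graph reconstruction step, keeping only each node's top-k most semantically relevant neighbors, can be sketched as follows. The dense `sim_matrix` input is a hypothetical plain similarity matrix; in the paper the relevance weights come from a diffusion model, which is not reproduced here.

```python
def topk_semantic_neighbors(sim_matrix, k=2):
    """For each node, keep edges only to its top-k most semantically
    similar other nodes (self excluded), a sketch of rebuilding the
    meta-path graph structure from semantic relevance scores."""
    edges = {}
    for i, row in enumerate(sim_matrix):
        ranked = sorted((j for j in range(len(row)) if j != i),
                        key=lambda j: row[j], reverse=True)
        edges[i] = ranked[:k]
    return edges
```

The resulting adjacency (node to top-k neighbor list) would then be fed to the two-stage attention aggregation encoder in place of the original meta-path graph.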
Title: Enhancing Heterogeneous Graph Learning with Semantic-Aware Meta-Path Diffusion and Dual Optimization. Knowledge-Based Systems, vol. 337, Article 115385.
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115407
Adnan Saeed , Khurram Shehzad , Muhammad Ghulam Abbas Malik , Saim Ahmed , Ahmad Taher Azar
Accurate early-stage diagnosis of skin lesions remains challenging for dermatologists due to visual complexity and subtle inter-class differences. Traditional computer-assisted diagnostic tools struggle to capture detailed patterns and contextual relationships, especially under varying imaging conditions. In this study, we introduce TransXV2S-Net, a new hybrid multi-branch deep-learning model designed for automated skin lesion classification. The branches extract features from skin lesions at different stages separately and learn complex combinations among them; they comprise an EfficientNetV2S, a Swin Transformer, and a modified Xception architecture serving as a new feature extraction method. A novel Dual-Contextual Graph Attention Network (DCGAN) focuses the network on discriminative parts of skin lesions, enhancing feature learning through dual-path attention mechanisms and graph-based operations that capture both local textural details and global contextual patterns. The Gray World Standard Deviation (GWSD) preprocessing algorithm improves lesion visibility and removes imaging artifacts. Benchmarking against an 8-class skin cancer dataset confirmed the model's efficacy, yielding 95.26% accuracy, 94.30% recall, and an AUC-ROC of 99.62%. Further validation on the HAM10000 dataset demonstrates exceptional performance with 95% accuracy, confirming the model's robustness and generalization capability.
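The preprocessing stage builds on gray-world color constancy, which a minimal sketch can show: each channel is rescaled so that its mean matches the global gray mean, normalizing illumination across dermoscopic images. Note this is only the classic gray-world baseline; the paper's GWSD variant additionally involves channel standard deviations, which are not reproduced here.

```python
def gray_world_correct(pixels):
    """Classic gray-world correction over a list of (R, G, B) pixels:
    scale each channel by (global gray mean / channel mean) so that
    the corrected image is neutral on average."""
    n = len(pixels)
    means = [sum(px[c] for px in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    gains = [gray / m if m else 1.0 for m in means]
    # Clip to the usual 8-bit range after rescaling.
    return [tuple(min(255.0, px[c] * gains[c]) for c in range(3))
            for px in pixels]
```

In practice this would run on a full image array (e.g., via NumPy) before the three feature-extraction branches see the input.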
Title: TransXV2S-NET: A novel hybrid deep learning architecture with dual-contextual graph attention for multi-class skin lesion classification. Knowledge-Based Systems, vol. 337, Article 115407.
Pub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115374
Shuhan Xu , Mengya Han , Wei Yu , Zheng He , Xin Zhou , Yong Luo
Image captioning is a fundamental task in visual understanding, aiming to generate textual descriptions for given images. Current image captioning methods are gradually shifting towards a fully end-to-end paradigm, which leverages pre-trained vision models to process images directly and generate captions, eliminating the need for separate object detectors. These methods typically rely on global features, neglecting the precise perception of local ones. The lack of fine-grained focus on the object may result in suboptimal prototype features contaminated by surrounding noise, and thus negatively affect the generation of object-related captions. To address this issue, we propose a novel method termed object-aware context integration (OACI), which captures the salient prototypes of individual objects and understands their relationships by leveraging the global context of the entire scene. Specifically, we propose an object-aware prototype learning (OAPL) module that focuses on regions containing objects to enhance object perception and selects the most confident regions for learning object prototypes. Moreover, a class affinity constraint (CAC) is designed to facilitate the learning of these prototypes. To understand the relationships between objects, we further propose an object-context integration (OCI) module that integrates global context with local object prototypes, enhancing the understanding of image content and improving the generated image captions. We conduct extensive experiments on the popular MSCOCO, Flickr8k and Flickr30k datasets, and the results demonstrate that integrating global context with local object details significantly improves the quality of generated captions, validating the effectiveness of the proposed OACI method.
Title: OACI: Object-aware contextual integration for image captioning. Knowledge-Based Systems, vol. 337, Article 115374.
Pub Date: 2026-01-21 | DOI: 10.1016/j.knosys.2026.115341
Yuhang Duan, Lin Lin, Jinyuan Liu, Qing Zhang, Xin Fan
Long-term time series forecasting (LTSF) is crucial in domains such as smart energy systems and the industrial Internet of Things. Existing methods face intertwined challenges in LTSF. Single-domain modeling often fails to capture local fluctuations and global trends, resulting in incomplete temporal representations. While attention-based models effectively capture long-range dependencies, their quadratic computational complexity limits their efficiency and scalability. Moreover, cross-scale conflicts frequently occur in long-term forecasting: short-term patterns may interfere with long-term trends, thereby degrading prediction accuracy. To address these issues, we propose cross-domain time-frequency Mamba (CDTF-Mamba), which synergistically models time series in both the time and frequency domains. CDTF-Mamba’s time-domain pyramid Mamba component disentangles multiscale patterns, while the frequency-domain decomposition Mamba component stabilizes state evolution and mitigates nonstationarity. We perform extensive experiments on 13 widely used benchmark datasets. Experimental results demonstrate that CDTF-Mamba achieves superior accuracy while maintaining high efficiency and strong scalability compared with state-of-the-art methods.
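The dual-domain idea, giving the model complementary time-domain and frequency-domain views of the same series, can be sketched minimally: a moving-average trend as a simple time-domain view and a naive DFT magnitude spectrum as the frequency-domain view. This is an illustrative preprocessing sketch under those assumptions, not the Mamba-based model itself.

```python
import cmath

def time_frequency_views(series, window=3):
    """Return (trend, spectrum): a moving-average smoothing of the series
    (time-domain view) and its DFT magnitude spectrum (frequency-domain
    view), computed with an O(n^2) naive DFT for clarity."""
    n = len(series)
    half = window // 2
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    spectrum = []
    for k in range(n):
        acc = sum(series[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc))
    return trend, spectrum
```

A real pipeline would use an FFT (e.g., `numpy.fft.rfft`) and feed each view to its own branch of the model.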
Title: Cross-domain time-frequency Mamba: A more effective model for long-term time series forecasting. Knowledge-Based Systems, vol. 337, Article 115341.
Pub Date: 2026-01-21 | DOI: 10.1016/j.knosys.2026.115369
Yongjun Wang, Xiaohui Hao
While transformer-based methods have advanced visual object tracking, existing approaches often struggle with complex scenarios due to their reliance on fixed perception fields, limited discriminative capabilities, and insufficient predictive modeling. Current solutions utilizing attention mechanisms and feature learning techniques have made progress but face inherent limitations in adapting to dynamic scenes and maintaining robust target discrimination. We propose AdaptTrack, an innovative Transformer-based tracking framework that systematically addresses three critical limitations in existing approaches: suboptimal perception field adaptation for capturing target-specific information, insufficient target-background discrimination in cluttered environments, and inadequate predictive modeling during challenging scenarios. The framework introduces three key technical components: (1) an Adaptive Perception Field Guidance Network that dynamically optimizes feature extraction through scene-aware field configuration, (2) a Contrastive-Guided Contextual Attention mechanism that enhances discrimination through structured contrast learning, and (3) a Predictive State Transition Network that improves robustness via probabilistic state modeling. Through these innovations, our approach effectively addresses the limitations of current methods through dynamic field adaptation, explicit contrast modeling, and robust state prediction. Extensive evaluations demonstrate state-of-the-art performance on seven benchmarks (77.3% AO on GOT-10k, 73.3% AUC on LaSOT, 85.4% AUC on TrackingNet) while maintaining real-time efficiency at 32.6 FPS.
{"title":"AdaptTrack: Perception field adaptation with contrastive attention for robust visual tracking","authors":"Yongjun Wang, Xiaohui Hao","doi":"10.1016/j.knosys.2026.115369","DOIUrl":"10.1016/j.knosys.2026.115369","url":null,"abstract":"<div><div>While transformer-based methods have advanced visual object tracking, existing approaches often struggle with complex scenarios due to their reliance on fixed perception fields, limited discriminative capabilities, and insufficient predictive modeling. Current solutions utilizing attention mechanisms and feature learning techniques have made progress but face inherent limitations in adapting to dynamic scenes and maintaining robust target discrimination. We propose AdaptTrack, an innovative Transformer-based tracking framework that systematically addresses three critical limitations in existing approaches: suboptimal perception field adaptation for capturing target-specific information, insufficient target-background discrimination in cluttered environments, and inadequate predictive modeling during challenging scenarios. The framework introduces three key technical components: (1) an Adaptive Perception Field Guidance Network that dynamically optimizes feature extraction through scene-aware field configuration, (2) a Contrastive-Guided Contextual Attention mechanism that enhances discrimination through structured contrast learning, and (3) a Predictive State Transition Network that improves robustness via probabilistic state modeling. Through these innovations, our approach effectively addresses the limitations of current methods through dynamic field adaptation, explicit contrast modeling, and robust state prediction. 
Extensive evaluations demonstrate state-of-the-art performance on seven benchmarks (77.3% AO on GOT-10k, 73.3% AUC on LaSOT, 85.4% AUC on TrackingNet) while maintaining real-time efficiency at 32.6 FPS.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115369"},"PeriodicalIF":7.6,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
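The abstract does not specify how the Predictive State Transition Network performs probabilistic state modeling. As an illustrative classical analogue only (not the paper's method), a constant-velocity Kalman filter shows how a probabilistic motion prior can carry a tracker through a frame where the target is occluded; all matrices and noise values below are assumptions chosen for the sketch.

```python
import numpy as np

# State: [cx, cy, vx, vy]; constant-velocity motion model (assumed).
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # only position is observed
Q = np.eye(4) * 1e-2   # process noise (illustrative value)
R = np.eye(2) * 1e-1   # measurement noise (illustrative value)

def predict(x, P):
    """Propagate state and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the prediction with a position measurement z."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Track a target moving at (+1, +0.5) px/frame; skip the measurement
# on frame 3 to mimic an occlusion and rely on the motion prior.
x, P = np.zeros(4), np.eye(4)
for t in range(1, 6):
    x, P = predict(x, P)
    if t != 3:                       # frame 3: target occluded
        z = np.array([t * 1.0, t * 0.5])
        x, P = update(x, P, z)
print(np.round(x[:2], 1))
```

After five frames the filtered position estimate is close to the true position (5, 2.5) despite the missing observation, which is the kind of robustness the paper's learned state-prediction component targets.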
Pub Date : 2026-01-21DOI: 10.1016/j.knosys.2026.115389
Zhuyun Chen , Hongqi Lin , Youpeng Gao , Jingke He , Zehao Li , Weihua Li , Qiang Liu
Currently, deep learning-based intelligent fault diagnosis techniques have been widely used in the manufacturing industry. However, due to various constraints, fault data for rotating machinery is often limited. Moreover, in real industrial environments, the operating conditions of rotating machinery vary with task requirements, leading to significant data variability across different operating conditions. This variability presents a major challenge for few-shot fault diagnosis, especially in scenarios requiring domain generalization across diverse operating conditions. To address this challenge, this paper proposes multiscale scattering forests (MSF): a domain-generalizing approach for fault diagnosis under data constraints. First, a multiscale wavelet scattering predefined layer is designed to extract robust invariant features from input samples; the resulting scattering coefficients are concatenated and used as new samples, providing a data enhancement of the original samples. Then, a deep stacked ensemble forest with skip connections is designed to handle the transformed multiscale samples, allowing earlier information to skip over layers and improving the model’s feature representation capabilities. Finally, a similarity metric-based weight learning strategy is developed to weight the diagnostic results of each forest, integrating the weighted models into an ensemble framework to enhance domain generalization performance under various operating conditions. The MSF model is comprehensively evaluated on a computer numerical control (CNC) machine tool spindle bearing dataset collected in an industrial environment. Experimental results demonstrate that the proposed approach not only exhibits strong diagnostic and generalization performance in few-shot scenarios without the support of additional source domains but also outperforms other state-of-the-art few-shot fault diagnosis methods.
{"title":"Multiscale scattering forests: A domain-generalizing approach for fault diagnosis under data constraints","authors":"Zhuyun Chen , Hongqi Lin , Youpeng Gao , Jingke He , Zehao Li , Weihua Li , Qiang Liu","doi":"10.1016/j.knosys.2026.115389","DOIUrl":"10.1016/j.knosys.2026.115389","url":null,"abstract":"<div><div>Currently, deep learning-based intelligent fault diagnosis techniques have been widely used in the manufacturing industry. However, due to various constraints, fault data for rotating machinery is often limited. Moreover, in real industrial environments, the operating conditions of rotating machinery vary with task requirements, leading to significant data variability across different operating conditions. This variability presents a major challenge for few-shot fault diagnosis, especially in scenarios requiring domain generalization across diverse operating conditions. To address this challenge, this paper proposes multiscale scattering forests (MSF): a domain-generalizing approach for fault diagnosis under data constraints. First, a multiscale wavelet scattering predefined layer is designed to extract robust invariant features from input samples; the resulting scattering coefficients are concatenated and used as new samples, providing a data enhancement of the original samples. Then, a deep stacked ensemble forest with skip connections is designed to handle the transformed multiscale samples, allowing earlier information to skip over layers and improving the model’s feature representation capabilities. Finally, a similarity metric-based weight learning strategy is developed to weight the diagnostic results of each forest, integrating the weighted models into an ensemble framework to enhance domain generalization performance under various operating conditions. The MSF model is comprehensively evaluated on a computer numerical control (CNC) machine tool spindle bearing dataset collected in an industrial environment. 
Experimental results demonstrate that the proposed approach not only exhibits strong diagnostic and generalization performance in few-shot scenarios without the support of additional source domains but also outperforms other state-of-the-art few-shot fault diagnosis methods.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115389"},"PeriodicalIF":7.6,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
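The wavelet scattering idea behind the MSF predefined layer can be illustrated with a minimal first-order scattering-style pipeline: band-pass wavelet convolution, a modulus non-linearity, then averaging. This is a simplified stand-in, not the paper's layer; the Haar-like filters, dyadic scales, and scale count are all assumptions of the sketch.

```python
import numpy as np

def scattering_features(x, num_scales=4):
    """First-order scattering-style features: wavelet convolution,
    modulus non-linearity, then global average pooling per scale.
    A simplified stand-in for a full wavelet scattering transform."""
    feats = [np.mean(np.abs(x))]  # zeroth-order (averaged) coefficient
    for j in range(1, num_scales + 1):
        width = 2 ** j
        # Haar-like band-pass filter at dyadic scale 2**j (assumed design)
        psi = np.concatenate([np.ones(width), -np.ones(width)]) / (2 * width)
        coeff = np.abs(np.convolve(x, psi, mode="valid"))
        feats.append(coeff.mean())  # low-pass (averaging) step
    return np.array(feats)

# Toy check: a vibration-like signal and a time-shifted copy yield
# nearly identical features, i.e. the representation is shift-robust.
rng = np.random.default_rng(0)
sig = np.sin(np.linspace(0, 20 * np.pi, 1024)) + 0.1 * rng.standard_normal(1024)
f1 = scattering_features(sig)
f2 = scattering_features(np.roll(sig, 5))
print(f1.shape)  # (5,)
```

Concatenating such coefficients across scales (and, in the paper, across channels) yields the enhanced samples that the downstream forest ensemble consumes.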
Pub Date : 2026-01-21DOI: 10.1016/j.knosys.2026.115366
Jianchao Li, Wei Zhou, Kai Wang, Haifeng Hu
Transformer-based image captioning models have achieved promising performance through various effective learning schemes. We contend that a truly comprehensive learning schema, defined as omniscient learning, encompasses two components: 1) a hierarchical knowledge base with low redundancy as input, and 2) a bottom-up layer-wise network as architecture. Previous captioning models, however, primarily focus on network design and neglect the construction of the knowledge base. In this paper, our hierarchical knowledge base is constituted by personalized knowledge from real-time features and contextual knowledge from consensus. Simultaneously, we devise a bottom-up double-stream symmetric network (BuNet) to progressively learn layered features. Specifically, the hierarchical knowledge base includes single-image region and grid features from the local domain and contextual knowledge tokens from the broad domain. Correspondingly, BuNet is divided into a local-domain self-learning (LDS) stage and a broad-domain consensus-learning (BDC) stage. In addition, we explore noise decoupling strategies to illustrate the extraction of contextual knowledge tokens. Furthermore, the knowledge disparity between region and grid features reveals that the purely “symmetric network” of BuNet cannot effectively capture the additional spatial relationships present in the region stream. Consequently, we design relative spatial encoding in the LDS stage of BuNet to learn regional spatial knowledge. We also employ a lightweight backbone to reduce computational complexity while providing a simple paradigm for omniscient learning. Our method is extensively tested on MS-COCO and Flickr30K, where it achieves better performance than existing captioning models.
{"title":"Omniscient bottom-up double-stream symmetric network for image captioning","authors":"Jianchao Li, Wei Zhou, Kai Wang, Haifeng Hu","doi":"10.1016/j.knosys.2026.115366","DOIUrl":"10.1016/j.knosys.2026.115366","url":null,"abstract":"<div><div>Transformer-based image captioning models have achieved promising performance through various effective learning schemes. We contend that a truly comprehensive learning schema, defined as omniscient learning, encompasses two components: 1) a hierarchical knowledge base with low redundancy as input, and 2) a bottom-up layer-wise network as architecture. Previous captioning models, however, primarily focus on network design and neglect the construction of the knowledge base. In this paper, our hierarchical knowledge base is constituted by personalized knowledge from real-time features and contextual knowledge from consensus. Simultaneously, we devise a bottom-up double-stream symmetric network (BuNet) to progressively learn layered features. Specifically, the hierarchical knowledge base includes single-image region and grid features from the local domain and contextual knowledge tokens from the broad domain. Correspondingly, BuNet is divided into a local-domain self-learning (LDS) stage and a broad-domain consensus-learning (BDC) stage. In addition, we explore noise decoupling strategies to illustrate the extraction of contextual knowledge tokens. Furthermore, the knowledge disparity between region and grid features reveals that the purely “symmetric network” of BuNet cannot effectively capture the additional spatial relationships present in the region stream. Consequently, we design relative spatial encoding in the LDS stage of BuNet to learn regional spatial knowledge. We also employ a lightweight backbone to reduce computational complexity while providing a simple paradigm for omniscient learning. 
Our method is extensively tested on MS-COCO and Flickr30K, where it achieves better performance than existing captioning models.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115366"},"PeriodicalIF":7.6,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146080652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
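The double-stream fusion that BuNet builds on rests on standard scaled dot-product attention. The sketch below lets hypothetical "region" queries attend over "grid" keys/values; the stream names, feature shapes, and fusion direction are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Two hypothetical streams: 3 region features and 4 grid features,
# fused by letting region queries attend over grid keys/values.
rng = np.random.default_rng(1)
regions = rng.standard_normal((3, 8))
grids = rng.standard_normal((4, 8))
fused, attn = scaled_dot_product_attention(regions, grids, grids)
print(fused.shape, attn.shape)  # (3, 8) (3, 4)
```

Relative spatial encoding, as described in the abstract, would enter this computation as an additive bias on the attention scores derived from region geometry.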