Enhanced phishing detection using multimodal data
Pub Date : 2025-12-16 DOI: 10.1016/j.knosys.2025.115105
Lázaro Bustio-Martínez, Vitali Herrera-Semenets, Jorge A. González-Ordiano, Yamel Pérez-Guadarrama, Luis N. Zúñiga-Morales, Daniela Montoya-Godínez, Miguel A. Álvarez-Carmona, Jan van den Berg
Phishing remains one of the most persistent cybersecurity threats, increasingly exploiting not only technical vulnerabilities but also human cognitive biases. Existing detection systems often rely on single-modality features and black-box models, which restrict both generalization and interpretability. This study presents an explainable multimodal framework that combines textual and technical cues, including message content, URL structure, and Principles of Persuasion, to capture both objective and subjective aspects of phishing. Several classifiers were evaluated using 10-fold stratified cross-validation, with Random Forest achieving the best balance between performance and transparency (ROC-AUC = 0.9840), supported by SHAP explanations that identify the most influential linguistic and structural features. Comparative analysis shows that the proposed framework outperforms unimodal baselines while preserving interpretability, enabling a clear rationale for classification outcomes. These results indicate that integrating multimodal representation with explainable learning strengthens phishing detection accuracy, improves user trust, and supports reliable deployment in real-world environments.
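To make the evaluation pipeline concrete, here is a minimal sketch of the setup the abstract describes: a Random Forest over fused multimodal features, scored with 10-fold stratified cross-validation and explained with SHAP. The synthetic data and feature semantics are placeholders, not the authors' data or code.

```python
# Sketch of the abstract's evaluation setup (an assumption, not the paper's code).
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in for the fused feature table (textual + URL-structure + persuasion cues).
X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"ROC-AUC: {auc.mean():.4f} +/- {auc.std():.4f}")

# TreeExplainer yields per-feature SHAP values; their mean absolute value
# ranks the most influential cues, mirroring the explanations reported.
explainer = shap.TreeExplainer(clf.fit(X, y))
shap_values = explainer.shap_values(X)
```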
{"title":"Enhanced phishing detection using multimodal data","authors":"Lázaro Bustio-Martínez , Vitali Herrera-Semenets , Jorge A. González-Ordiano , Yamel Pérez-Guadarrama , Luis N. Zúñiga-Morales , Daniela Montoya-Godínez , Miguel A. Álvarez-Carmona , Jan van den Berg","doi":"10.1016/j.knosys.2025.115105","DOIUrl":"10.1016/j.knosys.2025.115105","url":null,"abstract":"<div><div>Phishing remains one of the most persistent cybersecurity threats, increasingly exploiting not only technical vulnerabilities but also human cognitive biases. Existing detection systems often rely on single-modality features and black-box models, which restrict both generalization and interpretability. This study presents an explainable multimodal framework that combines textual and technical cues, including message content, URL structure, and Principles of Persuasion, to capture both objective and subjective aspects of phishing. Several classifiers were evaluated using 10-fold stratified cross-validation, with Random Forest achieving the best balance between performance and transparency (ROC-AUC = 0.9840), supported by SHAP explanations that identify the most influential linguistic and structural features. Comparative analysis shows that the proposed framework outperforms unimodal baselines while preserving interpretability, enabling a clear rationale for classification outcomes. These results indicate that integrating multimodal representation with explainable learning strengthens phishing detection accuracy, improves user trust, and supports reliable deployment in real-world environments.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"334 ","pages":"Article 115105"},"PeriodicalIF":7.6,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145792073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GLEm-Net: Unified framework for data reduction with categorical and numerical features
Pub Date : 2025-12-15 DOI: 10.1016/j.knosys.2025.115049
Francesco De Santis, Danilo Giordano, Marco Mellia
In an era of effortless data collection, the impact of machine learning — especially neural networks (NNs) — is undeniable. As datasets grow in size and complexity, efficiently handling mixed data types, including categorical and numerical features, becomes critical. Feature encoding and selection play a key role in improving NN performance, efficiency, interpretability, and generalisation. This paper presents GLEm-Net (Grouped Lasso with Embeddings Network), a novel NN-based approach that seamlessly integrates feature encoding and selection directly into the training process. GLEm-Net uses embedding layers to process categorical features with high cardinality, simplifying the model and improving generalisation. By extending the grouped Lasso regularisation to explicitly consider categorical features, GLEm-Net automatically identifies the most relevant features during training and returns them to the analyst.
We evaluate GLEm-Net on open and proprietary industry datasets and compare it to state-of-the-art feature selection methodologies. Results show that GLEm-Net adapts to each dataset by allowing the NN to directly select subsets of the most important features, offering on-par performance with the best state-of-the-art feature selection methods while eliminating the external feature encoding and selection steps, which are now incorporated into the NN training stage.
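The core mechanism lends itself to a short sketch: one embedding table per high-cardinality categorical feature, with each whole table treated as one group of a grouped Lasso penalty, so training can drive an entire feature to zero. This is an illustrative reconstruction under assumptions, not the released GLEm-Net code.

```python
import torch
import torch.nn as nn

class GroupedLassoNet(nn.Module):
    """One embedding table per categorical feature; each table is one Lasso group."""
    def __init__(self, cardinalities, emb_dim, n_numeric, hidden=64):
        super().__init__()
        self.embs = nn.ModuleList(nn.Embedding(c, emb_dim) for c in cardinalities)
        in_dim = emb_dim * len(cardinalities) + n_numeric
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x_cat, x_num):
        parts = [emb(x_cat[:, i]) for i, emb in enumerate(self.embs)] + [x_num]
        return self.mlp(torch.cat(parts, dim=1))

    def group_lasso_penalty(self):
        # One L2 norm per group: a zero-norm table means that feature is dropped.
        return sum(emb.weight.norm(p=2) for emb in self.embs)

model = GroupedLassoNet(cardinalities=[1000, 50], emb_dim=8, n_numeric=5)
x_cat = torch.stack([torch.randint(0, 1000, (32,)), torch.randint(0, 50, (32,))], dim=1)
x_num, y = torch.randn(32, 5), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x_cat, x_num), y) + 1e-3 * model.group_lasso_penalty()
loss.backward()
```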
{"title":"GLEm-Net: Unified framework for data reduction with categorical and numerical features","authors":"Francesco De Santis, Danilo Giordano, Marco Mellia","doi":"10.1016/j.knosys.2025.115049","DOIUrl":"10.1016/j.knosys.2025.115049","url":null,"abstract":"<div><div>In an era of effortless data collection, the impact of machine learning — especially neural networks (NNs) — is undeniable. As datasets grow in size and complexity, efficiently handling mixed data types, including categorical and numerical features, becomes critical. Feature encoding and selection play a key role in improving NN performance, efficiency, interpretability, and generalisation. This paper presents GLEm-Net (Grouped Lasso with Embeddings Network), a novel NN-based approach that seamlessly integrates feature encoding and selection directly into the training process. GLEm-Net uses embedding layers to process categorical features with high cardinality, simplifying the model and improving generalisation. By extending the grouped Lasso regularisation to explicitly consider categorical features, GLEm-Net automatically identifies the most relevant features during training and returns them to the analyst.</div><div>We evaluate GLEm-Net on open and proprietary industry datasets and compare it to state-of-the-art feature selection methodologies. Results show that GLEm-Net adapts to each dataset by allowing the NN to directly select subsets of most important features, offering on par performance with the best state-of-the-art feature selection methods, while eliminating the need for the external feature encoding and selection steps that are now incorporated in the NN training stage.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"334 ","pages":"Article 115049"},"PeriodicalIF":7.6,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145791623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Topological and semantic contrastive graph clustering by Ricci curvature augmentation and hypergraph fusion
Pub Date : 2025-12-15 DOI: 10.1016/j.knosys.2025.115130
Dehua Peng, Guangyao Fang, Zhipeng Gui, Yuhang Liu, Huayi Wu
Contrastive graph clustering is an advanced technique in cluster analysis. By leveraging graph neural networks and the contrastive learning paradigm, it couples topological structure with node semantic information in attributed graph networks. Graph augmentation and positive sample selection are two essentials of contrastive graph clustering. However, existing graph augmentation methods tend to disrupt cluster structures, and most positive sample selectors suffer from the false-negative sample problem. In this paper, we propose a Topological and Semantic Contrastive Graph Clustering (TSCGC) model consisting of three learning components. The representation learning component augments the original graph using Ricci curvature to preserve the cluster structure and introduces a hypergraph view to capture high-order relationships. Graph and hypergraph convolutional networks are used to encode the triple-view embeddings. Meanwhile, we develop a dual contrastive learning component to extract topological and semantic information. To reduce the number of false negatives, it utilizes K-means to generate pseudo cluster labels that guide the selection of positive samples. The self-supervised learning component aligns the three graph views, and the final clustering results are obtained by performing K-means on the aligned embeddings. We demonstrated the effectiveness of TSCGC by comparing its performance with 13 clustering baselines on six real-world networks. Ablations verified the validity of the key components, and the impact of parameter settings was also analyzed. We further applied TSCGC to identify the function types of 10,370 buildings in Shenzhen, China, based on multi-source geospatial data; it achieved the highest accuracy and exhibited significant potential for handling complex network structures and high-dimensional node features. The code is available at: https://github.com/ZPGuiGroupWhu/TSCGC.
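The pseudo-label-guided positive selection step can be illustrated in a few lines: K-means on the current embeddings yields cluster labels, and only same-cluster pairs are admitted as positives, filtering out many false negatives. A minimal sketch with placeholder embeddings, not the authors' implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 32))                  # placeholder node embeddings
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(z)

# positive_mask[i, j] is True when nodes i and j share a pseudo cluster label,
# so the contrastive loss never treats same-cluster nodes as negatives.
positive_mask = labels[:, None] == labels[None, :]
np.fill_diagonal(positive_mask, False)          # exclude self-pairs
print(positive_mask.sum(), "candidate positive pairs")
```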
{"title":"Topological and semantic contrastive graph clustering by Ricci curvature augmentation and hypergraph fusion","authors":"Dehua Peng , Guangyao Fang , Zhipeng Gui , Yuhang Liu , Huayi Wu","doi":"10.1016/j.knosys.2025.115130","DOIUrl":"10.1016/j.knosys.2025.115130","url":null,"abstract":"<div><div>Contrastive graph clustering is an advanced technology in the field of cluster analysis. By leveraging graph neural networks and contrastive learning paradigm, it enables the coupling of topological structure and node semantic information for attributed graph networks. Graph augmentation and positive sample selection are two essentials of contrastive graph clustering. However, existing graph augmentation methods tend to disrupt the cluster structures, and most positive sample selectors suffer from the false negative sample problem. In this paper, we propose a Topological and Semantic Contrastive Graph Clustering (TSCGC) model consisting of three learning components. The representation learning component augments original graph using Ricci curvature to preserve the cluster structure, and introduces hypergraph view to capture high-order relationships. Graph and hypergraph convolutional networks are used to encode the triple-view embeddings. Meanwhile, we develop a dual contrastive learning component to extract the topological and semantic information. To reduce the number of false negatives, it utilizes K-means to generate pseudo cluster labels to guide the selection of positive samples. The self-supervised learning component is leveraged to align the three graph views. The final clustering results are obtained by performing K-means on the aligned embeddings. We demonstrated the effectiveness by comparing the performance of TSCGC with 13 clustering baselines on six real-world networks. Ablations verified the validity of key components and the impact of parameter settings were also analyzed. We further applied TSCGC to identify the function types of 10,370 buildings in ShenZhen City, China based on multi-source geospatial data. It achieved the highest accuracy and exhibit significant potential in handling complex network structures and high-dimensional node features. The code is available at: <span><span>https://github.com/ZPGuiGroupWhu/TSCGC</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"334 ","pages":"Article 115130"},"PeriodicalIF":7.6,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145792065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Span-level detection of AI-generated scientific text via contrastive learning and structural calibration
Pub Date : 2025-12-14 DOI: 10.1016/j.knosys.2025.115123
Zhen Yin, Shenghua Wang
The rapid adoption of large language models (LLMs) in scientific writing raises serious concerns regarding authorship integrity and the reliability of scholarly publications. Existing detection approaches mainly rely on document-level classification or surface-level statistical cues; however, they neglect fine-grained span localization, exhibit weak calibration, and often fail to generalize across disciplines and generators. To address these limitations, we present Sci-SpanDet, a structure-aware framework for detecting AI-generated scholarly texts. The proposed method combines section-conditioned stylistic modeling with multi-level contrastive learning to capture nuanced human-AI differences while mitigating topic dependence, thereby enhancing cross-domain robustness. In addition, it integrates BIO-CRF sequence labeling with pointer-based boundary decoding and confidence calibration to enable precise span-level detection and reliable probability estimates. Extensive experiments on a newly constructed cross-disciplinary dataset of 100,000 annotated samples generated by multiple LLM families (GPT, Qwen, DeepSeek, LLaMA) demonstrate that Sci-SpanDet achieves state-of-the-art performance, with F1(AI) of 80.17, AUROC of 92.63, and Span-F1 of 74.36. Furthermore, it shows strong resilience under adversarial rewriting and maintains balanced accuracy across IMRaD sections and diverse disciplines, substantially surpassing existing baselines. To support reproducibility and encourage future research on AI-generated text detection in scholarly documents, we plan to release the curated dataset and source code at a later stage.
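As a concrete illustration of the span-level output format, the following helper (not from the paper) decodes a BIO tag sequence over tokens into token-index spans, the representation a BIO-CRF detector produces before pointer-based boundary refinement and calibration.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into [start, end) token-index spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":                 # a new span begins
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))   # the open span closes
            start = None
        # tag == "I" simply extends the open span
    if start is not None:
        spans.append((start, len(tags)))
    return spans

print(bio_to_spans(["O", "B", "I", "I", "O", "B", "O"]))  # [(1, 4), (5, 6)]
```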
{"title":"Span-level detection of AI-generated scientific text via contrastive learning and structural calibration","authors":"Zhen Yin , Shenghua Wang","doi":"10.1016/j.knosys.2025.115123","DOIUrl":"10.1016/j.knosys.2025.115123","url":null,"abstract":"<div><div>The rapid adoption of large language models (LLMs) in scientific writing raises serious concerns regarding authorship integrity and the reliability of scholarly publications. Existing detection approaches mainly rely on document-level classification or surface-level statistical cues; however, they neglect fine-grained span localization, exhibit weak calibration, and often fail to generalize across disciplines and generators. To address these limitations, we present Sci-SpanDet, a structure-aware framework for detecting AI-generated scholarly texts. The proposed method combines section-conditioned stylistic modeling with multi-level contrastive learning to capture nuanced human-AI differences while mitigating topic dependence, thereby enhancing cross-domain robustness. In addition, it integrates BIO-CRF sequence labeling with pointer-based boundary decoding and confidence calibration to enable precise span-level detection and reliable probability estimates. Extensive experiments on a newly constructed cross-disciplinary dataset of 100,000 annotated samples generated by multiple LLM families (GPT, Qwen, DeepSeek, LLaMA) demonstrate that Sci-SpanDet achieves state-of-the-art performance, with F1(AI) of 80.17, AUROC of 92.63, and Span-F1 of 74.36. Furthermore, it shows strong resilience under adversarial rewriting and maintains balanced accuracy across IMRaD sections and diverse disciplines, substantially surpassing existing baselines. To support reproducibility and encourage future research on AI-generated text detection in scholarly documents, we plan to release the curated dataset and source code at a later stage.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"334 ","pages":"Article 115123"},"PeriodicalIF":7.6,"publicationDate":"2025-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145791551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ACRA: An adaptive chain retrieval architecture for multi-modal knowledge-augmented visual question answering
Pub Date : 2025-12-13 DOI: 10.1016/j.knosys.2025.115136
Zihao Zhang, Shuwen Yang, Xingjiao Wu, Jiabao Zhao, Qin Chen, Jing Yang, Liang He
Visual question answering (VQA) in knowledge-intensive scenarios requires integrating external knowledge to bridge the semantic gap between shallow linguistic queries and complex reasoning requirements. However, existing methods typically rely on single-hop retrieval strategies, which are prone to overlooking intermediate facts essential for accurate reasoning. To address this limitation, we propose the adaptive chain retrieval architecture (ACRA), a novel multi-hop retrieval framework based on large-model-generated evidence chain annotations. ACRA constructs structured reasoning paths by progressively selecting key evidence nodes using an adaptive matching mechanism based on an encoder-only transformer. To improve evidence discrimination, we design a hybrid loss optimization strategy that incorporates dynamically mined hard negatives, combining binary cross-entropy and margin-based ranking loss. Furthermore, we introduce a depth-aware adaptive beam search algorithm that models evidence retrieval as a sequential process, gradually increasing the matching threshold with search depth to suppress irrelevant content while maintaining logical coherence. We evaluate ACRA on the WebQA and MultimodalQA benchmarks, where it achieves 55.4 % QA accuracy and 90.2 % F1 score on WebQA, and 78.8 % EM and 82.4 % F1 on MultimodalQA. Experimental results show that ACRA consistently outperforms state-of-the-art baselines in terms of retrieval accuracy and reasoning consistency, demonstrating its effectiveness in mitigating cognitive biases and improving multi-hop reasoning in VQA tasks.
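The depth-aware adaptive beam search admits a compact schematic: the acceptance threshold rises with search depth, pruning weakly matching evidence while keeping coherent chains alive. The scorer, thresholds, and step size below are assumptions for illustration, not the paper's settings.

```python
def adaptive_beam_search(question, candidates, score_fn,
                         beam_width=3, max_depth=3,
                         base_threshold=0.5, step=0.1):
    beams = [([], 0.0)]                            # (evidence chain, cumulative score)
    for depth in range(max_depth):
        threshold = base_threshold + depth * step  # stricter as the chain grows
        expanded = []
        for chain, total in beams:
            for cand in candidates:
                if cand in chain:
                    continue
                s = score_fn(question, chain, cand)
                if s >= threshold:                 # depth-aware pruning
                    expanded.append((chain + [cand], total + s))
        if not expanded:                           # no candidate clears the bar
            break
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:beam_width]
    return beams

# Toy usage with a stand-in scorer that decays with chain length.
print(adaptive_beam_search("q", ["e1", "e2", "e3"],
                           lambda q, chain, e: 0.9 - 0.1 * len(chain)))
```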
{"title":"ACRA: An adaptive chain retrieval architecture for multi-modal knowledge-Augmented visual question answering","authors":"Zihao Zhang , Shuwen Yang , Xingjiao Wu , Jiabao Zhao , Qin Chen , Jing Yang , Liang He","doi":"10.1016/j.knosys.2025.115136","DOIUrl":"10.1016/j.knosys.2025.115136","url":null,"abstract":"<div><div>Visual question answering (VQA) in knowledge-intensive scenarios requires integrating of external knowledge to bridge the semantic gap between shallow linguistic queries and complex reasoning requirements. However, existing methods typically rely on single-hop retrieval strategies, which are prone to overlooking intermediate facts essential for accurate reasoning. To address this limitation, we propose adaptive chain retrieval architecture (ACRA), a novel multi-hop retrieval framework based on large-model-generated evidence chain annotations. ACRA constructs structured reasoning paths by progressively selecting key evidence nodes using an adaptive matching mechanism based on an encoder-only transformer. To improve evidence discrimination, we design a hybrid loss optimization strategy that incorporates dynamically mined hard negatives, combining binary cross-entropy and margin-based ranking loss. Furthermore, we introduce a depth-aware adaptive beam search algorithm that models evidence retrieval as a sequential process, gradually increasing the matching threshold with search depth to suppress irrelevant content while maintaining logical coherence. We evaluate ACRA on the WebQA and MultimodalQA. ACRA achieves 55.4 % QA accuracy and 90.2 % F1 score on WebQA, and 78.8 % EM and 82.4 % F1 on MultimodalQA. Experimental results show that ACRA consistently outperforms state-of-the-art baselines in terms of retrieval accuracy and reasoning consistency, demonstrating its effectiveness in mitigating cognitive biases and improving multi-hop reasoning in VQA tasks.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"334 ","pages":"Article 115136"},"PeriodicalIF":7.6,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145792086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IVC-DB: Iterative verification correction method guided by dual-backward mathematical reasoning in large language models
Pub Date : 2025-12-13 DOI: 10.1016/j.knosys.2025.115095
Kunpeng Du, Xuan Zhang, Chen Gao, Rui Zhu, Shaobo Liu, Tong Li, Zhi Jin
Large Language Models (LLMs) have demonstrated remarkable potential in mathematical reasoning tasks. However, forward chain-of-thought (CoT) reasoning tends to rely overly on the surface description of questions, making it vulnerable to slight modifications of specific numbers or terms, which can significantly impair question-solving performance. Current bidirectional reasoning approaches attempt to alleviate the limitations of unidirectional reasoning by introducing backward reasoning to verify the forward answer. However, LLMs often underperform in backward reasoning, potentially introducing cascading errors during the verification process and thus constraining overall reasoning performance. To address this challenge, we propose the Iterative Verification Correction Method Guided by Dual-Backward (IVC-DB) framework. IVC-DB generates diverse styles of backward question pairs through contextual consistency and a templated method, establishing a dual backward verification mechanism consisting of conclusion verification and premise verification. Furthermore, the framework incorporates iterative modules for reasoning, verification, and correction, which dynamically refine candidate solutions and reduce errors in the verification stage. This design mitigates potential errors in backward reasoning and enhances the mathematical question-solving capabilities of LLMs. Experimental results show that IVC-DB significantly outperforms state-of-the-art methods across seven mathematical reasoning datasets and two non-mathematical tasks, achieving average accuracies of 89.8 % with GPT-3.5-Turbo and 94.5 % with GPT-4. Ablation studies further reveal the complementary nature of different backward question styles and the crucial role of dual verification in reducing cascading errors.
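A high-level sketch of the iterate-verify-correct loop may help fix ideas. The llm callable and prompt templates below are hypothetical stand-ins, and the two checks mirror the conclusion and premise verification the abstract describes; this is a schematic, not the released method.

```python
def solve_with_dual_backward(llm, question, max_iters=3):
    """llm: any text-completion callable (hypothetical stand-in for the model API)."""
    answer = llm(f"Solve step by step: {question}")
    for _ in range(max_iters):
        # Backward check 1 (conclusion verification): substitute the answer back.
        conclusion_ok = llm(
            f"Given answer {answer}, does it satisfy: {question}? yes/no") == "yes"
        # Backward check 2 (premise verification): recover a masked premise from the answer.
        premise_ok = llm(
            f"Recover the hidden premise of {question} from {answer}. Consistent? yes/no") == "yes"
        if conclusion_ok and premise_ok:
            return answer                      # both backward verifications pass
        feedback = "conclusion check failed" if not conclusion_ok else "premise check failed"
        answer = llm(f"Revise your answer to {question}; previous attempt failed: {feedback}")
    return answer
```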
{"title":"IVC-DB: Iterative verification correction method guided by dual-Backward mathematical reasoning in large language models","authors":"Kunpeng Du , Xuan Zhang , Chen Gao , Rui Zhu , Shaobo Liu , Tong Li , Zhi Jin","doi":"10.1016/j.knosys.2025.115095","DOIUrl":"10.1016/j.knosys.2025.115095","url":null,"abstract":"<div><div>Large Language Models (LLMs) have demonstrated remarkable potential in mathematical reasoning tasks. However, forward chain-of-thought (CoT) reasoning tends to overly rely on the surface description of questions, making it vulnerable to slight modifications of specific numbers or terms, which can significantly impair question-solving performance. Current bidirectional reasoning approaches attempt to alleviate the limitations of unidirectional reasoning by introducing backward reasoning to verify the forward answer. However, LLMs often underperform in backward reasoning, potentially introducing cascading errors during the verification process and thus constraining overall reasoning performances. To address this challenge, we propose the Iterative Verification Correction Method Guided by Dual-Backward (IVC-DB) framework. IVC-DB generates diverse styles of backward question pairs through contextual consistency and a templated method, establishing a dual backward verification mechanism consisting of conclusion verification and premise verification. Furthermore, the framework incorporates iterative modules for reasoning, verification, and correction, which dynamically refine candidate solutions and reduce errors in the verification stage. This design mitigates potential errors in backward reasoning and enhances the mathematical question-solving capabilities of LLMs. Experimental results show that IVC-DB significantly outperforms state-of-the-art methods across seven mathematical reasoning datasets and two non-mathematical tasks, achieving average accuracies of 89.8 % with GPT-3.5-Turbo and 94.5 % with GPT-4. Ablation studies further reveal the complementary nature of different backward question styles and the crucial role of dual verification in reducing cascading errors.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"334 ","pages":"Article 115095"},"PeriodicalIF":7.6,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145791913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LE-DLCM: Decoupled learner and course modeling with large language models for enhanced course recommendation
Pub Date : 2025-12-13 DOI: 10.1016/j.knosys.2025.115135
Jinjin Ma, Zhuo Zhao, Zhiwen Xie, Yi Zhang, Guangyou Zhou
Artificial intelligence offers unprecedented opportunities for personalized learning experiences, enhanced accessibility, and improved educational outcomes for all. One particularly promising application is the development of course recommendation systems, which aim to help learners navigate the vast array of available resources to identify courses that best align with their individual needs and objectives. Although previous course recommendation models have yielded encouraging results, they still encounter practical limitations due to the sparsity of learner-course interaction data and the isolation of course offerings. The remarkable success of large language models (LLMs) across various fields has inspired us to explore their integration into course recommendation systems to further optimize personalized recommendation capabilities. This paper presents a dual-channel architecture for course recommendation: the LLM-Enhanced Decoupled Learner and Course Modeling (LE-DLCM) framework. It has two synergistic components: (1) knowledge-enhanced LLM-based course modeling that integrates the potential relations between different courses to obtain enhanced embedding vectors, and (2) interaction-enhanced LLM-based learner modeling that leverages more historical interactions from similar learners to simulate cold-start learner interactions. Extensive experiments conducted on two real-world datasets demonstrate the superior performance of LE-DLCM, achieving improvements of 12.1 % in NDCG@10 on MOOCCube and 11.6 % in NDCG@5 on MOOCCourse compared to state-of-the-art baselines. These empirical findings not only validate the efficacy of LE-DLCM in overcoming data sparsity and course isolation but also confirm substantial progress in the field of personalized course recommendation systems.
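Since the reported gains are in NDCG@k, a compact reference implementation of the metric (its standard definition, not code from the paper) makes the evaluation concrete: graded relevance discounted by log rank, normalized by the ideal ordering.

```python
import math

def ndcg_at_k(ranked_relevance, k):
    """ranked_relevance: relevance of the recommended items, in ranked order."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevance[:k]))
    ideal = sorted(ranked_relevance, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([1, 0, 1, 0, 0], k=5))  # one hit at rank 1, one at rank 3 -> ~0.92
```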
{"title":"LE-DLCM: Decoupled learner and course modeling with large language models for enhanced course recommendation","authors":"Jinjin Ma , Zhuo Zhao , Zhiwen Xie , Yi Zhang , Guangyou Zhou","doi":"10.1016/j.knosys.2025.115135","DOIUrl":"10.1016/j.knosys.2025.115135","url":null,"abstract":"<div><div>Artificial intelligence offers unprecedented opportunities for personalized learning experiences, enhanced accessibility, and improved educational outcomes for all. One particularly promising application is the development of course recommendation systems, which aim to help learners navigate the vast array of available resources to identify courses that best align with their individual needs and objectives. Although previous course recommendation models have yielded encouraging results, they still encounter practical limitations due to the sparsity of learner-course interaction data and the isolation of course offerings. The remarkable success of large language models (LLMs) across various fields has inspired us to explore their integration into course recommendation systems to further optimize personalized recommendation capabilities. This paper presents a dual-channel architecture for course recommendation: the <strong>L</strong>LM-<strong>E</strong>nhanced <strong>D</strong>ecoupled <strong>L</strong>earner and <strong>C</strong>ourse <strong>M</strong>odeling (LE-DLCM) framework. It has two synergistic components: (1) knowledge-enhanced LLM-based course modeling that integrates the potential relations between different courses to obtain enhanced embedding vectors, and (2) interaction-enhanced LLM-based learner modeling that leverages more historical interactions from similar learners to simulate cold-start learner interactions. Extensive experiments conducted on two real-world datasets demonstrate the superior performance of LE-DLCM, achieving improvements of 12.1 % in NDCG@10 on MOOCCube and 11.6 % in NDCG@5 on MOOCCourse compared to state-of-the-art baselines. These empirical findings not only validate the efficacy of LED-LCM in overcoming data sparsity and isolated courses but also confirm substantial progress in the field of personalized course recommendation systems.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"334 ","pages":"Article 115135"},"PeriodicalIF":7.6,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145791895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DCDLNet: A label-noise tolerant classification algorithm for PolSAR images based on dual-band consistency and difference
Pub Date : 2025-12-13 DOI: 10.1016/j.knosys.2025.115120
Xinyue Xin, Ming Li, Yan Wu, Peng Zhang, Dazhi Xu
With the advancement of technology, PolSAR systems can acquire multiple signals by transmitting and receiving electromagnetic waves in different frequency bands, thereby enabling the collection of richer ground observation information. However, because they do not account for the concepts of dual-band consistency and dual-band difference, existing fusion methods still encounter problems of incomplete semantic information and low computational efficiency. Moreover, in practice, the process of sample labeling often involves manual intervention, which inevitably introduces labeling errors. To tackle these problems, we propose DCDLNet, a novel label-noise tolerant classification framework (dual-band consistency and difference learning network). Specifically, to extract the rich information contained in dual-band PolSAR data, DCDLNet comprises two principal parts. The first is an inter-band difference acquisition module (IDAM), which learns dual-band complementary information based on the concept of dual-band difference. The second is a spatial-domain and frequency-domain feature extraction (SFFE) module, which acquires more discriminative information by capturing local spatial information in the spatial domain and global spatial information in the frequency domain. Furthermore, by integrating the concept of dual-band consistency with the fitting capabilities of neural networks, DCDLNet adopts a cross-band and bidirectional supervised (CBS) strategy to mitigate the impact of label noise during training. Experiments on measured PolSAR datasets demonstrate that our method outperforms several existing approaches in terms of dual-band fusion and noisy-label processing.
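The cross-band and bidirectional supervised strategy suggests a loss of roughly the following shape: each band's predictions are supervised by the noisy labels, and the two bands additionally supervise each other with detached soft predictions, so a label error can be outvoted by the band that stays consistent. This is an assumed formulation for illustration, not the released code.

```python
import torch
import torch.nn.functional as F

def cbs_loss(logits_a, logits_b, noisy_labels, alpha=0.5):
    # Label supervision on both bands.
    ce = F.cross_entropy(logits_a, noisy_labels) + F.cross_entropy(logits_b, noisy_labels)
    # Cross-band consistency, applied in both directions with stopped gradients.
    kl_ab = F.kl_div(F.log_softmax(logits_a, dim=1),
                     F.softmax(logits_b, dim=1).detach(), reduction="batchmean")
    kl_ba = F.kl_div(F.log_softmax(logits_b, dim=1),
                     F.softmax(logits_a, dim=1).detach(), reduction="batchmean")
    return ce + alpha * (kl_ab + kl_ba)

loss = cbs_loss(torch.randn(8, 4), torch.randn(8, 4), torch.randint(0, 4, (8,)))
```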
{"title":"DCDLNet: A label-noise tolerant classification algorithm for polsar images based on dual-band consistency and difference","authors":"Xinyue Xin , Ming Li , Yan Wu , Peng Zhang , Dazhi Xu","doi":"10.1016/j.knosys.2025.115120","DOIUrl":"10.1016/j.knosys.2025.115120","url":null,"abstract":"<div><div>With the advancement of technology, PolSAR systems can acquire multiple signals by transmitting and receiving electromagnetic waves in different frequency bands, thereby enabling the collection of richer ground observation information. However, due to the lack of consideration for the concepts of dual-band consistency and dual-band difference, existing fusion methods still encounter problems of incomplete semantic information and low computational efficiency. Moreover, in practice, the process of sample labeling often involves manual intervention, which inevitably introduces labeling errors. To tackle these problems, we propose a novel label-noise tolerant classification framework called DCDLNet: dual-band consistency and difference learning network. Specifically, to extract the rich information contained in dual-band PolSAR data, the DCDLNet comprises two principal parts. The first part is an inter-band difference acquisition module (IDAM), which learns dual-band complementary information based on the concept of dual-band difference. The second part is a spatial-domain and frequency-domain feature extraction (SFFE) module. It acquires more discriminative information by capturing local spatial information in the spatial-domain and global spatial information in the frequency-domain. Furthermore, by integrating the concept of dual-band consistency and the fitting capabilities of neural networks, DCDLNet adopts a cross-band and bidirectional supervised (CBS) strategy to mitigate the impact of label noise during the training process. Experiments on measured PolSAR datasets demonstrate that our method outperforms several existing approaches in terms of dual-band fusion and noisy label processing.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"334 ","pages":"Article 115120"},"PeriodicalIF":7.6,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145791896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mul-VMamba: Multimodal semantic segmentation using selection-fusion-based vision-Mamba
Pub Date : 2025-12-13 DOI: 10.1016/j.knosys.2025.115119
Rongrong Ni, Yuanhui Guo, Biao Yang, Yi Liu, Hai Wang, Chuan Hu
For tasks such as autonomous driving and remote sensing, integrating multimodal data (RGB, depth, infrared, and others) can significantly enhance the accuracy and robustness of semantic segmentation under complex environmental conditions, thereby providing precise and reliable information for downstream tasks. However, existing approaches emphasize segmentation accuracy at the expense of efficiency. To address this trade-off, we propose a multimodal semantic segmentation network based on the linear-complexity Selective State Space Model (S6, a.k.a. Mamba), dubbed Mul-VMamba. Mul-VMamba establishes selection-fusion relationships among multimodal features, enabling semantic segmentation with any combination of input modalities. Specifically, the Mamba Spatial-consistency Selective Module (MSSM) adaptively extracts feature mapping relationships and filters out redundant features at identical spatial locations, preserving the spatial relationships between modalities. Additionally, the Mamba Cross-Fusion Module (MCFM) introduces a Cross Selective State Space Model (Cross-S6), establishing the relationship between S6 and multimodal features and achieving optimal fusion performance. Qualitative and quantitative evaluations on the MCubes and DeLiVER datasets demonstrate the efficacy and efficiency of Mul-VMamba. Notably, Mul-VMamba achieves 54.65 % / 68.98 % mIoU on the MCubes / DeLiVER datasets using only 55.33M parameters. The source code of Mul-VMamba is publicly available at https://github.com/Mask0913/Mul-VMamba.
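As a generic illustration of selection-fusion across spatially aligned modalities (not the paper's MSSM), features can be gated per spatial location by learned weights that sum to one, suppressing redundant responses at the same pixel while preserving alignment across modalities.

```python
import torch
import torch.nn as nn

class SpatialSelectiveFusion(nn.Module):
    """Per-location softmax gating over modality features (illustrative only)."""
    def __init__(self, channels, n_modalities):
        super().__init__()
        self.gate = nn.Conv2d(channels * n_modalities, n_modalities, kernel_size=1)

    def forward(self, feats):                    # feats: list of (B, C, H, W) tensors
        weights = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)
        stacked = torch.stack(feats, dim=1)      # (B, M, C, H, W)
        return (weights.unsqueeze(2) * stacked).sum(dim=1)

fused = SpatialSelectiveFusion(16, 3)([torch.randn(2, 16, 8, 8) for _ in range(3)])
print(fused.shape)                               # torch.Size([2, 16, 8, 8])
```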
{"title":"Mul-VMamba: Multimodal semantic segmentation using selection-fusion-based vision-Mamba","authors":"Rongrong Ni , Yuanhui Guo , Biao Yang , Yi Liu , Hai Wang , Chuan Hu","doi":"10.1016/j.knosys.2025.115119","DOIUrl":"10.1016/j.knosys.2025.115119","url":null,"abstract":"<div><div>For tasks such as autonomous driving and remote sensing, integrating multimodal data (RGB, depth, infrared, and others) can significantly enhance the accuracy and robustness of semantic segmentation under complex environmental conditions, thereby providing precise and reliable information for downstream tasks. However, existing approaches emphasize segmentation accuracy at the expense of efficiency. To address this trade-off, we propose a multimodal semantic segmentation network based on the linear complexity Selective State Space Model (S6, a.k.a Mamba), dubbed Mul-VMamba. Mul-VMamba establishes selection-fusion relationships among multimodal features, enabling semantic segmentation with any input modalities. Specifically, the Mamba Spatial-consistency Selective Module (MSSM) adaptively extracts feature mapping relationships and filters out redundant features at identical spatial locations, preserving the spatial relationships between each modality. Additionally, the Mamba Cross-Fusion Module (MCFM) introduces a Cross Selective State Space Model (Cross-S6), establishing the relationship between S6 and multimodal features, achieving optimal fusion performance. Qualitative and quantitative evaluations on the MCubes and DeLiVER datasets demonstrate the efficacy and efficiency of Mul-VMamba. Notably, Mul-VMamba achieves 54.65 % / 68.98 % mIoU on Mcubes / DeLiVER datasets using only 55.33M params. The source code of Mul-VMamba is publicly available at <span><span>https://github.com/Mask0913/Mul-VMamba</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"334 ","pages":"Article 115119"},"PeriodicalIF":7.6,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145791554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LiFedST: A linearized federated split-attention transformer for spatio-temporal forecasting
Pub Date : 2025-12-13 DOI: 10.1016/j.knosys.2025.115110
Chengjie Ying, Xi Fang, Zhiqiang Ru, Yuan Wan, Liang Xie
Federated spatio-temporal forecasting is critical for privacy-sensitive applications such as traffic prediction, yet reconstructing global spatial dependencies under strict privacy constraints remains a major challenge. While Transformer-based architectures are effective for spatio-temporal modeling, their application in federated settings is limited by privacy concerns and the quadratic complexity of attention computation. To address these issues, we propose LiFedST, a Linearized Federated Split-attention Transformer that enables global spatial dependency reconstruction with linear complexity and strong privacy guarantees. LiFedST first applies a lightweight linear temporal module for local sequence encoding. To capture global spatial relationships, we introduce a novel Split-attention mechanism based on Taylor Polynomial Feature Mapping, which makes the softmax operation in self-attention separable and reduces its computational complexity to linear. Based on this formulation, each client can locally compute Split-Attention Aggregation (SAG) and transmit it to the server for aggregation, without exposing raw data or spatial structures. A hierarchical federated training process based on FedAvg is used to update model parameters efficiently. Extensive experiments on real-world traffic and air quality datasets demonstrate that LiFedST consistently outperforms state-of-the-art centralized and federated baselines in MAE, RMSE, and MAPE, while preserving data sovereignty. Our code is available at: https://github.com/yingchengjie1109/LiFedST.
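The separability claim can be demonstrated directly. With a second-order Taylor feature map φ(x) = [1, x, vec(xxᵀ)/√2], we have exp(q·k) ≈ 1 + q·k + (q·k)²/2 = φ(q)·φ(k), so attention factorizes as φ(Q)(φ(K)ᵀV) normalized by φ(Q)(φ(K)ᵀ1) and costs time linear in sequence length. The construction below is a generic sketch of that idea; the paper's exact mapping and federated aggregation may differ.

```python
import numpy as np

def taylor_feature_map(x):
    """Map (n, d) inputs to (n, 1 + d + d*d) features so phi(q)·phi(k) ≈ exp(q·k)."""
    outer = np.einsum("ni,nj->nij", x, x).reshape(len(x), -1) / np.sqrt(2.0)
    return np.concatenate([np.ones((len(x), 1)), x, outer], axis=1)

def linear_attention(Q, K, V):
    phi_q, phi_k = taylor_feature_map(Q), taylor_feature_map(K)
    kv = phi_k.T @ V                   # (features, d_v): one pass over the keys
    norm = phi_q @ phi_k.sum(axis=0)   # softmax denominator per query
    return (phi_q @ kv) / norm[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(scale=0.3, size=(6, 4)) for _ in range(3))
exact = np.exp(Q @ K.T)
exact /= exact.sum(axis=1, keepdims=True)
print(np.abs(linear_attention(Q, K, V) - exact @ V).max())  # small approximation error
```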
{"title":"LiFedST: A linearized federated split-attention transformer for spatio-temporal forecasting","authors":"Chengjie Ying , Xi Fang , Zhiqiang Ru , Yuan Wan , Liang Xie","doi":"10.1016/j.knosys.2025.115110","DOIUrl":"10.1016/j.knosys.2025.115110","url":null,"abstract":"<div><div>Federated spatio-temporal forecasting is critical for privacy-sensitive applications such as traffic prediction, yet reconstructing global spatial dependencies under strict privacy constraints remains a major challenge. While Transformer-based architectures are effective for spatio-temporal modeling, their application in federated settings is limited by privacy concerns and the quadratic complexity of attention computation. To address these issues, we propose LiFedST, a Linearized Federated Split-attention Transformer that enables global spatial dependency reconstruction with linear complexity and strong privacy guarantees. LiFedST first applies a lightweight linear temporal module for local sequence encoding. To capture global spatial relationships, we introduce a novel Split-attention mechanism based on Taylor Polynomial Feature Mapping, which makes the softmax operation in self-attention separable and reduces its computational complexity to linear. Based on this formulation, each client can locally compute Split-Attention Aggregation (SAG) and transmit it to the server for aggregation, without exposing raw data or spatial structures. A hierarchical federated training process based on FedAvg is used to update model parameters efficiently. Extensive experiments on real-world traffic and air quality datasets demonstrate that LiFedST consistently outperforms state-of-the-art centralized and federated baselines in MAE, RMSE, and MAPE, while preserving data sovereignty. Our code is available at: <span><span>https://github.com/yingchengjie1109/LiFedST</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"334 ","pages":"Article 115110"},"PeriodicalIF":7.6,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145791628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}