
Knowledge-Based Systems: Latest Publications

Enhanced phishing detection using multimodal data
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.knosys.2025.115105
Lázaro Bustio-Martínez , Vitali Herrera-Semenets , Jorge A. González-Ordiano , Yamel Pérez-Guadarrama , Luis N. Zúñiga-Morales , Daniela Montoya-Godínez , Miguel A. Álvarez-Carmona , Jan van den Berg
Phishing remains one of the most persistent cybersecurity threats, increasingly exploiting not only technical vulnerabilities but also human cognitive biases. Existing detection systems often rely on single-modality features and black-box models, which restrict both generalization and interpretability. This study presents an explainable multimodal framework that combines textual and technical cues, including message content, URL structure, and Principles of Persuasion, to capture both objective and subjective aspects of phishing. Several classifiers were evaluated using 10-fold stratified cross-validation, with Random Forest achieving the best balance between performance and transparency (ROC-AUC = 0.9840), supported by SHAP explanations that identify the most influential linguistic and structural features. Comparative analysis shows that the proposed framework outperforms unimodal baselines while preserving interpretability, enabling a clear rationale for classification outcomes. These results indicate that integrating multimodal representation with explainable learning strengthens phishing detection accuracy, improves user trust, and supports reliable deployment in real-world environments.
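The evaluation protocol summarized above (a Random Forest scored by ROC-AUC under 10-fold stratified cross-validation) can be sketched in a few lines of scikit-learn. The feature matrices below are synthetic stand-ins for the paper's message, URL, and persuasion cues, and the SHAP step is only noted in a comment; this is not the authors' pipeline.

```python
# Minimal sketch of the evaluation protocol, with synthetic stand-in features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n = 1000
text_feats = rng.normal(size=(n, 20))      # stand-in for textual/persuasion cues
url_feats = rng.normal(size=(n, 10))       # stand-in for URL/technical cues
X = np.hstack([text_feats, url_feats])     # simple early fusion by concatenation
y = rng.integers(0, 2, size=n)             # 1 = phishing, 0 = legitimate

clf = RandomForestClassifier(n_estimators=300, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"ROC-AUC: {scores.mean():.4f} +/- {scores.std():.4f}")
# The paper additionally ranks influential features with SHAP (e.g. a tree explainer).
```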
Citations: 0
GLEm-Net: Unified framework for data reduction with categorical and numerical features
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-15 | DOI: 10.1016/j.knosys.2025.115049
Francesco De Santis, Danilo Giordano, Marco Mellia
In an era of effortless data collection, the impact of machine learning — especially neural networks (NNs) — is undeniable. As datasets grow in size and complexity, efficiently handling mixed data types, including categorical and numerical features, becomes critical. Feature encoding and selection play a key role in improving NN performance, efficiency, interpretability, and generalisation. This paper presents GLEm-Net (Grouped Lasso with Embeddings Network), a novel NN-based approach that seamlessly integrates feature encoding and selection directly into the training process. GLEm-Net uses embedding layers to process categorical features with high cardinality, simplifying the model and improving generalisation. By extending the grouped Lasso regularisation to explicitly consider categorical features, GLEm-Net automatically identifies the most relevant features during training and returns them to the analyst.
We evaluate GLEm-Net on open and proprietary industry datasets and compare it to state-of-the-art feature selection methodologies. Results show that GLEm-Net adapts to each dataset by allowing the NN to directly select subsets of the most important features, offering performance on par with the best state-of-the-art feature selection methods while eliminating the need for external feature encoding and selection steps, which are instead incorporated into the NN training stage.
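A grouped-Lasso penalty over per-feature weight groups, combined with an embedding layer for a high-cardinality categorical feature, can be sketched as below. This is a toy PyTorch illustration of the general idea; the grouping scheme, dimensions, and hyperparameters are assumptions, not the authors' implementation.

```python
# Toy sketch of a grouped-Lasso penalty over per-feature weight groups.
import torch
import torch.nn as nn

class ToyGLEmNet(nn.Module):
    def __init__(self, n_categories=50, emb_dim=8, n_numeric=5, hidden=16):
        super().__init__()
        self.emb = nn.Embedding(n_categories, emb_dim)   # high-cardinality categorical feature
        self.fc1 = nn.Linear(emb_dim + n_numeric, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, cat_idx, x_num):
        x = torch.cat([self.emb(cat_idx), x_num], dim=1)
        return self.out(torch.relu(self.fc1(x))).squeeze(1)

def group_lasso_penalty(model, emb_dim, n_numeric):
    """L2 norm per input-feature group of the first layer: one group for the embedded
    categorical feature (its emb_dim columns), one group per numerical feature."""
    W = model.fc1.weight                      # shape (hidden, emb_dim + n_numeric)
    groups = [W[:, :emb_dim]] + [W[:, emb_dim + j : emb_dim + j + 1] for j in range(n_numeric)]
    return sum(torch.linalg.norm(g) for g in groups)

model = ToyGLEmNet()
cat = torch.randint(0, 50, (32,))
num = torch.randn(32, 5)
y = torch.randint(0, 2, (32,)).float()
loss = nn.functional.binary_cross_entropy_with_logits(model(cat, num), y)
loss = loss + 1e-2 * group_lasso_penalty(model, emb_dim=8, n_numeric=5)
loss.backward()   # groups whose norms shrink toward zero indicate removable features
```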
Citations: 0
Topological and semantic contrastive graph clustering by Ricci curvature augmentation and hypergraph fusion
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-15 | DOI: 10.1016/j.knosys.2025.115130
Dehua Peng , Guangyao Fang , Zhipeng Gui , Yuhang Liu , Huayi Wu
Contrastive graph clustering is an advanced technique in the field of cluster analysis. By leveraging graph neural networks and the contrastive learning paradigm, it couples topological structure with node semantic information in attributed graph networks. Graph augmentation and positive sample selection are two essentials of contrastive graph clustering. However, existing graph augmentation methods tend to disrupt the cluster structures, and most positive sample selectors suffer from the false negative sample problem. In this paper, we propose a Topological and Semantic Contrastive Graph Clustering (TSCGC) model consisting of three learning components. The representation learning component augments the original graph using Ricci curvature to preserve the cluster structure, and introduces a hypergraph view to capture high-order relationships. Graph and hypergraph convolutional networks are used to encode the triple-view embeddings. Meanwhile, we develop a dual contrastive learning component to extract the topological and semantic information. To reduce the number of false negatives, it utilizes K-means to generate pseudo cluster labels to guide the selection of positive samples. The self-supervised learning component is leveraged to align the three graph views. The final clustering results are obtained by performing K-means on the aligned embeddings. We demonstrate the effectiveness of TSCGC by comparing its performance with 13 clustering baselines on six real-world networks. Ablations verify the validity of the key components, and the impact of parameter settings is also analyzed. We further applied TSCGC to identify the function types of 10,370 buildings in Shenzhen City, China based on multi-source geospatial data. It achieved the highest accuracy and shows significant potential in handling complex network structures and high-dimensional node features. The code is available at: https://github.com/ZPGuiGroupWhu/TSCGC.
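One ingredient described above, pseudo-label-guided positive selection, can be sketched as follows: K-means pseudo cluster labels mark same-cluster pairs as positives, which reduces false negatives compared with treating all other nodes as negatives. The embeddings here are random stand-ins for the encoder outputs; this is not the authors' code.

```python
# Minimal sketch of pseudo-label-guided positive-pair selection.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 16))                 # stand-in for node embeddings from the encoder
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(z)

# Same-cluster pairs become positives; different-cluster pairs become negatives.
pos_mask = labels[:, None] == labels[None, :]
np.fill_diagonal(pos_mask, False)
print("positive pairs:", int(pos_mask.sum()) // 2)
```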
Citations: 0
Span-level detection of AI-generated scientific text via contrastive learning and structural calibration
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-14 | DOI: 10.1016/j.knosys.2025.115123
Zhen Yin , Shenghua Wang
The rapid adoption of large language models (LLMs) in scientific writing raises serious concerns regarding authorship integrity and the reliability of scholarly publications. Existing detection approaches mainly rely on document-level classification or surface-level statistical cues; however, they neglect fine-grained span localization, exhibit weak calibration, and often fail to generalize across disciplines and generators. To address these limitations, we present Sci-SpanDet, a structure-aware framework for detecting AI-generated scholarly texts. The proposed method combines section-conditioned stylistic modeling with multi-level contrastive learning to capture nuanced human-AI differences while mitigating topic dependence, thereby enhancing cross-domain robustness. In addition, it integrates BIO-CRF sequence labeling with pointer-based boundary decoding and confidence calibration to enable precise span-level detection and reliable probability estimates. Extensive experiments on a newly constructed cross-disciplinary dataset of 100,000 annotated samples generated by multiple LLM families (GPT, Qwen, DeepSeek, LLaMA) demonstrate that Sci-SpanDet achieves state-of-the-art performance, with F1(AI) of 80.17, AUROC of 92.63, and Span-F1 of 74.36. Furthermore, it shows strong resilience under adversarial rewriting and maintains balanced accuracy across IMRaD sections and diverse disciplines, substantially surpassing existing baselines. To support reproducibility and encourage future research on AI-generated text detection in scholarly documents, we plan to release the curated dataset and source code at a later stage.
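The span-level output format targeted above ultimately comes down to decoding a BIO tag sequence into labeled spans. A minimal sketch of that decoding step is shown below; the CRF layer, pointer-based boundary decoding, and calibration components are not reproduced, and the tag labels are illustrative.

```python
# Minimal sketch: decoding BIO tags into (start, end, label) spans, end-exclusive.
def bio_to_spans(tags):
    """Convert a BIO tag sequence (e.g. ['O', 'B-AI', 'I-AI', 'O']) into spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):            # sentinel "O" flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:  # tolerate an I- without a preceding B-
            start, label = i, tag[2:]
    return spans

print(bio_to_spans(["O", "B-AI", "I-AI", "I-AI", "O", "B-AI"]))
# -> [(1, 4, 'AI'), (5, 6, 'AI')]
```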
Citations: 0
ACRA: An adaptive chain retrieval architecture for multi-modal knowledge-Augmented visual question answering
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-13 | DOI: 10.1016/j.knosys.2025.115136
Zihao Zhang , Shuwen Yang , Xingjiao Wu , Jiabao Zhao , Qin Chen , Jing Yang , Liang He
Visual question answering (VQA) in knowledge-intensive scenarios requires integrating external knowledge to bridge the semantic gap between shallow linguistic queries and complex reasoning requirements. However, existing methods typically rely on single-hop retrieval strategies, which are prone to overlooking intermediate facts essential for accurate reasoning. To address this limitation, we propose the adaptive chain retrieval architecture (ACRA), a novel multi-hop retrieval framework based on large-model-generated evidence chain annotations. ACRA constructs structured reasoning paths by progressively selecting key evidence nodes using an adaptive matching mechanism based on an encoder-only transformer. To improve evidence discrimination, we design a hybrid loss optimization strategy that incorporates dynamically mined hard negatives, combining binary cross-entropy and margin-based ranking loss. Furthermore, we introduce a depth-aware adaptive beam search algorithm that models evidence retrieval as a sequential process, gradually increasing the matching threshold with search depth to suppress irrelevant content while maintaining logical coherence. We evaluate ACRA on the WebQA and MultimodalQA benchmarks. ACRA achieves 55.4 % QA accuracy and a 90.2 % F1 score on WebQA, and 78.8 % EM and 82.4 % F1 on MultimodalQA. Experimental results show that ACRA consistently outperforms state-of-the-art baselines in terms of retrieval accuracy and reasoning consistency, demonstrating its effectiveness in mitigating cognitive biases and improving multi-hop reasoning in VQA tasks.
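The depth-aware adaptive beam search can be illustrated with a toy sketch in which cosine similarity stands in for the learned matching score and the threshold grows linearly with depth. The parameters and scoring below are assumptions for illustration, not the paper's model.

```python
# Toy sketch of beam search over candidate evidence with a depth-dependent threshold.
import numpy as np

def depth_aware_beam_search(query_vec, evidence_vecs, max_depth=3, beam_width=2,
                            base_threshold=0.2, step=0.1):
    beams = [([], 0.0)]                                  # (chain of evidence ids, cumulative score)
    for depth in range(max_depth):
        threshold = base_threshold + step * depth        # stricter matching as the chain grows
        candidates = []
        for chain, score in beams:
            for j, ev in enumerate(evidence_vecs):
                if j in chain:
                    continue
                sim = float(query_vec @ ev /
                            (np.linalg.norm(query_vec) * np.linalg.norm(ev)))
                if sim >= threshold:
                    candidates.append((chain + [j], score + sim))
        if not candidates:                               # stop early if nothing clears the bar
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

rng = np.random.default_rng(0)
print(depth_aware_beam_search(rng.normal(size=8), rng.normal(size=(6, 8))))
```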
Citations: 0
IVC-DB: Iterative verification correction method guided by dual-Backward mathematical reasoning in large language models
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-13 | DOI: 10.1016/j.knosys.2025.115095
Kunpeng Du , Xuan Zhang , Chen Gao , Rui Zhu , Shaobo Liu , Tong Li , Zhi Jin
Large Language Models (LLMs) have demonstrated remarkable potential in mathematical reasoning tasks. However, forward chain-of-thought (CoT) reasoning tends to rely too heavily on the surface description of questions, making it vulnerable to slight modifications of specific numbers or terms, which can significantly impair question-solving performance. Current bidirectional reasoning approaches attempt to alleviate the limitations of unidirectional reasoning by introducing backward reasoning to verify the forward answer. However, LLMs often underperform in backward reasoning, potentially introducing cascading errors during the verification process and thus constraining overall reasoning performance. To address this challenge, we propose the Iterative Verification Correction Method Guided by Dual-Backward (IVC-DB) framework. IVC-DB generates diverse styles of backward question pairs through contextual consistency and a templated method, establishing a dual backward verification mechanism consisting of conclusion verification and premise verification. Furthermore, the framework incorporates iterative modules for reasoning, verification, and correction, which dynamically refine candidate solutions and reduce errors in the verification stage. This design mitigates potential errors in backward reasoning and enhances the mathematical question-solving capabilities of LLMs. Experimental results show that IVC-DB significantly outperforms state-of-the-art methods across seven mathematical reasoning datasets and two non-mathematical tasks, achieving average accuracies of 89.8 % with GPT-3.5-Turbo and 94.5 % with GPT-4. Ablation studies further reveal the complementary nature of different backward question styles and the crucial role of dual verification in reducing cascading errors.
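The iterate-verify-correct control flow, with conclusion and premise verification as the two backward checks, can be sketched as below. Here `ask_llm` is a stub standing in for a real model call, and the prompts are illustrative placeholders, not the paper's templates.

```python
# Control-flow sketch only: solve, run two backward checks, correct if either fails.
def ask_llm(prompt: str) -> str:
    return "stub answer"          # placeholder for a real LLM call

def solve_with_dual_backward(question: str, max_rounds: int = 3) -> str:
    answer = ask_llm(f"Solve step by step: {question}")
    for _ in range(max_rounds):
        # Conclusion verification: check the answer against the original conditions.
        conclusion_ok = ask_llm(
            f"Given the answer '{answer}', does it satisfy the conditions of: {question}? yes/no"
        ).strip().lower().startswith("yes")
        # Premise verification: hide one premise and check it can be recovered from the answer.
        premise_ok = ask_llm(
            f"Assuming the answer '{answer}', recover the hidden condition of: {question}. Consistent? yes/no"
        ).strip().lower().startswith("yes")
        if conclusion_ok and premise_ok:
            break
        answer = ask_llm(f"The previous answer '{answer}' failed verification; correct it: {question}")
    return answer

print(solve_with_dual_backward("A number plus 7 equals 12. What is the number?"))
```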
Citations: 0
LE-DLCM: Decoupled learner and course modeling with large language models for enhanced course recommendation
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-13 | DOI: 10.1016/j.knosys.2025.115135
Jinjin Ma , Zhuo Zhao , Zhiwen Xie , Yi Zhang , Guangyou Zhou
Artificial intelligence offers unprecedented opportunities for personalized learning experiences, enhanced accessibility, and improved educational outcomes for all. One particularly promising application is the development of course recommendation systems, which aim to help learners navigate the vast array of available resources to identify courses that best align with their individual needs and objectives. Although previous course recommendation models have yielded encouraging results, they still encounter practical limitations due to the sparsity of learner-course interaction data and the isolation of course offerings. The remarkable success of large language models (LLMs) across various fields has inspired us to explore their integration into course recommendation systems to further optimize personalized recommendation capabilities. This paper presents a dual-channel architecture for course recommendation: the LLM-Enhanced Decoupled Learner and Course Modeling (LE-DLCM) framework. It has two synergistic components: (1) knowledge-enhanced LLM-based course modeling that integrates the potential relations between different courses to obtain enhanced embedding vectors, and (2) interaction-enhanced LLM-based learner modeling that leverages more historical interactions from similar learners to simulate cold-start learner interactions. Extensive experiments conducted on two real-world datasets demonstrate the superior performance of LE-DLCM, achieving improvements of 12.1 % in NDCG@10 on MOOCCube and 11.6 % in NDCG@5 on MOOCCourse compared to state-of-the-art baselines. These empirical findings not only validate the efficacy of LE-DLCM in overcoming data sparsity and isolated courses but also confirm substantial progress in the field of personalized course recommendation systems.
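For reference, the NDCG@k metric reported in the evaluation above can be computed as in the short sketch below; the relevance list is a synthetic example, not data from the paper.

```python
# Minimal sketch of NDCG@k for a ranked recommendation list with binary relevance.
import numpy as np

def ndcg_at_k(ranked_relevance, k, total_relevant):
    """ranked_relevance: 0/1 relevance of recommended courses, in ranked order;
    total_relevant: number of relevant courses for this learner (for the ideal DCG)."""
    rel = np.asarray(ranked_relevance, dtype=float)[:k]
    dcg = np.sum(rel / np.log2(np.arange(2, rel.size + 2)))
    n_ideal = min(total_relevant, k)
    idcg = np.sum(1.0 / np.log2(np.arange(2, n_ideal + 2)))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([1, 0, 1, 0, 0, 1, 0, 0, 0, 0], k=10, total_relevant=3))
```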
Citations: 0
DCDLNet: A label-noise tolerant classification algorithm for PolSAR images based on dual-band consistency and difference
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-13 | DOI: 10.1016/j.knosys.2025.115120
Xinyue Xin , Ming Li , Yan Wu , Peng Zhang , Dazhi Xu
With the advancement of technology, PolSAR systems can acquire multiple signals by transmitting and receiving electromagnetic waves in different frequency bands, thereby enabling the collection of richer ground observation information. However, due to the lack of consideration for the concepts of dual-band consistency and dual-band difference, existing fusion methods still encounter problems of incomplete semantic information and low computational efficiency. Moreover, in practice, the process of sample labeling often involves manual intervention, which inevitably introduces labeling errors. To tackle these problems, we propose a novel label-noise tolerant classification framework called DCDLNet: dual-band consistency and difference learning network. Specifically, to extract the rich information contained in dual-band PolSAR data, the DCDLNet comprises two principal parts. The first part is an inter-band difference acquisition module (IDAM), which learns dual-band complementary information based on the concept of dual-band difference. The second part is a spatial-domain and frequency-domain feature extraction (SFFE) module. It acquires more discriminative information by capturing local spatial information in the spatial-domain and global spatial information in the frequency-domain. Furthermore, by integrating the concept of dual-band consistency and the fitting capabilities of neural networks, DCDLNet adopts a cross-band and bidirectional supervised (CBS) strategy to mitigate the impact of label noise during the training process. Experiments on measured PolSAR datasets demonstrate that our method outperforms several existing approaches in terms of dual-band fusion and noisy label processing.
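The complementary spatial-domain and frequency-domain views behind the SFFE module can be illustrated at a toy level: a local mean filter supplies local spatial cues, and a 2-D FFT magnitude spectrum supplies global frequency cues. The patch below is random and the operations are stand-ins, not the network's actual layers.

```python
# Toy illustration of paired spatial-domain and frequency-domain views of a patch.
import numpy as np

rng = np.random.default_rng(0)
patch = rng.normal(size=(32, 32))            # stand-in for one channel of a PolSAR patch

# Local spatial cue: 3x3 mean filtering via a simple sliding window (interior pixels only).
local = np.zeros_like(patch)
for i in range(1, 31):
    for j in range(1, 31):
        local[i, j] = patch[i - 1:i + 2, j - 1:j + 2].mean()

# Global frequency cue: centered magnitude spectrum from a 2-D FFT.
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(patch)))
print(local.shape, spectrum.shape)
```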
Citations: 0
Mul-VMamba: Multimodal semantic segmentation using selection-fusion-based vision-Mamba
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-13 | DOI: 10.1016/j.knosys.2025.115119
Rongrong Ni , Yuanhui Guo , Biao Yang , Yi Liu , Hai Wang , Chuan Hu
For tasks such as autonomous driving and remote sensing, integrating multimodal data (RGB, depth, infrared, and others) can significantly enhance the accuracy and robustness of semantic segmentation under complex environmental conditions, thereby providing precise and reliable information for downstream tasks. However, existing approaches emphasize segmentation accuracy at the expense of efficiency. To address this trade-off, we propose a multimodal semantic segmentation network based on the linear complexity Selective State Space Model (S6, a.k.a. Mamba), dubbed Mul-VMamba. Mul-VMamba establishes selection-fusion relationships among multimodal features, enabling semantic segmentation with any input modalities. Specifically, the Mamba Spatial-consistency Selective Module (MSSM) adaptively extracts feature mapping relationships and filters out redundant features at identical spatial locations, preserving the spatial relationships between modalities. Additionally, the Mamba Cross-Fusion Module (MCFM) introduces a Cross Selective State Space Model (Cross-S6), establishing the relationship between S6 and multimodal features, achieving optimal fusion performance. Qualitative and quantitative evaluations on the MCubes and DeLiVER datasets demonstrate the efficacy and efficiency of Mul-VMamba. Notably, Mul-VMamba achieves 54.65 % / 68.98 % mIoU on the MCubes / DeLiVER datasets using only 55.33M parameters. The source code of Mul-VMamba is publicly available at https://github.com/Mask0913/Mul-VMamba.
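The linear-complexity selective state-space (S6) recurrence underlying Mamba can be caricatured with a diagonal, input-dependent update as below. This is a rough NumPy toy meant only to convey the linear-time scan with input-dependent parameters; it is not the authors' Cross-S6 or fusion modules.

```python
# Toy diagonal selective-scan recurrence: linear in sequence length, input-dependent params.
import numpy as np

rng = np.random.default_rng(0)
T, d_state = 16, 8
u = rng.normal(size=T)                      # a 1-D input sequence (one channel)
A = -np.exp(rng.normal(size=d_state))       # stable diagonal state matrix (negative poles)
W_dt, W_B, W_C = rng.normal(size=(3, d_state))

h, ys = np.zeros(d_state), []
for t in range(T):
    dt = np.log1p(np.exp(u[t] * W_dt))      # input-dependent (selective) step size via softplus
    B, C = u[t] * W_B, u[t] * W_C           # input-dependent projections
    h = np.exp(A * dt) * h + dt * B * u[t]  # simple discretised diagonal state update
    ys.append(float(C @ h))
print(len(ys))                              # one output per step, computed in O(T)
```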
Citations: 0
LiFedST: A linearized federated split-attention transformer for spatio-temporal forecasting
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-13 | DOI: 10.1016/j.knosys.2025.115110
Chengjie Ying , Xi Fang , Zhiqiang Ru , Yuan Wan , Liang Xie
Federated spatio-temporal forecasting is critical for privacy-sensitive applications such as traffic prediction, yet reconstructing global spatial dependencies under strict privacy constraints remains a major challenge. While Transformer-based architectures are effective for spatio-temporal modeling, their application in federated settings is limited by privacy concerns and the quadratic complexity of attention computation. To address these issues, we propose LiFedST, a Linearized Federated Split-attention Transformer that enables global spatial dependency reconstruction with linear complexity and strong privacy guarantees. LiFedST first applies a lightweight linear temporal module for local sequence encoding. To capture global spatial relationships, we introduce a novel Split-attention mechanism based on Taylor Polynomial Feature Mapping, which makes the softmax operation in self-attention separable and reduces its computational complexity to linear. Based on this formulation, each client can locally compute Split-Attention Aggregation (SAG) and transmit it to the server for aggregation, without exposing raw data or spatial structures. A hierarchical federated training process based on FedAvg is used to update model parameters efficiently. Extensive experiments on real-world traffic and air quality datasets demonstrate that LiFedST consistently outperforms state-of-the-art centralized and federated baselines in MAE, RMSE, and MAPE, while preserving data sovereignty. Our code is available at: https://github.com/yingchengjie1109/LiFedST.
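The separable-attention idea can be sketched directly: with a feature map phi, softmax attention is approximated by phi(Q)[phi(K)^T V], so each client only needs to share small d x d summaries that the server sums. The feature map below is a simple truncated-exponential stand-in rather than the paper's exact Taylor polynomial mapping, and no privacy mechanism is modeled; the sketch only shows that client-side aggregation of the summaries is exact.

```python
# Sketch of linearised (separable) attention with per-client summary aggregation.
import numpy as np

def phi(x):
    return 1.0 + x + 0.5 * x**2           # truncated Taylor expansion of exp(x), element-wise

def linear_attention(Q, K, V):
    kv = phi(K).T @ V                      # (d, d_v) summary, computable locally
    z = phi(K).T @ np.ones(K.shape[0])     # (d,) normaliser summary
    return (phi(Q) @ kv) / (phi(Q) @ z)[:, None]

rng = np.random.default_rng(0)
n, d, d_v = 64, 8, 8
Q, K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, d_v))

# Two "clients" split the keys/values; their summaries add up to the global ones.
kv1, kv2 = phi(K[:32]).T @ V[:32], phi(K[32:]).T @ V[32:]
z1, z2 = phi(K[:32]).T @ np.ones(32), phi(K[32:]).T @ np.ones(32)
out_fed = (phi(Q) @ (kv1 + kv2)) / (phi(Q) @ (z1 + z2))[:, None]
print(np.allclose(out_fed, linear_attention(Q, K, V)))   # True: aggregation is exact
```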
Citations: 0