
Knowledge-Based Systems: Latest Articles

Structure adversarial augmented graph anomaly detection via multi-view contrastive learning
IF 7.6, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-04-08, Epub Date: 2026-02-01, DOI: 10.1016/j.knosys.2026.115455 (Vol. 338, Article 115455)
Qian Chen , Huiying Xu , Ruidong Wang , Yue Liu , Xinzhong Zhu
Graph anomaly detection is essential for many security-related fields but faces significant challenges in handling complex real-world graph data. Because real-world graph structure is complex and imbalanced, anomalous nodes are difficult to identify among the many normal ones. Current contrastive learning methods often overlook structural imperfections in real-world graphs, such as redundant edges and low-degree sparse nodes. Redundant connections may introduce noise during message passing, while sparse nodes receive insufficient structural information to learn accurate representations, which can degrade detection performance. To overcome these challenges, we propose SAA-GCL, an innovative framework that integrates adaptive structure adversarial augmentation with multi-view contrastive learning. Specifically, through edge-weight learning and an LMSE loss, our approach adaptively optimizes the structure of the augmented graph, discarding redundant edges as far as possible while retaining more discriminative features. For low-degree sparse nodes, we mix their self-networks with those of auxiliary nodes to improve representation quality. To fully mine anomaly information, we use a multi-view contrastive loss that distinguishes positive and negative sample pairs within each view while maintaining cross-view consistency. The framework adaptively refines the graph topology to suppress noisy edges and to enhance representations of structurally weak nodes, improving anomaly detection performance on imbalanced attributed graphs. Comprehensive experiments on six real-world graph datasets show that SAA-GCL outperforms existing methods in detection accuracy. Our code is open source at https://github.com/HZAI-ZJNU/SAAGCL.
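The multi-view contrastive objective described in the abstract can be illustrated with a minimal cross-view InfoNCE sketch. This is not the paper's exact loss: the temperature `tau` and the choice of same-index nodes across views as positive pairs are illustrative assumptions.

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """Cross-view InfoNCE: node i in view 1 pairs positively with node i in
    view 2; every other node in view 2 serves as a negative."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                      # (n, n) cosine similarities
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))         # -log softmax of the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))  # nearly identical views
random_ = info_nce(z, rng.normal(size=(8, 16)))             # unrelated views
print(aligned < random_)  # True: aligned views incur the lower loss
```

A real implementation would also add an intra-view term over positive and negative pairs within each view, as the abstract describes.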
Citations: 0
Scene-aware memory discrimination: Deciding which personal knowledge stays
IF 7.6, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-04-08, Epub Date: 2026-02-09, DOI: 10.1016/j.knosys.2026.115496 (Vol. 338, Article 115496)
Yijie Zhong , Mengying Guo , Zewei Wang , Zhongyang Li , Dandan Tu , Haofen Wang
Intelligent devices have become deeply integrated into everyday life, generating vast amounts of user interactions that form valuable personal knowledge. Efficient organization of this knowledge in user memory is essential for enabling personalized applications. However, current research on memory writing, management, and reading using large language models (LLMs) faces challenges in filtering irrelevant information and in dealing with rising computational costs. Inspired by the concept of selective attention in the human brain, we introduce a memory discrimination task. To address large-scale interactions and diverse memory standards in this task, we propose a Scene-Aware Memory Discrimination method (SAMD), which comprises two key components: the Gating Unit Module (GUM) and the Cluster Prompting Module (CPM). GUM enhances processing efficiency by filtering out non-memorable interactions and focusing on the salient content most relevant to application demands. CPM establishes adaptive memory standards, guiding LLMs to discern what information should be remembered or discarded. It also analyzes the relationship between user intents and memory contexts to build effective clustering prompts. Comprehensive direct and indirect evaluations demonstrate the effectiveness and generalization of our approach. We independently assess the performance of memory discrimination, showing that SAMD successfully recalls the majority of memorable data and remains robust in dynamic scenarios. Furthermore, when integrated into personalized applications, SAMD significantly enhances both the efficiency and quality of memory construction, leading to better organization of personal knowledge.
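The Gating Unit Module's role of filtering non-memorable interactions can be illustrated with a deliberately simplistic rule-based stand-in. The real GUM is learned; the `salient_terms` list and the length threshold below are invented purely for the example.

```python
def gate(interaction, min_len=4, salient_terms=("prefer", "allergy", "schedule")):
    """Hypothetical gating rule: keep an interaction only if it is long
    enough and mentions at least one application-relevant term."""
    text = interaction.lower()
    long_enough = len(text.split()) >= min_len
    salient = any(term in text for term in salient_terms)
    return long_enough and salient

print(gate("I prefer window seats on morning flights"))  # True: kept
print(gate("ok thanks"))                                 # False: filtered out
```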
Citations: 0
Recursive multi-modal retrieval for structured semantic trees in engineering documents
IF 7.6, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-04-08, Epub Date: 2026-01-29, DOI: 10.1016/j.knosys.2026.115433 (Vol. 338, Article 115433)
Fei Li, Xinyu Li, Jinsong Bao
In lifecycle-oriented manufacturing systems, numerous engineering documents with text, tables, and images are continuously produced. Retrieval-augmented generation (RAG) models enhance document retrieval efficiency and adapt to evolving domain knowledge. However, existing methods struggle to achieve accurate cross-modal semantic alignment and high-precision retrieval in engineering documents. To address these limitations, this paper proposes the recursive multi-modal retrieval for structured semantic trees (RMR-SST) method for engineering documents. First, layout analysis extracts multimodal elements and divides metadata into three hierarchical levels: minimal chunks, assembly chunks, and section chunks. Domain rules are then applied to compute inter-section semantic relationships and construct the structured semantic trees (SSTs) of engineering documents. Second, a context-aware multimodal semantic alignment strategy is proposed to embed multimodal metadata chunks and their semantic relationships into a unified vector space, enabling cross-modal semantic alignment of SSTs. Finally, a recursive abstractive multimodal metadata retrieval algorithm is designed to integrate multimodal information across documents at different abstraction levels and to generate multimodal retrieval results. Based on 872 ship-design engineering documents, multiple SSTs were constructed for evaluation. Experiments show that RMR-SST outperforms conventional RAG methods in multimodal retrieval and semantic alignment tasks, achieving a Hit@5 of 88.3% when integrated with the Qwen3-235B model.
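The Hit@5 figure reported above is a standard retrieval metric: the fraction of queries for which a relevant document appears among the top five results. A generic sketch with made-up document ids, not tied to the paper's evaluation code:

```python
def hit_at_k(ranked_ids, relevant_id, k=5):
    """Return 1 if the relevant document appears in the top-k results."""
    return int(relevant_id in ranked_ids[:k])

# Each query: (ranking returned by the retriever, ground-truth relevant doc)
queries = [(["d3", "d7", "d1", "d9", "d2"], "d1"),   # hit at rank 3
           (["d4", "d8", "d6", "d5", "d0"], "d2")]   # miss
hits = [hit_at_k(ranked, rel, k=5) for ranked, rel in queries]
print(sum(hits) / len(hits))  # 0.5
```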
Citations: 0
Transferable multi-level spatial-temporal graph neural network for adaptive multi-agent trajectory prediction
IF 7.6, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-04-08, Epub Date: 2026-01-31, DOI: 10.1016/j.knosys.2026.115451 (Vol. 338, Article 115451)
Yu Sun , Dengyu Xiao , Mengdie Huang , Jiali Wang , Chuan Tong , Jun Luo , Huayan Pu
Accurately predicting future multi-agent trajectories at intersections is crucial yet challenging due to complex and dynamic traffic environments. Existing methods struggle with cross-domain trajectory prediction for two reasons: 1) significant differences in spatiotemporal features between domains lead to insufficient modeling of trajectory temporal dynamics during cross-domain spatiotemporal alignment; and 2) strong heterogeneity of behavioral patterns across datasets causes significant domain shifts, resulting in a notable performance decline when the model is transferred across datasets. To address these challenges, this paper proposes a transferable multi-level spatial-temporal graph neural network (T-MLSTG). Based on maximum mean discrepancy theory, we design a windowed mean gradient discrepancy (WMGD) metric that incorporates mean and gradient information of temporal features to better capture cross-domain distribution differences. Furthermore, a multi-level spatial-temporal graph network (MLSTG) is designed with a two-level architecture: the first level encodes historical spatiotemporal features independently, while the second level integrates spatiotemporal features and employs a channel attention mechanism to enhance feature discrimination. The performance of T-MLSTG was evaluated on the inD and INTERACTION datasets. Compared to the baseline model, cross-domain trajectory prediction results show a reduction in root mean square error (RMSE) of 0.812. In cross-dataset trajectory prediction evaluation, the mean error was reduced by 27.8%, demonstrating the method's effectiveness and generalization capability.
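The abstract only names the WMGD metric, so the following is a hypothetical reading of it: slide a window over each domain's temporal feature sequence and compare per-window means and first differences (a simple stand-in for "gradient information"), in the spirit of moment-matching discrepancies like MMD. The paper's exact formula may differ.

```python
import numpy as np

def wmgd(x_src, x_tgt, window=8):
    """Hypothetical windowed mean-gradient discrepancy: average squared
    difference of per-window means and per-window mean first-differences
    between a source and a target feature sequence, both shaped (T, d)."""
    def window_stats(x):
        means, grads = [], []
        for start in range(0, len(x) - window + 1, window):
            w = x[start:start + window]
            means.append(w.mean(axis=0))                  # window mean
            grads.append(np.diff(w, axis=0).mean(axis=0)) # mean first-difference
        return np.array(means), np.array(grads)
    m_s, g_s = window_stats(x_src)
    m_t, g_t = window_stats(x_tgt)
    n = min(len(m_s), len(m_t))
    return np.mean((m_s[:n] - m_t[:n]) ** 2) + np.mean((g_s[:n] - g_t[:n]) ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 4))      # source-domain temporal features
y = x + 2.0                       # target domain: pure mean shift
print(wmgd(x, x))                 # 0.0 for identical domains
print(round(wmgd(x, y), 6))       # 4.0: the mean term picks up the shift
```

Note that a pure mean shift leaves the gradient term at zero, which is why combining both statistics is more discriminative than either alone.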
Citations: 0
Parameterized image restoration with diffusion and gradient priors
IF 7.6, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-04-08, Epub Date: 2026-02-05, DOI: 10.1016/j.knosys.2026.115488 (Vol. 338, Article 115488)
Yang Yang, Xi Zhang, Jiaqi Zhang, Lanling Zeng
Diffusion models have demonstrated remarkable performance on the task of image restoration, and most existing image restoration methods leverage the diffusion model as a powerful prior. In this paper, we propose a novel method named PIRP that further integrates the gradient prior, which has long been popular in image restoration. The integration harnesses the strengths of both priors and thus enhances the overall efficacy of image restoration. More importantly, incorporating the gradient prior improves the flexibility of the method by enabling parameterized image restoration: it provides an effective way to tweak the parameters, which is essential for tailoring satisfactory results. Moreover, we propose a novel plug-and-play sampling method based on the proposed model, which improves image restoration quality without requiring any retraining. To validate the effectiveness of the proposed method, we conducted extensive experiments on multiple image restoration tasks, including single-image super-resolution, Gaussian deblurring, motion deblurring, and their noisy variants. Both qualitative and quantitative results on popular datasets demonstrate the advantages of the proposed method.
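The gradient-prior family of regularizers can be illustrated with a finite-difference, total-variation-style energy that penalizes large local intensity changes. This is a generic example, not PIRP's actual formulation; the `weight` parameter stands in for the kind of user-tweakable knob the abstract alludes to.

```python
import numpy as np

def gradient_prior_energy(img, weight=0.1):
    """Anisotropic total-variation-style energy: weighted sum of absolute
    horizontal and vertical finite differences of a 2-D image."""
    dx = np.diff(img, axis=1)   # horizontal gradients
    dy = np.diff(img, axis=0)   # vertical gradients
    return weight * (np.abs(dx).sum() + np.abs(dy).sum())

rng = np.random.default_rng(0)
smooth = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))        # gentle ramp
noisy = smooth + rng.normal(scale=0.5, size=smooth.shape)  # same ramp + noise
print(gradient_prior_energy(smooth) < gradient_prior_energy(noisy))  # True
```

In a restoration loop, such an energy (or its gradient) would be added as an extra guidance term alongside the diffusion prior, with `weight` trading smoothness against detail.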
Citations: 0
LLMs for drug-drug interaction prediction using textual drug descriptors
IF 7.6, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-04-08, Epub Date: 2026-02-04, DOI: 10.1016/j.knosys.2026.115486 (Vol. 338, Article 115486)
Gabriele De Vito , Filomena Ferrucci , Athanasios Angelakis
As treatment plans involve more medications, anticipating and preventing drug-drug interactions (DDIs) becomes increasingly important. Such interactions can result in harmful side effects and may reduce therapy effectiveness. Currently, most computational approaches to DDI prediction rely heavily on complex feature engineering and require chemical information to be structured in specific formats to enable accurate detection of potential interactions. This study presents the first investigation of applying Large Language Models (LLMs) to DDI prediction using drug characteristics expressed solely in free-text form. Specifically, we use SMILES notations, target organisms, and gene associations as inputs in purpose-designed prompts, allowing LLMs to learn the underlying relationships among these descriptors and accordingly predict possible DDIs. We evaluated the performance of 18 distinct LLMs under zero-shot, few-shot, and fine-tuning settings on the DrugBank dataset (version 5.1.12) to identify the most effective paradigm. We then assessed the generalizability of the fine-tuned models on 13 external DDI datasets against well-known machine learning baselines. The results demonstrated that, while the zero-shot and few-shot paradigms showed only modest utility, fine-tuned models achieved superior sensitivity while maintaining competitive accuracy and F1-score compared to the baselines. Notably, despite its small size, the Phi-3.5 2.7B model attained a sensitivity of 0.978 and an accuracy of 0.919. These findings suggest that computational efficiency and task-specific adaptation matter more than model size for capturing the complex patterns inherent in drug interactions, and they outline a more accessible paradigm for DDI prediction that can be integrated into clinical decision support systems.
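Sensitivity and accuracy, the metrics quoted above, come straight from the binary confusion matrix; sensitivity (recall on the positive class) matters most here because a missed interaction is the costly error. A toy computation with made-up labels:

```python
def sensitivity_accuracy(y_true, y_pred):
    """Compute sensitivity (TP / (TP + FN)) and accuracy from 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sens = tp / (tp + fn)
    acc = (tp + tn) / len(y_true)
    return sens, acc

y_true = [1, 1, 1, 0, 0, 1]   # 1 = interacting drug pair
y_pred = [1, 1, 0, 0, 1, 1]   # model output
sens, acc = sensitivity_accuracy(y_true, y_pred)
print(sens, round(acc, 3))  # 0.75 0.667
```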
Citations: 0
Integrating deep clustering and multi-view graph neural networks for recommender system
IF 7.6, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-04-08, Epub Date: 2026-02-07, DOI: 10.1016/j.knosys.2026.115449 (Vol. 338, Article 115449)
Jiaxuan Song , Yue Li , Duantengchuan Li , Xiaoguang Wang , Rui Zhang , Hui Zhang , Jinsong Chen
Existing graph neural network recommendation models aggregate neighborhood information using a weighted-sum strategy based on node popularity. However, this strategy struggles to accurately model the impact of item category features on user behavior. To alleviate this problem, we propose MDCRec, a novel graph convolutional recommendation framework that integrates deep clustering. MDCRec uses a deep clustering module to mine item category information from item review keyword documents and constructs multi-view subgraphs based on that category information. Information aggregation based on node popularity is then performed on each subgraph to obtain node embeddings within it. Finally, based on the interaction distribution of users in each subgraph, the embeddings from the multi-view subgraphs are aggregated into the final node embeddings. MDCRec integrates item category information and users' cross-category interests into information aggregation, allowing recommendation models to capture finer-grained relationships between items and user preferences. It can also work in tandem with other performance-enhancing techniques, such as contrastive learning, to further boost model effectiveness. Experimental results on public real-world datasets indicate that most graph neural network recommendation models, including variants that use contrastive learning, outperform their original popularity-based versions when integrated with the MDCRec information aggregation framework, with average improvements of 1.75% in Recall@20 and 1.87% in NDCG@20. Our code is publicly available at https://github.com/dacilab/MDCRec.
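Recall@20 and NDCG@20, the metrics reported above, can be computed per user as follows. This is a generic sketch with made-up item ids; note that normalization conventions vary (some works divide recall by min(|relevant|, k) rather than |relevant|), and the paper's evaluation code may differ.

```python
import math

def recall_at_k(ranked, relevant, k=20):
    """Fraction of the user's relevant items found in the top-k ranking."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k=20):
    """Binary-relevance NDCG: DCG of the ranking over the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

ranked = ["d12", "d7", "d3", "d99"]    # model's ranking, best first
relevant = {"d12", "d3", "d41"}        # ground-truth interacted items
print(round(recall_at_k(ranked, relevant), 3))  # 0.667
print(round(ndcg_at_k(ranked, relevant), 3))    # 0.704
```

NDCG rewards placing hits near the top (the log-position discount), which is why it can move differently from recall when a model reorders the same set of hits.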
Integrating deep clustering and multi-view graph neural networks for recommender system
Jiaxuan Song, Yue Li, Duantengchuan Li, Xiaoguang Wang, Rui Zhang, Hui Zhang, Jinsong Chen
DOI: 10.1016/j.knosys.2026.115449 · Knowledge-Based Systems, Vol. 338, Article 115449 · IF 7.6 · Pub Date: 2026-04-08
Citations: 0
ATARS: Adaptive task-Aware feature learning for Few-Shot Fine-Grained classification
IF 7.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-04-08 Epub Date: 2026-02-04 DOI: 10.1016/j.knosys.2026.115485
Xiaomei Long, Xinyue Wang, Cheng Yang, Zongbo He, Qian He, Xiangdong Chen
Few-shot fine-grained classification is challenging due to subtle inter-class differences and limited annotations. Existing methods often fail to fully exploit task-level information, limiting adaptation to scarce samples. We present ATARS, a task-aware framework that organizes alignment, feature reconstruction, and task-conditioned channel selection into a coordinated pipeline. These components progressively refine task-adaptive feature representations, enhancing intra-class consistency and discriminative capacity. Extensive experiments on five fine-grained benchmarks demonstrate the effectiveness of this design: ATARS achieves 5-way 5-shot accuracies of 97.38% on Cars, 94.40% on CUB, and 89.78% on Dogs, consistently outperforming previous reconstruction-based and task-aware approaches. The results highlight the benefits of coordinated component design under task-aware guidance in few-shot scenarios. The source code is available at: https://github.com/lxm-hjk/ATARS-FSL.
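The task-conditioned channel selection stage can be pictured as squeeze-and-excitation-style gating driven by the support set. A hedged NumPy sketch (the names `task_channel_weights` and `apply_channel_selection`, and the softmax gating itself, are our assumptions; ATARS's actual mechanism may differ):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def task_channel_weights(support_feats: np.ndarray,
                         temperature: float = 1.0) -> np.ndarray:
    """Derive per-channel weights from the support set of the current episode.

    support_feats: (n, c) n support embeddings with c channels.
    Channels that respond strongly to this task's classes get higher weight.
    """
    task_descriptor = support_feats.mean(axis=0)   # (c,) task-level summary
    return softmax(task_descriptor / temperature)  # (c,), sums to 1

def apply_channel_selection(feats: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Reweight channels; scaling by c keeps overall magnitudes comparable."""
    c = feats.shape[-1]
    return feats * (c * weights)
```

In an episodic setting, the weights would be recomputed per task so the same backbone emphasizes different channels for different class subsets.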
Knowledge-Based Systems, Vol. 338, Article 115485
Citations: 0
BDGKT: Bidirectional dynamic graph knowledge tracing
IF 7.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-04-08 Epub Date: 2026-02-10 DOI: 10.1016/j.knosys.2026.115532
Xinjia Ou , Tao Huang , Shengze Hu , Huali Yang , Zhuoran Xu , Junjie Hu , Jing Geng
Knowledge tracing (KT) aims to model the evolution of students’ knowledge states by analyzing their historical learning trajectories and predicting future performance. However, current KT methods primarily focus on unidirectional relationship modeling, overlooking the bidirectional dynamic interaction mechanisms between learners and questions. Student knowledge states shape question adaptability through group patterns (e.g., difficulty calibration), whereas dynamic transformation of question features provides progressive guidance signals for knowledge advancement across learning stages. In this study, we propose a novel bidirectional dynamic graph KT (BDGKT) method for modeling the information flow between students and questions while capturing knowledge state evolution and question characteristic transformation. Specifically, we first introduce a dynamic graph construction based on homogeneous student groups that uses a spatiotemporal constraint strategy to reduce computational costs while improving information propagation quality. Subsequently, we design a bidirectional message propagation mechanism to capture time-evolving bidirectional dynamic signals. To update question nodes (from students to questions), we introduce a state-aware attention mechanism that aggregates student nodes and responses, revealing group-level question commonalities. By contrast, to update student nodes (from questions to students), we propose an evolution mechanism that aggregates question nodes and responses based on timestamps, allowing us to track the evolution of student knowledge states. Extensive experiments on four real-world datasets validate the effectiveness and compatibility of our method. Furthermore, BDGKT improves interpretability by exploring question absolute information (group-agnostic) and relative information (group-dependent).
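The state-aware attention step (students → questions) amounts to attention-weighted aggregation of student states, with the correctness signal folded into the attention scores. A minimal sketch under those assumptions (`resp_weight` and the residual update are illustrative choices of ours, not the paper's exact formulation):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def update_question_node(question_emb: np.ndarray,
                         student_embs: np.ndarray,
                         responses: np.ndarray,
                         resp_weight: float = 1.0) -> np.ndarray:
    """State-aware update of one question node from its student neighbors.

    question_emb: (d,)   current question embedding (attention query)
    student_embs: (m, d) knowledge states of students who attempted it
    responses:    (m,)   1 = correct, 0 = incorrect
    """
    d = question_emb.shape[0]
    # Scores mix state similarity with the correctness signal.
    scores = (student_embs @ question_emb) / np.sqrt(d) + resp_weight * responses
    alpha = softmax(scores)            # attention over the student group
    aggregated = alpha @ student_embs  # (d,) group-level message
    return question_emb + aggregated   # residual update of the question node
```

The reverse direction (questions → students) would replace the group attention with a timestamp-ordered aggregation, per the evolution mechanism described above.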
Knowledge-Based Systems, Vol. 338, Article 115532
Citations: 0
Emo-STCapsnet: A spatio-temporal modeling approach with enhanced CapsNet for speech emotion recognition
IF 7.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-04-08 Epub Date: 2026-02-03 DOI: 10.1016/j.knosys.2026.115447
Yonghong Fan , Heming Huang , Huiyun Zhang , Ziqi Zhou
Speech emotion recognition (SER) aims to enable computers to accurately identify emotional states embedded in speech signals, a critical area in human-computer interaction. Effective spatio-temporal feature extraction, which captures consistent emotional patterns while minimizing inter-emotion variability, is critical for SER. However, existing approaches often fall short in learning comprehensive spatio-temporal features. To address this, Emo-STCapsNet, a spatio-temporal modeling approach with enhanced capsule network, is proposed. It integrates four components: a temporal dynamic activation block to capture multi-scale temporal variations, a two-stream attentive fusion for past and future context integration to establish global emotional representations, a convolutional block for high-level feature abstraction from the bidirectional temporal representations, and an attention-enhanced CapsNet that leverages vectorized entity representations and dynamic routing mechanisms to more effectively capture hierarchical spatial relationships among emotional features compared to conventional methods like CNNs. Experimental results on the benchmark SER datasets IEMOCAP, EMODB, and CASIA demonstrate the superior performance of Emo-STCapsNet, achieving accuracies of 71.86%, 93.46%, and 87.92%, respectively. Comparative results highlight the superiority of Emo-STCapsNet approach over other methods. Extensive ablation studies further validate the effectiveness of the architecture of Emo-STCapsNet and underscore the necessity of comprehensive spatio-temporal feature learning in SER.
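The CapsNet component rests on routing-by-agreement between capsule layers. Below is a minimal NumPy sketch of standard dynamic routing (the squash non-linearity plus iterative coupling updates); the attention-enhanced variant in Emo-STCapsNet builds on this, so treat it only as the baseline mechanism:

```python
import numpy as np

def squash(v: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Capsule non-linearity: keeps direction, maps length into [0, 1)."""
    sq = (v ** 2).sum(axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def dynamic_routing(u_hat: np.ndarray, n_iters: int = 3) -> np.ndarray:
    """Routing-by-agreement between lower and upper capsules.

    u_hat: (n_lower, n_upper, d) prediction vectors from each lower capsule.
    Returns: (n_upper, d) upper-capsule output vectors.
    """
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))  # routing logits
    for _ in range(n_iters):
        # Coupling coefficients: each lower capsule distributes over uppers.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[:, :, None] * u_hat).sum(axis=0)  # weighted votes, (n_upper, d)
        v = squash(s)
        # Raise logits where a prediction agrees with the resulting output.
        b = b + (u_hat * v[None]).sum(axis=-1)
    return v
```

The output vector's length plays the role of class probability, which is why the squash function bounds it below 1.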
Knowledge-Based Systems, Vol. 338, Article 115447
Citations: 0