
Knowledge-Based Systems: Latest Articles

Decoupled spatial-temporal predicting model for weakly supervised action localization
IF 7.6 · Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-07 · DOI: 10.1016/j.knosys.2025.115241
Guiqin Wang, Peng Zhao, Xiang Wang, Xin An, Qian Zhang, Shusen Yang, Qinghai Guo
Weakly supervised action localization (WSAL) aims to identify and temporally localize action instances in untrimmed videos using only video-level labels. Existing methods (e.g., multi-instance learning, pseudo-label learning) primarily focus on leveraging coarse-grained category labels to infer fine-grained action intervals. With the rise of vision-language models, integrating the language modality has become a promising direction for WSAL, with a focus on leveraging class-based supervision to improve the localization of fine-grained action instances. However, current methods largely concentrate on mapping class labels to visual representations rather than capturing the semantic relationships within the action sequences themselves. To address this limitation, we propose a novel framework that aligns vision-semantic and category-semantic knowledge for fine-grained action localization. Specifically, we introduce a residual hierarchical spatial-temporal predictive model that learns the temporal variations of feature semantics; combined with a class-based supervision module, it enables more accurate action localization. Extensive experiments and ablation studies demonstrate that our method consistently surpasses previous state-of-the-art approaches.
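The multi-instance learning baseline referred to in this abstract can be illustrated with a minimal sketch: snippet-level class scores (a class activation sequence, CAS) are pooled into video-level predictions by averaging the top-k snippets per class, and intervals are read off by thresholding one class's scores. This is a generic WSAL baseline, not the authors' residual hierarchical predictive model; the function names, k, and the threshold are hypothetical choices.

```python
import math


def topk_mil_video_scores(cas, k):
    """Video-level class probabilities from a CAS via top-k mean pooling + softmax.

    cas: list of T snippets, each a list of C class logits (illustrative input format).
    """
    num_classes = len(cas[0])
    pooled = []
    for c in range(num_classes):
        # MIL assumption: the few highest-scoring snippets decide the video label
        column = sorted((snippet[c] for snippet in cas), reverse=True)
        pooled.append(sum(column[:k]) / k)
    m = max(pooled)  # subtract max for a numerically stable softmax
    exps = [math.exp(p - m) for p in pooled]
    z = sum(exps)
    return [e / z for e in exps]


def localize(cas, class_idx, threshold):
    """Return inclusive (start, end) snippet intervals where the class score exceeds threshold."""
    intervals, start = [], None
    for t, snippet in enumerate(cas):
        active = snippet[class_idx] > threshold
        if active and start is None:
            start = t
        elif not active and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:  # close an interval that runs to the last snippet
        intervals.append((start, len(cas) - 1))
    return intervals
```

Top-k mean pooling lets a handful of confident snippets drive the video label without requiring every snippet to contain the action.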
Knowledge-Based Systems, Vol. 336, Article 115241
Citations: 0
Enhancing user cold-start recommendation with graph structures and semantic dependencies
IF 7.6 · Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-07 · DOI: 10.1016/j.knosys.2026.115280
Li Zou, Qi Chen, Zhiying Deng, Guohui Li
Graph-based Collaborative Filtering (GCF) has achieved remarkable progress in recommendation systems by modeling high-order user-item relationships. However, traditional GCF methods are “transductive”: they rely on predefined embeddings and therefore cannot handle new users without complete retraining. While recent inductive approaches incorporate basic graph signals to accommodate new users, they often struggle to generate personalized representations for users with extremely sparse interactions, because their aggregation mechanisms can be sensitive to noisy signals and capture complex relational semantics suboptimally. To address this, we propose the Graph Structure and Semantic Dependency (GSSD) framework, which enhances cold-start user representations by integrating multiple complementary signals. First, we employ a curriculum learning strategy to build robust initial embeddings: training begins with high-popularity users, which are assigned larger embedding norms, and progressively shifts to include sparse users, improving overall model stability. This curriculum-based refinement provides a robust foundation of user embeddings, enabling rich, personalized semantics to propagate across a user-user dependency graph. To mitigate popularity bias within this graph, edge weights are computed by subtracting the global mean item vector from each user’s profile before applying cosine similarity. Because neighborhood aggregation alone risks over-smoothing user representations and blurring their unique profiles, we employ a hybrid training objective that combines the Bayesian Personalized Ranking (BPR) loss with a layer-wise contrastive learning loss to explicitly preserve high-order structural information and enforce representation uniqueness. Additionally, an adaptive task-weighting strategy dynamically balances the loss components, improving optimization efficiency.
Extensive experiments demonstrate that GSSD consistently outperforms state-of-the-art baselines in both standard (transductive) and cold-start (inductive) settings. In inductive scenarios, it achieves an average improvement of over 8% in Recall and NDCG, effectively enriching sparse user representations. These results indicate that our approach not only alleviates the cold-start problem but can also be readily adopted in various recommendation scenarios.
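The popularity-debiased edge weighting described in this abstract can be sketched as follows, assuming (hypothetically) that a user's profile is the mean of the vectors of the items they interacted with; the abstract does not say how profiles are built. The standard BPR loss that the hybrid objective builds on is included for reference. All function names are illustrative, and the curriculum and contrastive components are not shown.

```python
import math
from itertools import combinations


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def user_user_edge_weights(item_vectors, user_items):
    """Mean-centered cosine edge weights for a user-user graph.

    Assumption: a user's profile is the mean vector of their interacted items.
    Subtracting the global mean item vector removes the direction shared by
    everyone (e.g., popular items), so similarity reflects distinctive tastes.
    """
    global_mean = mean_vector(list(item_vectors.values()))
    centered = {}
    for user, items in user_items.items():
        profile = mean_vector([item_vectors[i] for i in items])
        centered[user] = [p - g for p, g in zip(profile, global_mean)]
    return {
        (u, v): cosine(centered[u], centered[v])
        for u, v in combinations(sorted(centered), 2)
    }


def bpr_loss(pos_score, neg_score):
    """BPR loss for one (user, positive, negative) triple: -ln(sigmoid(pos - neg))."""
    x = pos_score - neg_score
    # equals softplus(-x) = log(1 + exp(-x)); branch to avoid overflow for large |x|
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

In a toy case where two items point along different axes, users who share no items get a raw profile cosine of 0 but a centered cosine of -1, which shows how centering changes the graph's edge weights.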
Knowledge-Based Systems, Vol. 336, Article 115280
Citations: 0
Incremental extraction of bespoke association rules
IF 7.6 · Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-07 · DOI: 10.1016/j.knosys.2025.115127
Eung-Hee Kim, Hong-Gee Kim, Suk-Hyung Hwang
Association rule mining is a fundamental technique for discovering co-occurrence patterns in transactional databases. However, conventional methods often struggle to capture complex user preferences, limiting their utility in settings that require expressive antecedents and context-aware consequents. We introduce the lode model, a hierarchical framework that incorporates user-defined preferences and explicitly represents relationships among association rules. The term “lode” evokes a rich vein of ore, reflecting our goal of incrementally uncovering informative patterns embedded in large-scale transactional data. Central to the framework is the excavation algorithm, which incrementally extracts informative consequents and leverages parent–child relationships to improve scalability and interpretability. By combining support and confidence with user-specific constraints, the algorithm discovers bespoke association rules tailored to diverse analytical needs. Experiments on multiple real-world datasets demonstrate the framework's computational efficiency and scalability, as well as its ability to reveal patterns that conventional methods frequently overlook. The excavation algorithm robustly handles datasets of varying size and sparsity, scales with data complexity, and yields actionable insights into user behavior. Overall, the lode model and excavation algorithm provide a principled and effective foundation for preference-driven association rule mining, with broad applicability to recommendation systems, behavioral modeling, and related domains.
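To make the support/confidence filtering concrete, here is a brute-force sketch that combines both thresholds with user-supplied constraints on the antecedent and consequent. It is not the excavation algorithm, which works incrementally over a rule hierarchy; `mine_rules`, `antecedent_ok`, and `consequent_ok` are hypothetical names, and the exhaustive enumeration only suits toy data.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sets) that contain every item in itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)


def mine_rules(transactions, min_support, min_confidence,
               antecedent_ok=lambda a: True, consequent_ok=lambda c: True,
               max_size=3):
    """Brute-force rule mining with user-supplied antecedent/consequent constraints.

    Returns (antecedent, consequent, support, confidence) tuples. All itemsets up
    to max_size are enumerated, so this only suits tiny illustrative inputs.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_size + 1):
        for itemset in combinations(items, size):
            sup = support(itemset, transactions)
            if sup == 0 or sup < min_support:
                continue
            for r in range(1, size):
                for antecedent in combinations(itemset, r):
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    # user-specific preferences act as a filter on the rule's shape
                    if not (antecedent_ok(antecedent) and consequent_ok(consequent)):
                        continue
                    # confidence = support(A ∪ C) / support(A)
                    conf = sup / support(antecedent, transactions)
                    if conf >= min_confidence:
                        rules.append((antecedent, consequent, sup, conf))
    return rules
```

For example, with four baskets and a constraint that the consequent must be `("milk",)`, only `bread → milk` (support 0.5, confidence 2/3) survives thresholds of 0.5 and 0.6.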
Knowledge-Based Systems, Vol. 336, Article 115127
Citations: 0
Category knowledge distillation for MNER under unified cross-modal prompting
IF 7.6 · Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-06 · DOI: 10.1016/j.knosys.2026.115269
Enping Li, Jitong Lei, Li Lu, Tianrui Li
Multimodal Named Entity Recognition (MNER) aims to identify and classify named entities by jointly leveraging textual and visual information. Despite significant progress, existing MNER methods mostly focus on how to align and fuse the textual and visual modalities effectively, paying limited attention to enhancing recognition with external knowledge. In view of this, we introduce Category Knowledge Distillation for MNER under Unified Cross-Modal Prompting (CKD-UCMP), a novel framework that uses prompts to elicit category knowledge from pre-trained models and then uses this knowledge to enhance MNER. Specifically, we design a unified cross-modal prompting mechanism that combines extended categories with random vectors to guide the CLIP model in extracting category knowledge from both the textual and image modalities. A teacher-student distillation strategy then transfers this knowledge to enhance MNER. Although a few studies have explored enhancing MNER with external knowledge, they typically concatenate such knowledge directly into the model’s input without the necessary noise filtering, thus introducing excessive irrelevant information. In contrast, our method dynamically refines category knowledge during training and distills it at the feature level into both the visual and textual representations, thereby minimizing the noise introduced. We extensively evaluate the proposed method on public datasets, and the results demonstrate the superiority of CKD-UCMP.
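The abstract does not state which feature-level distillation loss is used. As an assumption, the sketch below uses the mean cosine distance between student features (from the MNER encoder) and teacher features (CLIP-derived category knowledge) for each modality, added to the task loss with a hypothetical weight `alpha`. Function names are illustrative.

```python
import math


def cosine_distance(s, t, eps=1e-12):
    """1 - cosine similarity between two feature vectors (eps guards zero norms)."""
    dot = sum(a * b for a, b in zip(s, t))
    ns = math.sqrt(sum(a * a for a in s))
    nt = math.sqrt(sum(b * b for b in t))
    return 1.0 - dot / max(ns * nt, eps)


def feature_distill_loss(student_feats, teacher_feats):
    """Mean cosine distance over paired student and (frozen) teacher features."""
    pairs = list(zip(student_feats, teacher_feats))
    return sum(cosine_distance(s, t) for s, t in pairs) / len(pairs)


def total_loss(task_loss, text_pair, image_pair, alpha=0.5):
    """Task loss plus weighted distillation terms for the text and image branches.

    text_pair / image_pair: (student_feats, teacher_feats); alpha is a hypothetical weight.
    """
    return task_loss + alpha * (
        feature_distill_loss(*text_pair) + feature_distill_loss(*image_pair)
    )
```

Cosine distance ignores feature magnitude, so the student is pushed only toward the teacher's direction; an MSE term would be a stricter alternative.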
Knowledge-Based Systems, Vol. 336, Article 115269
Citations: 0
MAR-GCN: A meta-action refinement graph convolutional network for skeleton-based human action recognition
IF 7.6 · Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-06 · DOI: 10.1016/j.knosys.2026.115261
Qiannan Guo, Weiran Li, Mina Han, Zhenbo Li
Graph Convolutional Networks (GCNs) have been widely adopted in skeleton-based human action recognition, demonstrating remarkable performance. However, while existing GCN-based approaches effectively model the hierarchical relationships within the human skeleton across multiple scales, their performance remains constrained by the lack of semantic guidance. This limitation particularly affects the feature representation of actions that involve coordinated movement of the same body parts (e.g., clapping and rubbing two hands) or subtle temporal variations (e.g., reading and writing). To address these issues, we propose MAR-GCN, a novel Meta-Action Refinement model based on Graph Convolutional Networks, which constructs cross-part human meta-actions via semantic hyperedges and refines features along multiple dimensions to achieve a better action representation. Specifically, the model is mainly composed of a Spatial Semantic feature Aggregation Graph Convolution (SSA-GC) module and a data-driven Triple Attention Cascade (TAC) module. The SSA-GC module captures spatial semantic features of the constructed human meta-actions, effectively modeling long-range dependencies among joints in different body parts. The TAC module employs a data-driven strategy to explore and refine features across the spatial, temporal, and channel dimensions, thereby facilitating the learning of more discriminative features for individual action samples. Extensive experiments show that MAR-GCN achieves performance comparable to state-of-the-art methods, with accuracies of 96.5% on NTU-RGB+D, 90.3% on NTU-RGB+D 120, and 96.6% on NW-UCLA.
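One common way to propagate features over hyperedges is two-stage mean aggregation: joints to hyperedges, then hyperedges back to joints, i.e. Dv^-1 H De^-1 H^T X with unit hyperedge weights. The sketch below assumes this simplified row-normalized form (HGNN-style layers use a symmetric normalization and learnable weights) and uses hand-picked hyperedges; it does not reproduce how MAR-GCN builds its semantic hyperedges or its SSA-GC module.

```python
def hypergraph_propagate(features, hyperedges):
    """Two-stage mean aggregation over a hypergraph of skeleton joints.

    features: list of per-joint feature vectors (equal length).
    hyperedges: list of joint-index lists; one hyperedge may span several body parts.
    Computes Dv^-1 H De^-1 H^T X with unit hyperedge weights.
    """
    dim = len(features[0])
    # Stage 1: each hyperedge feature is the mean of its member joints (De^-1 H^T X)
    edge_feats = [
        [sum(features[j][d] for j in edge) / len(edge) for d in range(dim)]
        for edge in hyperedges
    ]
    out = []
    for j in range(len(features)):
        # Stage 2: each joint averages the hyperedges it belongs to (Dv^-1 H ...)
        incident = [edge_feats[e] for e, edge in enumerate(hyperedges) if j in edge]
        if not incident:
            # Dv^-1 is undefined for isolated joints; this sketch keeps their own feature
            out.append(list(features[j]))
        else:
            out.append([sum(f[d] for f in incident) / len(incident) for d in range(dim)])
    return out


# Hypothetical example: joint 0 = left hand, 1 = right hand, 2 = head.
# A "both hands" hyperedge links the hands directly, regardless of skeleton distance.
print(hypergraph_propagate([[1.0], [3.0], [10.0]], [[0, 1], [1, 2]]))
```

A hyperedge connecting both hands lets them exchange information in one step even though they are far apart along the skeleton's bone graph, which is the kind of long-range, cross-part dependency the abstract targets.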
Knowledge-Based Systems, Vol. 336, Article 115261
Citations: 0
A cross-domain retrieval method for a sketch-based 3D part model based on feature fusion of a double-layer hypergraph
IF 7.6 · Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-06 · DOI: 10.1016/j.knosys.2026.115294
Shenquan Huang, Panfeng Li, Zirui Chen, Leran Xia, Chengyong Yan, Luchuan Yu
Sketch-based 3D part model retrieval is an important approach to mechanical part retrieval. To address the information loss caused by the cross-modal conversion of a 3D part model into 2D views, the weak representational ability of view features, and the poor cross-domain matching between 3D models and 2D sketches, this study proposes a cross-domain retrieval method for sketch-based 3D part models based on feature fusion over a double-layer hypergraph. First, jointly considering complexity and diversity, a projection-sketch representation and screening method for 3D part models is proposed, which markedly reduces information redundancy and loss in the projection sketches. Next, a feature extraction method based on double-layer hypergraph feature fusion is developed to extract and fuse features from multiple projection sketches of a 3D model, achieving deep fusion of sketch features from different viewpoints and category-oriented semantic enhancement. Finally, a cross-domain matching method based on multi-loss metric learning is proposed: hand-drawn sketches and 3D part models are embedded into the same metric space, and an end-to-end training framework for 3D part model retrieval based on a joint loss function is built, enabling efficient matching between hand-drawn sketches and 3D models. Experimental results on the ESB dataset indicate that the proposed method improves considerably on existing methods across multiple evaluation metrics, confirming its feasibility and effectiveness.
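The shared-metric-space matching step can be illustrated with a joint loss made of a triplet margin term (sketch anchor, same-class 3D model as positive, different-class model as negative) plus a softmax cross-entropy classification term, followed by nearest-neighbour retrieval. The specific losses, weights, and names are assumptions; the abstract only says that a multi-loss joint objective is used.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, via a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def joint_loss(anchor, positive, negative, logits, label,
               margin=1.0, w_triplet=1.0, w_cls=1.0):
    """Hypothetical weighted sum of metric (triplet) and classification terms."""
    return (w_triplet * triplet_loss(anchor, positive, negative, margin)
            + w_cls * cross_entropy(logits, label))


def retrieve(sketch_emb, model_embs, top_k=3):
    """Rank 3D model embeddings (dict name -> vector) by distance to a sketch embedding."""
    ranked = sorted(model_embs, key=lambda name: euclidean(sketch_emb, model_embs[name]))
    return ranked[:top_k]
```

The triplet term shapes relative distances across the two domains, while the classification term encourages category-level separation; combining them is one common reading of "multi-loss" metric learning.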
Knowledge-Based Systems, Vol. 336, Article 115294
Citations: 0
PDN-MVSNet: A texture-prior-informed multi-view stereo network for weak-texture reconstruction
IF 7.6 · Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-05 · DOI: 10.1016/j.knosys.2025.115229
Jie Han, Ning Yang, Jun Fang, Lanbo Zhao, Wei Wei, Lin Hu
Multi-view stereo (MVS) in realistic scenes remains challenged by two intertwined factors: low-contrast or reflective regions, where feature pyramids fail to extract sufficiently discriminative cues, and uncontrolled error propagation across coarse-to-fine stages in the absence of explicit geometric regularization. We present PDN-MVSNet, a framework that couples texture priors with geometry so that reconstruction can exploit whichever signal is informative at each location. First, a Texture-Prior Multi-scale Feature Extractor (TP-MFE) enriches structural information by fusing learnable features with hand-crafted priors, improving discrimination when cues are faint. Second, a Dual Normal-Depth Regularizer (DNDR) imposes a bidirectional consistency loop between depth and surface normals, converting depth to normals and back to refine both, which suppresses scale-to-scale drift and produces edge-aligned normals. Third, a Normal-Similarity Adaptive Cost Aggregation (NS-ACA) module reweights neighborhood support in the cost volume according to normal agreement, yielding robustness to illumination changes, occlusion edges, and low contrast. Extensive evaluations on DTU, BlendedMVS, and Tanks and Temples demonstrate competitive average performance: PDN-MVSNet attains an overall score of 0.303 mm on DTU and an average F-score of 64.64 on Tanks and Temples, leading on 3 of the 8 scenes.
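The depth-to-normal conversion used in the DNDR consistency loop can be sketched with a pinhole camera: back-project each pixel to 3D, then take the cross product of the finite differences to its right and lower neighbours. The intrinsics, the forward-difference scheme, and the orientation convention (camera looking along +Z) are assumptions for illustration; the reverse normal-to-depth step is not shown.

```python
import math


def backproject(depth, fx, fy, cx, cy):
    """Map an H x W depth grid to an H x W grid of camera-space points (X, Y, Z)."""
    return [
        [((u - cx) * z / fx, (v - cy) * z / fy, z) for u, z in enumerate(row)]
        for v, row in enumerate(depth)
    ]


def depth_to_normals(depth, fx, fy, cx, cy):
    """Per-pixel unit normals from a depth map via forward differences.

    Pixels in the last row/column (no right or lower neighbour) and degenerate
    pixels are left as zero vectors in this sketch.
    """
    pts = backproject(depth, fx, fy, cx, cy)
    h, w = len(depth), len(depth[0])
    normals = [[(0.0, 0.0, 0.0)] * w for _ in range(h)]
    for v in range(h - 1):
        for u in range(w - 1):
            p = pts[v][u]
            du = [a - b for a, b in zip(pts[v][u + 1], p)]  # tangent along image x
            dv = [a - b for a, b in zip(pts[v + 1][u], p)]  # tangent along image y
            n = (du[1] * dv[2] - du[2] * dv[1],
                 du[2] * dv[0] - du[0] * dv[2],
                 du[0] * dv[1] - du[1] * dv[0])
            norm = math.sqrt(sum(c * c for c in n))
            if norm == 0:
                continue
            n = tuple(c / norm for c in n)
            if n[2] > 0:  # orient the normal toward the camera (which looks along +Z)
                n = tuple(-c for c in n)
            normals[v][u] = n
    return normals
```

For a fronto-parallel plane at constant depth, every pixel that has both a right and a lower neighbour gets the normal (0, 0, -1). A learned pipeline would typically use a smoother, differentiable estimator, but the geometry is the same.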
Knowledge-Based Systems, vol. 335, Article 115229
Citations: 0
Recurrent neural networks with attention mechanisms and dimensionality reduction for accurate data-driven modelling of antennas
IF 7.6 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-05 · DOI: 10.1016/j.knosys.2026.115274
Kaustab C. Sahu, Slawomir Koziel, Anna Pietrenko-Dabrowska
Electromagnetic (EM) simulation is an essential tool in the design of modern antennas. It captures complex phenomena such as mutual coupling, dielectric and radiation losses, and environmental effects. However, this accuracy comes at a high computational cost. Behavioral modelling has been widely adopted to reduce these costs and to accelerate EM-driven design procedures. Yet developing design-ready models that span broad ranges of antenna geometries, materials, operating conditions, and frequencies remains challenging, due to factors such as the curse of dimensionality. This study presents a data-driven modelling strategy built on long short-term memory (LSTM)-based recurrent neural networks (RNNs), enhanced with gated recurrent units (GRUs) and an attention mechanism. The approach exploits problem-specific knowledge by treating frequency as a sequential parameter, which lets the model capture dependencies in antenna responses more effectively. The LSTM and GRU layers model long-term relationships. The attention layer processes the outputs of multiple GRU layers and selectively emphasizes the most relevant frequency components (such as those corresponding to antenna resonances) rather than treating all inputs equally. This dynamic weighting allows the model to focus on key frequency bands, improving predictive accuracy and making fuller use of the available training data. Explicit dimensionality reduction based on global sensitivity analysis further streamlines the modelling process. The main original contributions of this work are threefold:
- sequential treatment of frequency characteristics, unlike existing ANN approaches, including RNNs;
- adaptive weighting of frequency components to improve modelling of critical sub-ranges (e.g., resonances);
- sensitivity-driven domain confinement.

Together, these innovations make the proposed modelling approach more accurate than state-of-the-art benchmark techniques. This is demonstrated through extensive comparative experiments against various regression and deep learning models.
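The abstract does not give the attention formulation. The sketch below is a hedged illustration only: it applies standard additive-style attention pooling across the frequency axis. Each frequency point's recurrent output receives a relevance score, and a softmax turns the scores into weights. The weighted sum then summarizes the frequency sweep. The names `frequency_attention`, `w_proj`, and `v_score` are hypothetical. The parameters are random stand-ins rather than trained values, and no LSTM or GRU layers are modelled.

```python
import numpy as np


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    shifted = np.exp(scores - np.max(scores))
    return shifted / shifted.sum()


def frequency_attention(hidden, w_proj, v_score):
    """Additive-style attention pooling over per-frequency recurrent outputs.

    hidden:  (F, d) array, one hidden state per frequency point
    w_proj:  (d, a) projection matrix (would be learned in a real model)
    v_score: (a,) scoring vector (would be learned in a real model)
    Returns the attention weights (F,) and the weighted summary vector (d,).
    """
    scores = np.tanh(hidden @ w_proj) @ v_score  # one relevance score per frequency
    weights = softmax(scores)                    # non-negative, sums to 1
    context = weights @ hidden                   # weighted sum of hidden states
    return weights, context


# Usage: 101 frequency points, 16-dim hidden states, random stand-in parameters.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(101, 16))
weights, context = frequency_attention(hidden, rng.normal(size=(16, 8)), rng.normal(size=8))
print(context.shape)  # (16,)
```

With all-zero scoring parameters every frequency gets the same weight, 1/F. Training the parameters is what lets resonant regions receive larger weights.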
Knowledge-Based Systems, vol. 335, Article 115274
Citations: 0
Automatic modulation classification using fractional S-transform and semi-supervised learning with confidence-guided consistency regularization
IF 7.6 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-05 · DOI: 10.1016/j.knosys.2026.115260
Anwr Hasan Yahya Hasan Abohadi, Ningbo Zhu, Amal Abdullah Mohammed Mohammed Yayah, Weihua Guo, Ali M. Alsaih
Automatic modulation classification (AMC) is a key enabling component in cognitive radio, supporting automatic identification of the modulation type of a received signal. Recently, deep learning approaches have been extensively adopted to improve AMC performance. Given the intrinsic scarcity of labeled modulated signals, semi-supervised learning and self-supervised representation learning methods for AMC have progressed notably, aiming to exploit additional unlabeled samples to improve predictive-model generalization. In this paper, a novel semi-supervised AMC method is proposed, in which self-supervised representation learning is leveraged to enhance both feature representation and classification performance. Specifically, self-supervised consistency regularization is enforced between the model predictions obtained from two strongly augmented views of a signal. Each prediction is weighted by a learnable confidence measure, estimated using the prediction from a weakly augmented view of the same signal as an anchor, thereby guiding the consistency loss toward the more reliable branch. Three semantic-preserving augmentation strategies are employed to generate two strongly augmented views and one weakly augmented view; the resulting views are transformed into the time-frequency domain using the fractional S transform (FrST). Extensive experiments on the RML2016.10A and RML2016.04C datasets under various labeled-sample regimes demonstrate that the proposed method significantly outperforms competing approaches, particularly in scenarios of extreme labeled-data scarcity.
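The abstract says the confidence weights are learnable. The sketch below swaps them for a fixed heuristic, purely to show how such weights can steer a consistency loss between two strong-view predictions. Each strong view's agreement with the weak-view anchor is measured by KL divergence. These divergences are turned into weights with a softmax-style normalization at a hypothetical temperature `tau`. It works on plain probability vectors with NumPy and has no gradients or training loop. `confidence_weighted_consistency` is a hypothetical name, not the authors' code.

```python
import numpy as np


def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


def confidence_weighted_consistency(p_weak, p_strong1, p_strong2, tau=1.0):
    """Consistency loss between two strong-view predictions, weighted by how
    well each agrees with the weak-view anchor (heuristic stand-in for a
    learnable confidence measure).

    Returns (loss, (c1, c2)) with c1 + c2 == 1; the larger weight marks the
    branch treated as more reliable.
    """
    # Lower divergence from the weak anchor -> higher confidence.
    d = np.array([kl(p_weak, p_strong1), kl(p_weak, p_strong2)])
    c = np.exp(-d / tau)
    c1, c2 = c / c.sum()
    # KL(target || other) penalizes `other` for deviating from `target`; weighting
    # each term by the target's confidence pulls the less reliable branch harder
    # toward the more reliable one.
    loss = c1 * kl(p_strong1, p_strong2) + c2 * kl(p_strong2, p_strong1)
    return loss, (c1, c2)
```

In an actual training framework the target side of each KL term would typically be detached (stop-gradient). The learned confidence would replace the fixed `exp(-d / tau)` heuristic.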
Knowledge-Based Systems, vol. 336, Article 115260
Citations: 0
From benchmarks to interpretability: A holistic survey of deep learning for time series classification
IF 7.6 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-05 · DOI: 10.1016/j.knosys.2025.115251
Jahoon Jeong, Youngje Oh, Donghwan Kim, Hyunsoo Yoon
Time Series Classification (TSC) is a critical task in machine learning, with wide-ranging applications in healthcare, finance, human activity recognition, and industrial monitoring. Traditional methods based on handcrafted features often struggle with scalability, robustness, and the modeling of complex temporal dependencies. This survey presents a comprehensive and task-centric review of recent deep learning-based approaches for TSC. We systematically cover the full pipeline, from data characteristics and preprocessing strategies to architectural design and learning paradigms. The review encompasses core model families such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformers, Graph Neural Networks (GNNs), and hybrid variants. We also explore emerging paradigms including self-supervised learning, multiple-instance learning, and foundation models tailored to time series data. Unlike prior reviews, our work systematically integrates model architectures and learning strategies into a unified taxonomy, highlighting their structural relationships while emphasizing interpretability, scalability, and domain applicability. We further outline open challenges and future directions toward building generalizable, efficient, and explainable TSC systems. This work aims to serve as a foundational reference for researchers and practitioners developing next-generation deep learning models for time series classification.
Knowledge-Based Systems, vol. 336, Article 115251
Citations: 0