Knowledge-Driven Compositional Action Recognition

Impact factor: 7.6; Zone 1 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence); Journal: Pattern Recognition; Publication date: 2025-07-01; Epub date: 2025-02-14; DOI: 10.1016/j.patcog.2025.111452

Yang Liu, Fang Liu, Licheng Jiao, Qianyue Bao, Shuo Li, Lingling Li, Xu Liu

Pattern Recognition, volume 163, Article 111452. https://www.sciencedirect.com/science/article/pii/S0031320325001128

Abstract

Human actions often involve interaction with objects, so action labels in action recognition can be defined as compositions of verbs and nouns. It is almost infeasible to collect and annotate enough training data for every possible composition in the real world. The main challenge in compositional action recognition is therefore to enable the model to understand “action-objects” compositions that were not seen during training. We propose a Knowledge-Driven Composition Modulation Model (KCMM), which constructs unseen “action-objects” compositions to improve the generalization of action recognition. We first design a Grammar Knowledge-Driven Composition (GKC) module, which extracts the verb and noun labels and their corresponding feature representations from compositional actions, and then modulates them under the guidance of grammatical rules to construct new “action-objects” actions. To verify the rationality of these new actions, we then design a Common Knowledge-Driven Verification (CKV) module, which extracts motion commonsense from ConceptNet and infuses it into the compositional labels to make the verification more comprehensive. Note that GKC does not construct new videos; it composes verbs and nouns directly in the label and feature spaces to obtain new compositional action label-feature pairs. We conduct extensive experiments on the Something-Else and NEU-I datasets, and our method significantly outperforms current state-of-the-art methods in both compositional and few-shot settings. The source code is available at https://github.com/XDLiuyyy/KCMM.
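The idea of composing verbs and nouns in label and feature space, then filtering the results for plausibility, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `compose_unseen`, the use of concatenation as the composition operator, and the hand-written allow-list standing in for ConceptNet-based verification are all assumptions made for illustration. The abstract does not specify how KCMM modulates features or scores plausibility.

```python
from itertools import product


def compose_unseen(seen, verb_feats, noun_feats, is_plausible):
    """Build label-feature pairs for verb-noun compositions absent from `seen`.

    seen: set of (verb, noun) pairs observed during training.
    verb_feats / noun_feats: dicts mapping a label to a feature vector (list of floats).
    is_plausible: hypothetical callable standing in for a commonsense check.
    """
    new_pairs = {}
    for verb, noun in product(sorted(verb_feats), sorted(noun_feats)):
        if (verb, noun) in seen or not is_plausible(verb, noun):
            continue
        # Assumption: concatenation is used as the composition operator here.
        new_pairs[(verb, noun)] = verb_feats[verb] + noun_feats[noun]
    return new_pairs


seen = {("push", "cup"), ("open", "door")}
verb_feats = {"push": [1.0, 0.0], "open": [0.0, 1.0]}
noun_feats = {"cup": [0.5], "door": [0.2]}

# Toy stand-in for knowledge-based verification: a hand-written allow-list.
allowed = {("push", "door"), ("push", "cup"), ("open", "door")}

result = compose_unseen(seen, verb_feats, noun_feats, lambda v, n: (v, n) in allowed)
print(result)
```

In this toy setup, "push door" is the only composition that is both unseen and accepted by the plausibility check, while "open cup" is rejected. No new video is involved; the new pair is built purely from existing labels and features.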

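Source journal: Pattern Recognition (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 16.20%

Articles published: 683

Review time: 5.6 months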

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.