用于视频动作识别的多路径注意力和自适应门控网络

IF 2.6 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Neural Processing Letters Pub Date : 2024-03-27 DOI:10.1007/s11063-024-11591-3
Haiping Zhang, Zepeng Hu, Dongjin Yu, Liming Guan, Xu Liu, Conghao Ma
{"title":"用于视频动作识别的多路径注意力和自适应门控网络","authors":"Haiping Zhang, Zepeng Hu, Dongjin Yu, Liming Guan, Xu Liu, Conghao Ma","doi":"10.1007/s11063-024-11591-3","DOIUrl":null,"url":null,"abstract":"<p>3D CNN networks can model existing large action recognition datasets well in temporal modeling and have made extremely great progress in the field of RGB-based video action recognition. However, the previous 3D CNN models also face many troubles. For video feature extraction convolutional kernels are often designed and fixed in each layer of the network, which may not be suitable for the diversity of data in action recognition tasks. In this paper, a new model called <i>Multipath Attention and Adaptive Gating Network</i> (MAAGN) is proposed. The core idea of MAAGN is to use the <i>spatial difference module</i> (SDM) and the <i>multi-angle temporal attention module</i> (MTAM) in parallel at each layer of the multipath network to obtain spatial and temporal features, respectively, and then dynamically fuses the spatial-temporal features by the <i>adaptive gating module</i> (AGM). SDM explores the action video spatial domain using difference operators based on the attention mechanism, while MTAM tends to explore the action video temporal domain in terms of both global timing and local timing. AGM is built on an adaptive gate unit, the value of which is determined by the input of each layer, and it is unique in each layer, dynamically fusing the spatial and temporal features in the paths of each layer in the multipath network. We construct the temporal network MAAGN, which has a competitive or better performance than state-of-the-art methods in video action recognition, and we provide exhaustive experiments on several large datasets to demonstrate the effectiveness of our approach.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"27 1","pages":""},"PeriodicalIF":2.6000,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multipath Attention and Adaptive Gating Network for Video Action Recognition\",\"authors\":\"Haiping Zhang, Zepeng Hu, Dongjin Yu, Liming Guan, Xu Liu, Conghao Ma\",\"doi\":\"10.1007/s11063-024-11591-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>3D CNN networks can model existing large action recognition datasets well in temporal modeling and have made extremely great progress in the field of RGB-based video action recognition. However, the previous 3D CNN models also face many troubles. For video feature extraction convolutional kernels are often designed and fixed in each layer of the network, which may not be suitable for the diversity of data in action recognition tasks. In this paper, a new model called <i>Multipath Attention and Adaptive Gating Network</i> (MAAGN) is proposed. The core idea of MAAGN is to use the <i>spatial difference module</i> (SDM) and the <i>multi-angle temporal attention module</i> (MTAM) in parallel at each layer of the multipath network to obtain spatial and temporal features, respectively, and then dynamically fuses the spatial-temporal features by the <i>adaptive gating module</i> (AGM). SDM explores the action video spatial domain using difference operators based on the attention mechanism, while MTAM tends to explore the action video temporal domain in terms of both global timing and local timing. AGM is built on an adaptive gate unit, the value of which is determined by the input of each layer, and it is unique in each layer, dynamically fusing the spatial and temporal features in the paths of each layer in the multipath network. We construct the temporal network MAAGN, which has a competitive or better performance than state-of-the-art methods in video action recognition, and we provide exhaustive experiments on several large datasets to demonstrate the effectiveness of our approach.</p>\",\"PeriodicalId\":51144,\"journal\":{\"name\":\"Neural Processing Letters\",\"volume\":\"27 1\",\"pages\":\"\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2024-03-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Processing Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11063-024-11591-3\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Processing Letters","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11063-024-11591-3","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

三维 CNN 网络可以对现有的大型动作识别数据集进行良好的时态建模,并在基于 RGB 的视频动作识别领域取得了巨大进步。然而,以往的三维 CNN 模型也面临着许多问题。对于视频特征提取,卷积核通常被设计并固定在网络的每一层,这可能不适合动作识别任务中数据的多样性。本文提出了一种名为 "多路注意和自适应门控网络(MAAGN)"的新模型。MAAGN 的核心思想是在多路径网络的每一层并行使用空间差异模块(SDM)和多角度时间注意力模块(MTAM),分别获取空间和时间特征,然后通过自适应门控模块(AGM)动态融合空间-时间特征。SDM 利用基于注意力机制的差分算子探索动作视频的空间域,而 MTAM 则倾向于从全局定时和局部定时两个方面探索动作视频的时间域。AGM 建立在自适应门单元之上,其值由各层输入决定,在各层中都是唯一的,可动态融合多路径网络中各层路径的空间和时间特征。我们构建的时态网络 MAAGN 在视频动作识别方面的性能可与最先进的方法相媲美,甚至更好,我们还在几个大型数据集上进行了详尽的实验,以证明我们的方法的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Multipath Attention and Adaptive Gating Network for Video Action Recognition

3D CNN networks can model existing large action recognition datasets well in temporal modeling and have made extremely great progress in the field of RGB-based video action recognition. However, the previous 3D CNN models also face many troubles. For video feature extraction convolutional kernels are often designed and fixed in each layer of the network, which may not be suitable for the diversity of data in action recognition tasks. In this paper, a new model called Multipath Attention and Adaptive Gating Network (MAAGN) is proposed. The core idea of MAAGN is to use the spatial difference module (SDM) and the multi-angle temporal attention module (MTAM) in parallel at each layer of the multipath network to obtain spatial and temporal features, respectively, and then dynamically fuses the spatial-temporal features by the adaptive gating module (AGM). SDM explores the action video spatial domain using difference operators based on the attention mechanism, while MTAM tends to explore the action video temporal domain in terms of both global timing and local timing. AGM is built on an adaptive gate unit, the value of which is determined by the input of each layer, and it is unique in each layer, dynamically fusing the spatial and temporal features in the paths of each layer in the multipath network. We construct the temporal network MAAGN, which has a competitive or better performance than state-of-the-art methods in video action recognition, and we provide exhaustive experiments on several large datasets to demonstrate the effectiveness of our approach.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Neural Processing Letters
Neural Processing Letters 工程技术-计算机:人工智能
CiteScore
4.90
自引率
12.90%
发文量
392
审稿时长
2.8 months
期刊介绍: Neural Processing Letters is an international journal publishing research results and innovative ideas on all aspects of artificial neural networks. Coverage includes theoretical developments, biological models, new formal modes, learning, applications, software and hardware developments, and prospective researches. The journal promotes fast exchange of information in the community of neural network researchers and users. The resurgence of interest in the field of artificial neural networks since the beginning of the 1980s is coupled to tremendous research activity in specialized or multidisciplinary groups. Research, however, is not possible without good communication between people and the exchange of information, especially in a field covering such different areas; fast communication is also a key aspect, and this is the reason for Neural Processing Letters
期刊最新文献
Label-Only Membership Inference Attack Based on Model Explanation A Robot Ground Medium Classification Algorithm Based on Feature Fusion and Adaptive Spatio-Temporal Cascade Networks A Deep Learning-Based Hybrid CNN-LSTM Model for Location-Aware Web Service Recommendation A Clustering Pruning Method Based on Multidimensional Channel Information A Neural Network-Based Poisson Solver for Fluid Simulation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1