Agent-Environment Network for Temporal Action Proposal Generation

Viet-Khoa Vo-Ho, Ngan T. H. Le, Kashu Yamazaki, A. Sugimoto, Minh-Triet Tran
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publication date: 2021-06-06
DOI: https://doi.org/10.1109/ICASSP39728.2021.9415101
Citation count: 8

Abstract

Temporal action proposal generation is an essential and challenging task that aims to localize temporal intervals containing human actions in untrimmed videos. Most existing approaches cannot follow the human cognitive process of understanding video context because they lack an attention mechanism that expresses the concept of an action, the agent who performs it, or the interaction between the agent and the environment. Building on the definition of an action as a human, known as an agent, interacting with the environment and performing an action that affects it, we propose a contextual Agent-Environment Network (AEN). The proposed contextual AEN comprises (i) an agent pathway, operating at a local level to indicate which humans/agents are acting, and (ii) an environment pathway, operating at a global level to indicate how the agents interact with the environment. Comprehensive evaluations on the 20-action THUMOS-14 and 200-action ActivityNet-1.3 datasets with different backbone networks, i.e., C3D and SlowFast, show that our method consistently outperforms state-of-the-art methods regardless of the backbone network employed.
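The abstract does not say how proposals are scored. As a minimal sketch, assuming the temporal intersection-over-union (tIoU) measure commonly used for this task, the following shows how a proposed interval can be compared against an annotated action interval. The function names, the (start, end) tuple layout in seconds, and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
def temporal_iou(proposal, ground_truth):
    """Return the temporal IoU of two (start, end) intervals given in seconds."""
    p_start, p_end = proposal
    g_start, g_end = ground_truth
    # Overlap is zero when the intervals do not intersect.
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_tiou(proposals, ground_truths, threshold=0.5):
    """Fraction of ground-truth intervals matched by at least one proposal.

    A ground-truth interval counts as matched when some proposal reaches
    the given tIoU threshold (hypothetical default: 0.5).
    """
    if not ground_truths:
        return 0.0
    matched = sum(
        1
        for gt in ground_truths
        if any(temporal_iou(p, gt) >= threshold for p in proposals)
    )
    return matched / len(ground_truths)


if __name__ == "__main__":
    proposals = [(0.0, 10.0), (20.0, 30.0)]
    ground_truths = [(1.0, 9.0), (50.0, 60.0)]
    print(temporal_iou(proposals[0], ground_truths[0]))
    print(recall_at_tiou(proposals, ground_truths))
```

Under this measure a proposal is useful only if it overlaps an annotated action tightly, which is why localizing interval boundaries matters as much as detecting that an action occurs.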