Early Prediction of Human Action by Deep Reinforcement Learning

Hareesh Devarakonda, Snehasis Mukherjee
{"title":"Early Prediction of Human Action by Deep Reinforcement Learning","authors":"Hareesh Devarakonda, Snehasis Mukherjee","doi":"10.1109/NCC52529.2021.9530126","DOIUrl":null,"url":null,"abstract":"Early action prediction in video is a challenging task where the action of a human performer is expected to be predicted using only the initial few frames. We propose a novel technique for action prediction based on Deep Reinforcement learning, employing a Deep Q-Network (DQN) and the ResNext as the basic CNN architecture. The proposed DQN can predict the actions in videos from features extracted from the first few frames of the video, and the basic CNN model is adjusted by tuning the hyperparameters of the CNN network. The ResNext model is adjusted based on the reward provided by the DQN, and the hyperparameters are updated to predict actions. The agent's stopping criteria is higher or equal to the validation accuracy value. The DQN is rewarded based on the sequential input frames and the transition of action states (i.e., prediction of action class for an incremental 10 percent of the video). The visual features extracted from the first 10 percent of the video is forwarded to the next 10 percent of the video for each action state. 
The proposed method is tested on the UCF101 dataset and has outperformed the state-of-the-art in action prediction.","PeriodicalId":414087,"journal":{"name":"2021 National Conference on Communications (NCC)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 National Conference on Communications (NCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NCC52529.2021.9530126","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Early action prediction in video is a challenging task in which the action of a human performer must be predicted using only the initial few frames. We propose a novel technique for action prediction based on deep reinforcement learning, employing a Deep Q-Network (DQN) with ResNext as the basic CNN architecture. The proposed DQN predicts the actions in videos from features extracted from the first few frames, and the basic CNN model is adjusted by tuning its hyperparameters. The ResNext model is adjusted based on the reward provided by the DQN, and the hyperparameters are updated to predict actions. The agent's stopping criterion is an accuracy greater than or equal to the validation accuracy value. The DQN is rewarded based on the sequential input frames and the transitions between action states (i.e., prediction of the action class for each incremental 10 percent of the video). For each action state, the visual features extracted from the preceding 10 percent of the video are forwarded to the next 10 percent. The proposed method is tested on the UCF101 dataset and outperforms the state-of-the-art in action prediction.
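The incremental observation scheme described in the abstract can be sketched as follows. This is a minimal illustrative simulation, not the authors' implementation: `extract_features` is a hypothetical stand-in for the ResNext backbone, the classifier is a fixed linear map rather than a trained DQN, and the data are synthetic. It only demonstrates the loop structure: the video is consumed in 10 percent segments ("action states"), features from each segment are carried forward to the next, and a reward accrues for every state whose partial-video prediction matches the ground-truth class.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 5
FEAT_DIM = 8
NUM_SEGMENTS = 10  # ten action states = 10 percent increments of the video


def extract_features(segment):
    """Stand-in for the ResNext feature extractor (hypothetical stub):
    average-pool the frames of one segment into a single feature vector."""
    return segment.mean(axis=0)


def predict(weights, features):
    """Illustrative linear classifier over the accumulated features."""
    return int(np.argmax(weights @ features))


def run_episode(weights, video, true_label):
    """Walk the action states in order, forwarding accumulated features
    to each next state; reward +1 per state with a correct prediction."""
    carried = np.zeros(FEAT_DIM)
    total_reward = 0
    for state in range(NUM_SEGMENTS):
        carried = carried + extract_features(video[state])  # forward features
        if predict(weights, carried) == true_label:
            total_reward += 1  # reward on the action-state transition
    return total_reward


# Toy data: one synthetic "video" per class, with 4 frames per segment
# and one class-discriminative feature dimension boosted per video.
videos, labels = [], []
for c in range(NUM_CLASSES):
    video = rng.normal(size=(NUM_SEGMENTS, 4, FEAT_DIM))
    video[:, :, c] += 3.0
    videos.append(video)
    labels.append(c)

weights = np.eye(NUM_CLASSES, FEAT_DIM)  # fixed classifier for the sketch
rewards = [run_episode(w := weights, v, y) for v, y in zip(videos, labels)]
print(rewards)
```

In the actual method, the reward signal would instead drive the DQN's hyperparameter updates for the CNN until the stopping criterion (validation accuracy) is met; the loop above only mirrors the segment-by-segment feature forwarding.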