{"title":"重姿态授权RGB网视频动作识别","authors":"Song Ren, Meng Ding","doi":"10.1109/ICCECE58074.2023.10135328","DOIUrl":null,"url":null,"abstract":"Recently, works related to video action recognition focus on using hybrid streams as input to get better results. Those streams usually are combinations of RGB channel with one additional feature stream such as audio, optical flow and pose information. Among those extra streams, posture as unstructured data is more difficult to fuse with RGB channel than the others. In this paper, we propose our Heavy Pose Empowered RGB Nets (HPER-Nets) ‐‐an end-to-end multitasking model‐‐based on the thorough investigation on how to fuse posture and RGB information. Given video frames as the only input, our model will reinforce it by merging the intrinsic posture information in the form of part affinity fields (PAFs), and use this hybrid stream to perform further video action recognition. Experimental results show that our model can outperform other different methods on UCF-101, UMDB and Kinetics datasets, and with only 16 frames, a 95.3% Top-1 accuracy on UCF101, a 69.6% on HMDB and a 41.0% on Kinetics have been recorded.","PeriodicalId":120030,"journal":{"name":"2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Heavy Pose Empowered RGB Nets for Video Action Recognition\",\"authors\":\"Song Ren, Meng Ding\",\"doi\":\"10.1109/ICCECE58074.2023.10135328\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, works related to video action recognition focus on using hybrid streams as input to get better results. Those streams usually are combinations of RGB channel with one additional feature stream such as audio, optical flow and pose information. Among those extra streams, posture as unstructured data is more difficult to fuse with RGB channel than the others. In this paper, we propose our Heavy Pose Empowered RGB Nets (HPER-Nets) ‐‐an end-to-end multitasking model‐‐based on the thorough investigation on how to fuse posture and RGB information. Given video frames as the only input, our model will reinforce it by merging the intrinsic posture information in the form of part affinity fields (PAFs), and use this hybrid stream to perform further video action recognition. 
Experimental results show that our model can outperform other different methods on UCF-101, UMDB and Kinetics datasets, and with only 16 frames, a 95.3% Top-1 accuracy on UCF101, a 69.6% on HMDB and a 41.0% on Kinetics have been recorded.\",\"PeriodicalId\":120030,\"journal\":{\"name\":\"2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCECE58074.2023.10135328\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCECE58074.2023.10135328","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Recent work on video action recognition has focused on using hybrid streams as input to obtain better results. These streams are usually combinations of the RGB channel with one additional feature stream such as audio, optical flow, or pose information. Among these extra streams, posture, being unstructured data, is harder to fuse with the RGB channel than the others. In this paper, we propose Heavy Pose Empowered RGB Nets (HPER-Nets), an end-to-end multitask model built on a thorough investigation of how to fuse posture and RGB information. Given video frames as the only input, our model reinforces them by merging the intrinsic posture information in the form of part affinity fields (PAFs), and uses this hybrid stream to perform video action recognition. Experimental results show that our model outperforms other methods on the UCF-101, HMDB and Kinetics datasets: with only 16 frames, it reaches 95.3% Top-1 accuracy on UCF-101, 69.6% on HMDB and 41.0% on Kinetics.
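
The abstract does not give architectural details, so the sketch below is only a rough PyTorch illustration of the general idea it describes, written under assumptions of my own: an auxiliary head estimates part affinity fields (PAFs) from the RGB frames, the PAF maps are concatenated with the frames channel-wise to form the hybrid stream, and a small 3D CNN classifies it. The module names (PAFHead, HybridActionNet), channel counts, and layer choices are hypothetical and are not taken from the paper.

# Minimal sketch (assumptions, not the authors' HPER-Net architecture):
# an auxiliary head predicts PAF maps from the RGB frames, the PAFs are
# concatenated with the frames along the channel axis, and a small 3D CNN
# classifies the resulting hybrid stream.
import torch
import torch.nn as nn


class PAFHead(nn.Module):
    """Per-frame PAF estimator (a stand-in for a real pose backbone)."""

    def __init__(self, num_limbs: int = 19):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2 * num_limbs, kernel_size=1),  # (x, y) vector field per limb
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W) -> PAFs: (B, T, 2*num_limbs, H, W)
        b, t, c, h, w = frames.shape
        pafs = self.net(frames.reshape(b * t, c, h, w))
        return pafs.reshape(b, t, -1, h, w)


class HybridActionNet(nn.Module):
    """RGB + PAF hybrid stream fed to a 3D CNN action classifier."""

    def __init__(self, num_classes: int = 101, num_limbs: int = 19):
        super().__init__()
        self.paf_head = PAFHead(num_limbs)
        in_ch = 3 + 2 * num_limbs  # RGB channels + PAF channels
        self.backbone = nn.Sequential(
            nn.Conv3d(in_ch, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W); video frames are the only input.
        pafs = self.paf_head(frames)
        hybrid = torch.cat([frames, pafs], dim=2)   # fuse along the channel axis
        hybrid = hybrid.permute(0, 2, 1, 3, 4)      # (B, C, T, H, W) for Conv3d
        feats = self.backbone(hybrid).flatten(1)
        return self.classifier(feats)


if __name__ == "__main__":
    clip = torch.randn(2, 16, 3, 112, 112)  # 16 frames, as in the reported results
    logits = HybridActionNet(num_classes=101)(clip)
    print(logits.shape)  # torch.Size([2, 101])

In the multitask setting the abstract implies, the PAF head would presumably also be supervised with a pose-field loss alongside the classification loss; that joint objective is omitted here for brevity.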