Real-World Graph Convolution Networks (RW-GCNs) for Action Recognition in Smart Video Surveillance

Justin Sanchez, Christopher Neff, H. Tabkhi
2021 IEEE/ACM Symposium on Edge Computing (SEC), pp. 121-134. Published 2021-12-01. DOI: 10.1145/3453142.3491293 (https://doi.org/10.1145/3453142.3491293)
Citations: 7

Abstract

Action recognition is a key algorithmic part of emerging on-the-edge smart video surveillance and security systems. Skeleton-based action recognition is an attractive approach which, instead of using RGB pixel data, relies on human pose information to classify actions. However, existing algorithms often assume ideal conditions that are not representative of real-world limitations, such as noisy input, latency requirements, and edge resource constraints. To address the limitations of existing approaches, this paper presents Real-World Graph Convolution Networks (RW-GCNs), an architecture-level solution for meeting the domain constraints of real-world skeleton-based action recognition. Inspired by the presence of feedback connections in the human visual cortex, RW-GCNs leverage attentive feedback augmentation on existing near state-of-the-art (SotA) Spatial-Temporal Graph Convolution Networks (ST-GCNs). The ST-GCNs' design choices are derived from information theory-centric principles to address both the spatial and temporal noise typically encountered in end-to-end real-time and on-the-edge smart video systems. Our results demonstrate RW-GCNs' ability to serve these applications by achieving a new SotA accuracy of 94.1% on the NTU-RGB-D-120 dataset, and by achieving 32× lower latency than baseline ST-GCN applications while still reaching 90.4% accuracy on the Northwestern UCLA dataset in the presence of spatial keypoint noise. RW-GCNs further show system scalability by running on the 10× more cost-effective NVIDIA Jetson Nano (as opposed to the NVIDIA Xavier NX), while still maintaining a respectable range of throughput (15.6 to 5.5 actions per second) on the resource-constrained device. The code is available here: https://github.com/TeCSAR-UNCC/RW-GCN.