COWO: towards real-time spatiotemporal action localization in videos

Yang Yi, Yang Sun, Saimei Yuan, Yiji Zhu, Mengyi Zhang, Wenjun Zhu
Robotic Intelligence and Automation. Published 2022-01-18. DOI: 10.1108/aa-07-2021-0098 (https://doi.org/10.1108/aa-07-2021-0098)

Abstract

Purpose

The purpose of this paper is to provide a fast and accurate network for spatiotemporal action localization in videos. It detects human actions in both time and space simultaneously and in real time, which makes it applicable to real-world scenarios such as safety monitoring and collaborative assembly.

Design/methodology/approach

This paper designs an end-to-end deep learning network called collaborator only watch once (COWO), which recognizes ongoing human activities in real time with enhanced accuracy. COWO inherits the architecture of you only watch once (YOWO), known to be the best-performing network for online action localization to date, but makes three major structural modifications. First, COWO enhances intraclass compactness and enlarges interclass separability at the feature level: a new correlation channel fusion and attention mechanism is designed based on the Pearson correlation coefficient, and a correction loss function is designed accordingly to minimize the same-class distance and enhance intraclass compactness. Second, a probabilistic K-means clustering technique is used to select the initial seed points, the idea being that the initial distance between cluster centers should be as large as possible. Third, the CIOU regression loss function is applied instead of the Smooth L1 loss function to help the model converge stably.
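The abstract does not say how the Pearson correlation coefficient enters the channel fusion and attention mechanism. As a rough illustration only, the sketch below computes the coefficient between flattened feature channels; the function names `pearson` and `channel_correlation` and the list-of-lists channel layout are assumptions made for this example, not the authors' implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        # A constant channel has no defined correlation; return 0 by convention here.
        return 0.0
    return cov / (std_x * std_y)

def channel_correlation(channels):
    """Pairwise Pearson matrix for a list of flattened channels (hypothetical layout)."""
    return [[pearson(a, b) for b in channels] for a in channels]
```

A matrix of this kind could in principle be turned into per-channel weights, but how COWO does so is not stated in the abstract.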
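The description of probabilistic seed selection, with initial centers kept far apart, resembles k-means++ seeding. The abstract does not name that algorithm, so the sketch below is an assumption: it picks each new center with probability proportional to its squared distance from the nearest center already chosen. The function name `seed_centers` and the tuple-based point format are hypothetical.

```python
import random

def seed_centers(points, k, rng=None):
    """Pick k initial centers with distance-weighted sampling (k-means++ style)."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen center.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform choice.
            centers.append(rng.choice(points))
            continue
        # Far-away points get a proportionally higher chance of being picked.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers
```

For example, `seed_centers([(0, 0), (0, 0), (10, 10), (10, 10)], 2)` always returns one center from each of the two groups, because a point that coincides with an existing center has zero weight.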
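To make the contrast with Smooth L1 concrete, here is a minimal sketch of the commonly used complete IoU (CIoU) loss for a single pair of axis-aligned boxes. It follows the standard published formulation (IoU term, normalized center distance, aspect-ratio penalty); whether COWO uses exactly this variant is an assumption, and the `(x1, y1, x2, y2)` box format and the name `ciou_loss` are chosen for illustration.

```python
import math

def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared distance between box centers.
    center_d2 = ((px1 + px2) - (tx1 + tx2)) ** 2 / 4 + ((py1 + py2) - (ty1 + ty2)) ** 2 / 4
    # Squared diagonal of the smallest box enclosing both.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    diag2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + center_d2 / diag2 + alpha * v
```

Unlike Smooth L1 applied to each coordinate separately, this loss still changes with box position when the boxes do not overlap, because the center-distance term stays informative even when the IoU is zero.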

Findings

COWO outperforms the original YOWO, improving frame mAP by 3% and 2.1% at a speed of 35.12 fps. Compared with two-stream, T-CNN and C3D, the improvement is about 5% and 14.5% on the J-HMDB-21, UCF101-24 and AGOT data sets.

Originality/value

COWO offers more flexibility for assembly scenarios because it perceives spatiotemporal human actions in real time. It contributes to many real-world scenarios such as safety monitoring and collaborative assembly.
