Deep Randomized Time Warping for Action Recognition

Yutaro Hiraoka, K. Fukui
{"title":"深度随机时间扭曲的动作识别","authors":"Yutaro Hiraoka, K. Fukui","doi":"10.23919/MVA57639.2023.10216189","DOIUrl":null,"url":null,"abstract":"This paper proposes an enhanced Randomized Time Warping (RTW) using CNN features, termed Deep RTW, for motion recognition. RTW is a general extension of Dynamic Time Warping (DTW), widely used for matching and comparing sequential patterns. The basic idea of RTW is to simultaneously calculate the similarities between many pairs of various warped patterns, i.e. Time elastic (TE) features generated by randomly sampling the sequential pattern while retaining their temporal order. This mechanism enables RTW to treat the changes in motion speed flexibly. However, naive TE feature vectors generated from raw images are not expected to have high discriminative power. Besides, the dimension of TE features can increase depending on the number of concatenated images. To address the limitations, we incorporate CNN features extracted from 2D/3D CNNs into the framework of RTW as input to address this issue. Our framework is very simple but effective and applicable to various types of CNN architecture. Extensive experiment on public motion datasets, Jester and Something-Something V2, supports the advantage of our method over the original CNNs.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep Randomized Time Warping for Action Recognition\",\"authors\":\"Yutaro Hiraoka, K. Fukui\",\"doi\":\"10.23919/MVA57639.2023.10216189\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes an enhanced Randomized Time Warping (RTW) using CNN features, termed Deep RTW, for motion recognition. 
RTW is a general extension of Dynamic Time Warping (DTW), widely used for matching and comparing sequential patterns. The basic idea of RTW is to simultaneously calculate the similarities between many pairs of various warped patterns, i.e. Time elastic (TE) features generated by randomly sampling the sequential pattern while retaining their temporal order. This mechanism enables RTW to treat the changes in motion speed flexibly. However, naive TE feature vectors generated from raw images are not expected to have high discriminative power. Besides, the dimension of TE features can increase depending on the number of concatenated images. To address the limitations, we incorporate CNN features extracted from 2D/3D CNNs into the framework of RTW as input to address this issue. Our framework is very simple but effective and applicable to various types of CNN architecture. Extensive experiment on public motion datasets, Jester and Something-Something V2, supports the advantage of our method over the original CNNs.\",\"PeriodicalId\":338734,\"journal\":{\"name\":\"2023 18th International Conference on Machine Vision and Applications (MVA)\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 18th International Conference on Machine Vision and Applications (MVA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/MVA57639.2023.10216189\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 18th International Conference on Machine Vision and Applications 
(MVA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/MVA57639.2023.10216189","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0

Abstract

This paper proposes an enhanced Randomized Time Warping (RTW) using CNN features, termed Deep RTW, for motion recognition. RTW is a general extension of Dynamic Time Warping (DTW), widely used for matching and comparing sequential patterns. The basic idea of RTW is to simultaneously calculate the similarities between many pairs of various warped patterns, i.e. Time elastic (TE) features generated by randomly sampling the sequential pattern while retaining their temporal order. This mechanism enables RTW to treat the changes in motion speed flexibly. However, naive TE feature vectors generated from raw images are not expected to have high discriminative power. Besides, the dimension of TE features can increase depending on the number of concatenated images. To address the limitations, we incorporate CNN features extracted from 2D/3D CNNs into the framework of RTW as input to address this issue. Our framework is very simple but effective and applicable to various types of CNN architecture. Extensive experiment on public motion datasets, Jester and Something-Something V2, supports the advantage of our method over the original CNNs.
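The core RTW mechanism described above can be illustrated with a small sketch: Time Elastic (TE) features are built by randomly sampling frames from a sequence while preserving their temporal order, and many such warped pairs are compared and averaged. This is a toy illustration of the idea only, not the authors' implementation; the function names, the use of cosine similarity, and the averaging over pairs are assumptions for demonstration. In the paper's Deep RTW setting, each row of `seq` would be a CNN feature vector rather than raw pixels.

```python
import numpy as np

def sample_te_feature(seq, k, rng):
    """Draw k frames from a (T, d) sequence without replacement,
    sort the indices to keep temporal order, and concatenate the
    selected frames into one Time Elastic (TE) feature vector."""
    T = seq.shape[0]
    idx = np.sort(rng.choice(T, size=k, replace=False))
    return seq[idx].reshape(-1)

def rtw_similarity(seq_a, seq_b, k=5, n_pairs=100, seed=0):
    """Toy RTW score: average cosine similarity over many pairs of
    randomly warped TE features from the two sequences."""
    rng = np.random.default_rng(seed)
    sims = []
    for _ in range(n_pairs):
        u = sample_te_feature(seq_a, k, rng)
        v = sample_te_feature(seq_b, k, rng)
        sims.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
    return float(np.mean(sims))
```

Because each TE feature is a different random warp of the sequence, averaging over many pairs makes the score tolerant to changes in motion speed, which is the flexibility the abstract attributes to RTW.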