Generalized Pose Decoupled Network for Unsupervised 3D Skeleton Sequence-Based Action Representation Learning.

Cyborg and bionic systems (Washington, D.C.); IF 10.5; Q1 (ENGINEERING, BIOMEDICAL); Pub Date: 2022-01-01; DOI: 10.34133/cbsystems.0002
Mengyuan Liu, Fanyang Meng, Yongsheng Liang
Volume 2022, article 0002. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10076048/pdf/
Citations: 9

Abstract


Human action representation is derived from the description of human shape and motion. The traditional unsupervised 3-dimensional (3D) human action representation learning method uses a recurrent neural network (RNN)-based autoencoder to reconstruct the input pose sequence and then takes the midlevel feature of the autoencoder as the representation. Although an RNN can implicitly learn a certain amount of motion information, the extracted representation mainly describes human shape and is insufficient to describe motion. Therefore, we first present a handcrafted motion feature called pose flow to guide the reconstruction of the autoencoder, whose midlevel feature is then expected to describe motion information. Its performance is limited, however, because actions can be distinctive in either motion direction or motion norm. For example, we can distinguish "sitting down" from "standing up" by motion direction, yet distinguish "running" from "jogging" by motion norm. In these cases, it is difficult to learn distinctive features from pose flow, where direction and norm are mixed. To this end, we present an explicit pose decoupled flow network (PDF-E) that learns from direction and norm in a multi-task learning framework, where 1 encoder generates the representation and 2 decoders generate direction and norm, respectively. Further, we use reconstruction of the input pose sequence as an additional constraint and present a generalized PDF network (PDF-G) that learns both motion and shape information and achieves state-of-the-art performance on large-scale and challenging 3D action recognition datasets, including the NTU RGB+D 60 and NTU RGB+D 120 datasets.
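The decoupling idea in the abstract can be illustrated with a small sketch. It assumes that pose flow is the frame-to-frame displacement of each joint, which the abstract does not define precisely, and it covers only the handcrafted targets, not the encoder or decoders. The function names `pose_flow` and `decouple`, the single-joint toy sequences, and the choice to give static joints a zero direction are all illustrative assumptions rather than the authors' implementation.

```python
import math

def pose_flow(sequence):
    """Assumed definition: flow[t][j] = pose[t + 1][j] - pose[t][j] per joint."""
    return [
        [tuple(b - a for a, b in zip(j0, j1)) for j0, j1 in zip(p0, p1)]
        for p0, p1 in zip(sequence, sequence[1:])
    ]

def decouple(flow):
    """Split every displacement vector into a unit direction and a scalar norm."""
    directions, norms = [], []
    for frame in flow:
        frame_dirs, frame_norms = [], []
        for vec in frame:
            n = math.sqrt(sum(c * c for c in vec))
            frame_norms.append(n)
            # Assumption: a joint that does not move gets a zero direction.
            frame_dirs.append(tuple(c / n for c in vec) if n > 0 else (0.0,) * len(vec))
        directions.append(frame_dirs)
        norms.append(frame_norms)
    return directions, norms

# Hypothetical toy data: one joint, three frames, coordinates (x, y, z).
sit_down = [[(0.0, 1.0, 0.0)], [(0.0, 0.75, 0.0)], [(0.0, 0.5, 0.0)]]
stand_up = sit_down[::-1]  # same poses in reverse order
jogging = [[(0.0, 1.0, 0.0)], [(0.25, 1.0, 0.0)], [(0.5, 1.0, 0.0)]]
running = [[(0.0, 1.0, 0.0)], [(0.5, 1.0, 0.0)], [(1.0, 1.0, 0.0)]]

d_sit, n_sit = decouple(pose_flow(sit_down))
d_stand, n_stand = decouple(pose_flow(stand_up))
d_jog, n_jog = decouple(pose_flow(jogging))
d_run, n_run = decouple(pose_flow(running))

# Same norm, opposite direction: only the direction target separates these.
print(n_sit == n_stand, d_sit[0][0], d_stand[0][0])
# Same direction, different norm: only the norm target separates these.
print(d_jog == d_run, n_jog[0][0], n_run[0][0])
```

In this toy setting the two pairs differ in exactly one of the two components, which is the situation the abstract gives as motivation for predicting direction and norm with separate decoders instead of reconstructing the mixed pose flow.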

Source journal metrics: CiteScore 7.70; self-citation rate 0.00%; articles published 0; review time 21 weeks.