AGAR - Attention Graph-RNN for Adaptative Motion Prediction of Point Clouds of Deformable Objects

IF 5.2 | Zone 3, Computer Science | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS | ACM Transactions on Multimedia Computing Communications and Applications | Pub Date: 2024-05-01 | DOI: 10.1145/3662183
Pedro de Medeiros Gomes, Silvia Rossi, Laura Toni
Citations: 0

Abstract

This paper focuses on motion prediction for point cloud sequences in the challenging case of deformable 3D objects, such as human body motion. First, we investigate the challenges caused by deformable shapes and complex motions present in this type of representation, with the ultimate goal of understanding the technical limitations of state-of-the-art models. From this understanding, we propose an improved architecture for point cloud prediction of deformable 3D objects. Specifically, to handle deformable shapes, we propose a graph-based approach that learns and exploits the spatial structure of point clouds to extract more representative features. Then, we propose a module able to combine the learned features in an adaptive manner according to the point cloud movements. The proposed adaptive module controls the composition of local and global motions for each point, enabling the network to model complex motions in deformable 3D objects more effectively. We tested the proposed method on the following datasets: MNIST moving digits, Mixamo human body motions [15], and the JPEG [5] and CWIPC-SXR [32] real-world dynamic bodies. Simulation results demonstrate that our method outperforms the current baseline methods thanks to its improved ability to model complex movements and to preserve point cloud shape. Furthermore, we demonstrate the generalizability of the proposed framework for dynamic feature learning by testing it for action recognition on the MSRAction3D dataset [19] and achieving results on par with state-of-the-art methods.
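
The two ideas named in the abstract, a graph built over the points and a per-point blend of local and global motion, can be illustrated with a small NumPy sketch. This is not the authors' architecture: the function names `knn_graph` and `blend_motion`, the choice of a k-nearest-neighbour graph, the use of mean displacement as the "motion feature", and the softmax over two hand-supplied logits are all assumptions made for illustration. In a trained network the logits would be produced by learned layers.

```python
import numpy as np

def knn_graph(points, k):
    """Indices of the k nearest neighbours of each point (self excluded)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]

def blend_motion(points, motion, k=4, logits=None):
    """Blend local and global motion per point with softmax weights.

    points: (N, 3) coordinates; motion: (N, 3) per-point displacement.
    logits: (N, 2) hypothetical scores for [local, global]; zeros if omitted.
    """
    nbrs = knn_graph(points, k)
    local = motion[nbrs].mean(axis=1)                        # neighbourhood average, (N, 3)
    global_ = np.broadcast_to(motion.mean(axis=0), motion.shape)
    if logits is None:
        logits = np.zeros((len(points), 2))                  # untrained: equal weighting
    w = np.exp(logits - logits.max(axis=1, keepdims=True))   # numerically stable softmax
    w = w / w.sum(axis=1, keepdims=True)
    return w[:, :1] * local + w[:, 1:] * global_, w

rng = np.random.default_rng(0)
pts = rng.normal(size=(16, 3))
mot = rng.normal(size=(16, 3))
blended, weights = blend_motion(pts, mot, k=4)
print(blended.shape, weights.shape)
```

Under these assumptions, a point whose weights favour the first column follows its neighbourhood, while a point whose weights favour the second column follows the motion of the whole cloud, which is the kind of per-point composition the abstract describes.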

Source journal
CiteScore: 8.50
Self-citation rate: 5.90%
Articles published: 285
Review time: 7.5 months
Journal description: The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The Transactions consist primarily of research papers. This is an archival journal, and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.
Latest articles from this journal
TA-Detector: A GNN-based Anomaly Detector via Trust Relationship
KF-VTON: Keypoints-Driven Flow Based Virtual Try-On Network
Unified View Empirical Study for Large Pretrained Model on Cross-Domain Few-Shot Learning
Multimodal Fusion for Talking Face Generation Utilizing Speech-related Facial Action Units
Compressed Point Cloud Quality Index by Combining Global Appearance and Local Details