AGAR - Attention Graph-RNN for Adaptative Motion Prediction of Point Clouds of Deformable Objects

ACM Transactions on Multimedia Computing Communications and Applications (IF 6.0; CAS Zone 3, Computer Science; JCR Q1, Computer Science, Information Systems) Pub Date: 2024-05-01 DOI: 10.1145/3662183

Pedro de Medeiros Gomes, Silvia Rossi, Laura Toni


Citations: 0

Abstract

This paper focuses on motion prediction for point cloud sequences in the challenging case of deformable 3D objects, such as human body motion. First, we investigate the challenges caused by deformable shapes and complex motions present in this type of representation, with the ultimate goal of understanding the technical limitations of state-of-the-art models. From this understanding, we propose an improved architecture for point cloud prediction of deformable 3D objects. Specifically, to handle deformable shapes, we propose a graph-based approach that learns and exploits the spatial structure of point clouds to extract more representative features. Then, we propose a module able to combine the learned features in an adaptive manner according to the point cloud movements. The proposed adaptive module controls the composition of local and global motions for each point, enabling the network to model complex motions in deformable 3D objects more effectively. We tested the proposed method on the following datasets: MNIST moving digits, the Mixamo human body motions [15], and the JPEG [5] and CWIPC-SXR [32] real-world dynamic bodies. Simulation results demonstrate that our method outperforms the current baseline methods, given its improved ability to model complex movements as well as preserve point cloud shape. Furthermore, we demonstrate the generalizability of the proposed framework for dynamic feature learning by testing the framework for action recognition on the MSRAction3D dataset [19] and achieving results on par with state-of-the-art methods.
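The abstract describes two ideas: building a graph over the points to capture spatial structure, and blending local and global motion per point with adaptive weights. The following is only a minimal sketch of that general idea, not the authors' AGAR implementation. The function names (`knn_graph`, `blend_motion`), the k-nearest-neighbour graph, the mean-based motion estimates, and the hand-supplied `scores` are all assumptions made for illustration; in a trained network the scores would be learned rather than given.

```python
import math


def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vec(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(c) / n for c in zip(*vectors))


def blend_motion(points, motions, scores, k=2):
    """Blend a local and a global motion estimate for every point.

    Assumptions (hypothetical, for illustration only):
    - local motion  = mean motion of the point and its k graph neighbours
    - global motion = mean motion of the whole cloud
    - scores[i]     = (local_score, global_score), unnormalised attention
                      scores that a real model would learn
    """
    nbrs = knn_graph(points, k)
    global_motion = mean_vec(motions)
    blended = []
    for i, (s_loc, s_glob) in enumerate(scores):
        local_motion = mean_vec([motions[i]] + [motions[j] for j in nbrs[i]])
        w_loc, w_glob = softmax([s_loc, s_glob])
        blended.append(tuple(w_loc * l + w_glob * g
                             for l, g in zip(local_motion, global_motion)))
    return blended


# Two clusters moving in opposite directions: the cloud-wide mean motion is zero,
# so a point that attends to local motion keeps its cluster's direction.
points = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)]
motions = [(1, 0, 0), (1, 0, 0), (-1, 0, 0), (-1, 0, 0)]
scores = [(50, -50), (-50, 50), (50, -50), (0, 0)]
result = blend_motion(points, motions, scores, k=1)
```

With these toy inputs, the first point (strongly local score) follows its cluster, the second point (strongly global score) follows the near-zero cloud-wide motion, and the last point (equal scores) lands halfway between the two.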




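Source journal

ACM Transactions on Multimedia Computing Communications and Applications (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 8.50

Self-citation rate: 5.90%

Articles published: 285

Review time: 7.5 months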

About the journal: The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The Transactions consists primarily of research papers. This is an archival journal, and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.