Enhancing Robotic Collaborative Tasks Through Contextual Human Motion Prediction and Intention Inference

Impact Factor: 3.8 · Zone 2 (Computer Science) · JCR Q2 (Robotics) · International Journal of Social Robotics · Pub Date: 2024-07-13 · DOI: 10.1007/s12369-024-01140-2

Javier Laplaza, Francesc Moreno, Alberto Sanfeliu



Abstract

Predicting human motion from a sequence of past observations is crucial for many applications in robotics and computer vision. Currently, this problem is typically addressed by training deep learning models on the 3D human motion datasets most widely used in the community. However, these datasets generally do not capture how humans behave and move when a robot is nearby, so their data distribution differs from the motion that robots actually encounter when collaborating with humans. Additionally, incorporating contextual information about the interactive task between the human and the robot, as well as information on the human’s willingness to collaborate with the robot, can not only improve the accuracy of the predicted sequence but also serve as a useful tool for robots to navigate collaborative tasks successfully. In this research, we propose a deep learning architecture that predicts both 3D human body motion and human intention for collaborative tasks. The model employs a multi-head attention mechanism and takes human motion and task context as inputs. Its outputs are the predicted motion of the human body and the inferred human intention. We have validated this architecture in two different tasks: collaborative object handover and collaborative grape harvesting. While the architecture remains the same for both tasks, the inputs differ. In the handover task, the architecture takes human motion, robot end effector, and obstacle positions as inputs. Additionally, the model can be conditioned on the desired intention to tailor the output motion accordingly. To assess performance on the collaborative handover task, we conducted a user study to evaluate human perception of the robot’s sociability, naturalness, security, and comfort. The study compared the robot’s behavior when it used the prediction in its planner with its behavior when it did not. We also applied the model to a collaborative grape harvesting task. By integrating human motion prediction and human intention inference, our architecture shows promising results in enhancing the capabilities of robots in collaborative scenarios. The model’s flexibility allows it to handle various tasks with different inputs, making it adaptable to real-world applications.
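The input/output structure described in the abstract (human motion plus task context in, predicted motion plus inferred intention out, with multi-head attention in between) can be illustrated with a minimal NumPy sketch. This is not the authors' model: the layer layout, the dimensions, the random untrained weights, the number of intention classes, and every function and parameter name below are assumptions chosen only for illustration. Conditioning on a desired intention is not included.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """Scaled dot-product self-attention over a (T, d_model) sequence."""
    t, d_model = x.shape
    d_head = d_model // num_heads

    def split(m):
        # (T, d_model) -> (num_heads, T, d_head)
        return m.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, T, T)
    heads = softmax(scores, axis=-1) @ v                 # (H, T, d_head)
    merged = heads.transpose(1, 0, 2).reshape(t, d_model)
    return merged @ wo

def init_params(rng, in_dim, d_model, t_in, t_out, pose_dim, num_intentions):
    """Random, untrained weights (hypothetical layout, not from the paper)."""
    def w(*shape):
        return rng.normal(0.0, 0.1, size=shape)
    return {
        "w_in": w(in_dim, d_model),
        "wq": w(d_model, d_model), "wk": w(d_model, d_model),
        "wv": w(d_model, d_model), "wo": w(d_model, d_model),
        "w_motion": w(t_in * d_model, t_out * pose_dim),
        "w_intent": w(d_model, num_intentions),
    }

def predict(params, motion, context, num_heads, t_out):
    """Return (future poses, intention probabilities) for one observation."""
    t_in, pose_dim = motion.shape
    # Attach the same task context vector to every observed frame.
    ctx = np.broadcast_to(context, (t_in, context.shape[0]))
    x = np.concatenate([motion, ctx], axis=1) @ params["w_in"]
    # Self-attention with a residual connection.
    h = x + multi_head_attention(
        x, params["wq"], params["wk"], params["wv"], params["wo"], num_heads)
    # Motion head: flatten the sequence and regress all future frames.
    future = (h.reshape(-1) @ params["w_motion"]).reshape(t_out, pose_dim)
    # Intention head: pool over time, then classify.
    intent_probs = softmax(h.mean(axis=0) @ params["w_intent"])
    return future, intent_probs

# Hypothetical sizes: 10 observed frames, 5 predicted frames, 9 joints,
# 2 obstacles, 3 intention classes.
T_IN, T_OUT, JOINTS, OBSTACLES, INTENTIONS = 10, 5, 9, 2, 3
D_MODEL, HEADS = 16, 4

rng = np.random.default_rng(0)
motion = rng.normal(size=(T_IN, JOINTS * 3))    # past 3D joint positions
context = rng.normal(size=(3 + OBSTACLES * 3,)) # end effector + obstacles
params = init_params(rng, motion.shape[1] + context.shape[0], D_MODEL,
                     T_IN, T_OUT, JOINTS * 3, INTENTIONS)
future, intent_probs = predict(params, motion, context, HEADS, T_OUT)
print(future.shape, intent_probs.shape)
```

Because only the context vector changes between tasks in this sketch, swapping the handover inputs for another task's inputs would alter `in_dim` but leave the rest of the structure untouched, which mirrors the abstract's statement that the architecture stays the same while the inputs differ.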




Source journal

International Journal of Social Robotics (Robotics)

CiteScore: 9.80

Self-citation rate: 8.50%

Journal description: Social Robotics is the study of robots that are able to interact and communicate among themselves, with humans, and with the environment, within the social and cultural structure attached to their roles. The journal covers a broad spectrum of topics related to the latest technologies, new research results, and developments in social robotics at all levels, from core enabling technologies to system integration, aesthetic design, applications, and social implications. It provides a platform for like-minded researchers to present their findings and latest developments in social robotics, covering relevant advances in engineering, computing, arts, and social sciences. The journal publishes original, peer-reviewed articles and contributions by leading researchers and developers on innovative ideas and concepts, new discoveries and improvements, and novel applications. These cover the latest fundamental advances in the core technologies that form the backbone of social robotics, distinguished developmental projects in the area, and seminal works in aesthetic design, ethics and philosophy, and studies on social impact and influence pertaining to social robotics.