Vision-state Fusion: Improving Deep Neural Networks for Autonomous Robotics

IF 3.1 · Zone 4 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Journal of Intelligent & Robotic Systems · Pub Date: 2024-04-10 · DOI: 10.1007/s10846-024-02091-6

Elia Cereda, Stefano Bonato, Mirko Nava, Alessandro Giusti, Daniele Palossi

Citations: 0

Abstract

Vision-based deep learning perception fulfills a paramount role in robotics, facilitating solutions to many challenging scenarios, such as acrobatic maneuvers of autonomous unmanned aerial vehicles (UAVs) and robot-assisted high-precision surgery. Control-oriented end-to-end perception approaches, which directly output control variables for the robot, commonly take advantage of the robot’s state estimation as an auxiliary input. When intermediate outputs are estimated and fed to a lower-level controller, i.e., mediated approaches, the robot’s state is commonly used as an input only for egocentric tasks, which estimate physical properties of the robot itself. In this work, we propose to apply a similar approach for the first time – to the best of our knowledge – to non-egocentric mediated tasks, where the estimated outputs refer to an external subject. We prove how our general methodology improves the regression performance of deep convolutional neural networks (CNNs) on a broad class of non-egocentric 3D pose estimation problems, with minimal computational cost. By analyzing three highly-different use cases, spanning from grasping with a robotic arm to following a human subject with a pocket-sized UAV, our results consistently improve the R\(^{2}\) regression metric, up to +0.51, compared to their stateless baselines. Finally, we validate the in-field performance of a closed-loop autonomous cm-scale UAV on the human pose estimation task. Our results show a significant reduction, i.e., 24% on average, on the mean absolute error of our stateful CNN, compared to a State-of-the-Art stateless counterpart.
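The abstract reports its gains as changes in the R² regression metric and in mean absolute error (MAE). The following is a minimal sketch of how these two quantities are commonly computed, assuming the standard textbook definitions; the abstract does not state the exact evaluation protocol (for example, whether scores are computed per output variable). The function names and the sample values are hypothetical and are not taken from the paper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values for one pose variable (illustrative only).
ground_truth = [1.0, 2.0, 3.0, 4.0]
stateless_pred = [1.5, 1.5, 3.5, 3.5]     # assumed baseline predictions
stateful_pred = [1.25, 1.75, 3.25, 3.75]  # assumed predictions with state input

# Absolute R² gain of the stateful model over the stateless baseline.
r2_gain = r2_score(ground_truth, stateful_pred) - r2_score(ground_truth, stateless_pred)

# Relative MAE reduction, the form in which the abstract quotes its 24% figure.
mae_stateless = mean_absolute_error(ground_truth, stateless_pred)
mae_stateful = mean_absolute_error(ground_truth, stateful_pred)
mae_reduction = (mae_stateless - mae_stateful) / mae_stateless
```

Read this way, an R² improvement is an absolute difference between two scores, while an MAE reduction is relative to the baseline error, so the two figures quoted in the abstract are not on the same scale.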

Source journal

Journal of Intelligent & Robotic Systems (Engineering & Technology: Robotics)

CiteScore: 7.00
Self-citation rate: 9.10%
Articles published: 219
Review time: 6 months

Journal description: The Journal of Intelligent and Robotic Systems bridges the gap between theory and practice in all areas of intelligent systems and robotics. It publishes original, peer-reviewed contributions from initial concept and theory to prototyping to final product development and commercialization. On the theoretical side, the journal features papers focusing on intelligent systems engineering, distributed intelligence systems, multi-level systems, intelligent control, multi-robot systems, cooperation and coordination of unmanned vehicle systems, etc. On the application side, the journal emphasizes autonomous systems, industrial robotic systems, multi-robot systems, aerial vehicles, mobile robot platforms, underwater robots, sensors, sensor-fusion, and sensor-based control. Readers will also find papers on real applications of intelligent and robotic systems (e.g., mechatronics, manufacturing, biomedical, underwater, humanoid, mobile/legged robot and space applications, etc.).