
Latest publications from the Conference on Robot Learning

MResT: Multi-Resolution Sensing for Real-Time Control with Vision-Language Models
Pub Date : 2024-01-25 DOI: 10.48550/arXiv.2401.14502
Saumya Saxena, Mohit Sharma, Oliver Kroemer
Leveraging sensing modalities across diverse spatial and temporal resolutions can improve the performance of robotic manipulation tasks. Multi-spatial-resolution sensing provides hierarchical information captured at different spatial scales and enables both coarse and precise motions. Simultaneously, multi-temporal-resolution sensing enables the agent to exhibit high reactivity and real-time control. In this work, we propose a framework, MResT (Multi-Resolution Transformer), for learning generalizable language-conditioned multi-task policies that use sensing at different spatial and temporal resolutions, via networks of varying capacities, to effectively perform real-time control of precise and reactive tasks. We leverage off-the-shelf pretrained vision-language models to operate on low-frequency global features, along with small non-pretrained models that adapt to high-frequency local feedback. Through extensive experiments in three domains (coarse, precise, and dynamic manipulation tasks), we show that our approach significantly improves (2x on average) over recent multi-task baselines. Further, our approach generalizes well to visual and geometric variations in target objects and to varying interaction forces.
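The two-rate design described in the abstract can be sketched as a control loop in which an expensive global encoder runs at low frequency while a small local policy runs every step, reusing the latest global features. The models, rates, and shapes below are illustrative stand-ins, not the paper's actual architecture.

```python
import numpy as np

SLOW_PERIOD = 10  # fast control steps per slow (global) update — illustrative

def slow_global_features(rgb_obs):
    # Stand-in for an expensive pretrained vision-language encoder.
    return rgb_obs.mean(axis=(0, 1))  # pooled feature vector, shape (3,)

def fast_local_policy(global_feat, local_obs):
    # Stand-in for a small non-pretrained network on high-frequency feedback.
    return 0.1 * global_feat[:2] + 0.9 * local_obs

def run_episode(steps=30):
    rng = np.random.default_rng(0)
    global_feat, actions, global_calls = None, [], 0
    for t in range(steps):
        if t % SLOW_PERIOD == 0:              # low-frequency global branch
            rgb = rng.random((8, 8, 3))
            global_feat = slow_global_features(rgb)
            global_calls += 1
        local_obs = rng.random(2)             # high-frequency local feedback
        actions.append(fast_local_policy(global_feat, local_obs))
    return actions, global_calls

actions, calls = run_episode()
print(len(actions), calls)  # 30 control steps, only 3 global-model calls
```

The point of the split is that the slow model amortizes over many fast steps, so reactivity is bounded by the small model's latency rather than the large one's.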
Citations: 1
Lidar Line Selection with Spatially-Aware Shapley Value for Cost-Efficient Depth Completion
Pub Date : 2023-03-21 DOI: 10.48550/arXiv.2303.11720
Kamil Adamczewski, Christos Sakaridis, Vaishakh Patil, L. Gool
Lidar is a vital sensor for estimating the depth of a scene. Typical spinning lidars emit pulses arranged in several horizontal lines, and the monetary cost of the sensor increases with the number of these lines. In this work, we present the new problem of optimizing the positioning of lidar lines to find the most effective configuration for the depth completion task. We propose a solution that reduces the number of lines while retaining depth-completion quality. Our method consists of two components: (1) line selection based on the marginal contribution of a line, computed via the Shapley value, and (2) incorporating the spread of line positions to account for the need to achieve image-wide depth completion. Spatially-aware Shapley values (SaS) succeed in selecting line subsets that yield depth accuracy comparable to the full lidar input while using just half of the lines.
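The Shapley value used for line selection assigns each line its average marginal contribution over all coalitions of the other lines. A minimal sketch with three lines and a made-up additive utility standing in for depth-completion accuracy:

```python
import itertools
import math

LINES = [0, 1, 2]

def accuracy(subset):
    # Toy utility standing in for depth-completion accuracy of a line subset.
    weights = {0: 0.5, 1: 0.3, 2: 0.2}
    return sum(weights[l] for l in subset)

def shapley_value(line):
    """Exact Shapley value: coalition-weighted marginal contributions."""
    others = [l for l in LINES if l != line]
    total = 0.0
    for r in range(len(others) + 1):
        for coalition in itertools.combinations(others, r):
            weight = (math.factorial(r) * math.factorial(len(others) - r)
                      / math.factorial(len(LINES)))
            total += weight * (accuracy(coalition + (line,)) - accuracy(coalition))
    return total

values = {l: shapley_value(l) for l in LINES}
print(values)  # for this additive game, each value equals the line's own weight
```

The exact sum has exponentially many coalitions, which is why practical variants (including the sampling used for many lines) approximate it; the paper's contribution is additionally weighting selection by spatial spread.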
Citations: 0
Safe Robot Learning in Assistive Devices through Neural Network Repair
Pub Date : 2023-03-08 DOI: 10.48550/arXiv.2303.04431
K. Majd, Geoffrey Clark, Tanmay Khandait, Siyu Zhou, S. Sankaranarayanan, Georgios Fainekos, H. B. Amor
Assistive robotic devices are a particularly promising field of application for neural networks (NNs) due to the need for personalization and hard-to-model human-machine interaction dynamics. However, NN-based estimators and controllers may produce potentially unsafe outputs on previously unseen data points. In this paper, we introduce an algorithm for updating NN control policies to satisfy a given set of formal safety constraints while also optimizing the original loss function. Given a set of mixed-integer linear constraints, we define the NN repair problem as a Mixed-Integer Quadratic Program (MIQP). In extensive experiments, we demonstrate the efficacy of our repair method in generating safe policies for a lower-leg prosthesis.
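As a toy illustration of constraint-satisfying repair (not the paper's MIQP formulation), one can compute the smallest-norm change to a single linear layer's weights so that a given unsafe input satisfies a linear output bound. This special case of "minimally perturb the weights subject to a safety constraint" has a closed form:

```python
import numpy as np

def repair_linear_layer(W, x, a, b):
    """Return W' minimizing ||W' - W||_F subject to a @ (W' @ x) <= b.

    Closed-form rank-1 projection onto the half-space; a sketch only —
    the paper solves a general MIQP over mixed-integer linear constraints.
    """
    violation = a @ (W @ x) - b
    if violation <= 0:
        return W  # already safe for this input
    correction = violation / ((a @ a) * (x @ x))
    return W - correction * np.outer(a, x)

W = np.array([[2.0, 0.0],
              [0.0, 1.0]])
x = np.array([1.0, 1.0])      # an input where the output violates safety
a = np.array([1.0, 1.0])      # safety constraint: y1 + y2 <= b
W_safe = repair_linear_layer(W, x, a, b=2.0)
print(a @ (W_safe @ x))       # constraint is now active: 2.0
```

Scaling this idea to many constraints, many inputs, and ReLU networks (where which units are active becomes a discrete choice) is exactly what turns the problem into a mixed-integer program.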
Citations: 0
COACH: Cooperative Robot Teaching
Pub Date : 2023-02-13 DOI: 10.48550/arXiv.2302.06199
Cunjun Yu, Yiqing Xu, Linfeng Li, David Hsu
Knowledge and skills can transfer from human teachers to human students. However, such direct transfer is often not scalable for physical tasks, as it requires one-to-one interaction, and human teachers are not available in sufficient numbers. Machine learning enables robots to become experts and take on the teacher's role in this situation. In this work, we formalize cooperative robot teaching as a Markov game consisting of four key elements: the target task, the student model, the teacher model, and the interactive teaching-learning process. Under a moderate assumption, the Markov game reduces to a partially observable Markov decision process with an efficient approximate solution. We illustrate our approach on two cooperative tasks, one in a simulated video game and one with a real robot.
Citations: 3
Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping
Pub Date : 2023-01-05 DOI: 10.48550/arXiv.2301.02099
Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, A. Lazaric, Alahari Karteek
Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming. Moreover, manually designing reward functions for every single desired skill is prohibitive. Prior works targeted these challenges by learning goal-conditioned policies from offline datasets without manually specified rewards, through hindsight relabelling. These methods suffer from sparse rewards and fail at long-horizon tasks. In this work, we propose a novel self-supervised learning phase on the pre-collected dataset to understand the structure and dynamics of the model, and to shape a dense reward function for learning policies offline. We evaluate our method on three continuous control tasks and show that our model significantly outperforms existing approaches, especially on tasks that involve long-term planning.
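The hindsight relabelling that prior works rely on can be sketched in a few lines: a state actually reached later in a trajectory is treated as the goal, turning unrewarded transitions into goal-conditioned data. The dense reward in the paper is learned; here a simple negative-distance stand-in illustrates the idea of replacing a sparse signal with a dense one.

```python
import numpy as np

def hindsight_relabel(trajectory):
    """trajectory: list of states; returns (state, goal, reward) tuples.

    Relabels every state with the final achieved state as the goal and
    assigns a dense stand-in reward (negative distance to that goal).
    """
    goal = trajectory[-1]
    data = []
    for state in trajectory:
        dense_reward = -np.linalg.norm(np.asarray(state) - np.asarray(goal))
        data.append((state, goal, dense_reward))
    return data

traj = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
relabelled = hindsight_relabel(traj)
for state, goal, reward in relabelled:
    print(state, goal, reward)  # reward rises to 0 as the goal is reached
```

A sparse scheme would give reward only at the final step; the dense version supplies a learning signal at every transition, which is what helps on long-horizon tasks.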
Citations: 6
Learning Road Scene-level Representations via Semantic Region Prediction
Pub Date : 2023-01-02 DOI: 10.48550/arXiv.2301.00714
Zihao Xiao, A. Yuille, Yi-Ting Chen
In this work, we tackle two vital tasks in automated driving systems: driver intent prediction and risk object identification from egocentric images. Mainly, we investigate the question: what would be good road scene-level representations for these two tasks? We contend that a scene-level representation must capture higher-level semantic and geometric representations of the traffic scene around the ego-vehicle as it acts toward its destination. To this end, we introduce the representation of semantic regions, which are areas that ego-vehicles traverse while taking an afforded action (e.g., a left turn at a 4-way intersection). We propose to learn scene-level representations via a novel semantic region prediction task and an automatic semantic region labeling algorithm. Extensive evaluations are conducted on the HDD and nuScenes datasets, and the learned representations lead to state-of-the-art performance for driver intention prediction and risk object identification.
Citations: 1
Offline Reinforcement Learning for Visual Navigation
Pub Date : 2022-12-16 DOI: 10.48550/arXiv.2212.08244
Dhruv Shah, Arjun Bhorkar, Hrish Leen, Ilya Kostrikov, Nicholas Rhinehart, S. Levine
Reinforcement learning can enable robots to navigate to distant goals while optimizing user-specified reward functions, including preferences for following lanes, staying on paved paths, or avoiding freshly mowed grass. However, online learning from trial and error for real-world robots is logistically challenging, and methods that can instead utilize existing datasets of robotic navigation data could be significantly more scalable and enable broader generalization. In this paper, we present ReViND, the first offline RL system for robotic navigation that can leverage previously collected data to optimize user-specified reward functions in the real world. We evaluate our system for off-road navigation without any additional data collection or fine-tuning, and show that it can navigate to distant goals using only offline training on this dataset, and that it exhibits behaviors that qualitatively differ based on the user-specified reward function.
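The mechanism that makes user-specified rewards compatible with a fixed dataset is relabelling: the same logged transitions are scored by different reward functions before offline training. A minimal sketch with invented terrain annotations and reward functions:

```python
# Hypothetical logged navigation transitions; the annotation fields and
# reward functions below are invented for illustration, not ReViND's API.
dataset = [
    {"terrain": "paved", "on_lane": True},
    {"terrain": "grass", "on_lane": False},
    {"terrain": "paved", "on_lane": True},
]

def prefer_paved(t):
    return 1.0 if t["terrain"] == "paved" else -1.0

def prefer_lanes(t):
    return 1.0 if t["on_lane"] else 0.0

def relabel(transitions, reward_fn):
    """Attach a reward computed by the user-specified function."""
    return [dict(t, reward=reward_fn(t)) for t in transitions]

paved_return = sum(t["reward"] for t in relabel(dataset, prefer_paved))
lane_return = sum(t["reward"] for t in relabel(dataset, prefer_lanes))
print(paved_return, lane_return)  # same data, two different objectives
```

Because relabelling is purely offline, switching objectives costs a dataset pass plus retraining, with no new robot interaction.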
Citations: 10
JFP: Joint Future Prediction with Interactive Multi-Agent Modeling for Autonomous Driving
Pub Date : 2022-12-16 DOI: 10.48550/arXiv.2212.08710
Wenjie Luo, C. Park, Andre Cornman, Benjamin Sapp, Drago Anguelov
We propose JFP, a Joint Future Prediction model that learns to generate accurate and consistent multi-agent future trajectories. For this task, many different methods have been proposed to capture social interactions in the encoding part of the model; however, considerably less focus has been placed on representing interactions in the decoder and output stages. As a result, the predicted trajectories are not necessarily consistent with each other and often result in unrealistic trajectory overlaps. In contrast, we propose an end-to-end trainable model that directly learns the interaction between pairs of agents in a structured, graphical-model formulation in order to generate consistent future trajectories. It sets new state-of-the-art results on the Waymo Open Motion Dataset (WOMD) for the interactive setting. We also investigate a more complex multi-agent setting for both WOMD and a larger internal dataset, where our approach improves significantly on trajectory overlap metrics while obtaining on-par or better performance on single-agent trajectory metrics.
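A trajectory-overlap check of the kind such metrics penalize is easy to state: two agents' predicted futures overlap if they come within a collision radius at the same timestep. The radius and trajectories below are illustrative, not the benchmark's exact definition.

```python
import numpy as np

def trajectories_overlap(traj_a, traj_b, radius=1.0):
    """traj_*: (T, 2) arrays of xy positions at matching timesteps."""
    dists = np.linalg.norm(np.asarray(traj_a) - np.asarray(traj_b), axis=1)
    return bool((dists < radius).any())

ego = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
other = np.array([[2.0, 2.0], [1.5, 0.5], [0.0, 2.0]])
print(trajectories_overlap(ego, other))  # True: near-collision at t=1
```

Marginal per-agent predictors can each be accurate yet still produce pairs that trip this check, which is the inconsistency the joint decoder is designed to remove.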
Citations: 15
Learning Markerless Robot-Depth Camera Calibration and End-Effector Pose Estimation
Pub Date : 2022-12-15 DOI: 10.48550/arXiv.2212.07567
B. C. Sefercik, Barış Akgün
Traditional approaches to extrinsic calibration use fiducial markers, and learning-based approaches rely heavily on simulation data. In this work, we present a learning-based markerless extrinsic calibration system that uses a depth camera and does not rely on simulation data. We learn models for end-effector (EE) segmentation, single-frame rotation prediction, and keypoint detection from automatically generated real-world data. We use a transformation trick to get EE pose estimates from rotation predictions and a matching algorithm to get EE pose estimates from keypoint predictions. We further utilize the iterative closest point algorithm, multiple frames, filtering, and outlier detection to increase calibration robustness. Our evaluations, with training data from multiple camera poses and test data from previously unseen poses, give sub-centimeter and sub-deciradian average calibration and pose estimation errors. We also show that a carefully selected single training pose gives comparable results.
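The rigid-alignment step underlying pose estimation from matched keypoints (and each iteration of iterative closest point) is the classic Kabsch/SVD solution: recover the rotation and translation that best map one matched point set onto another. A self-contained sketch with synthetic correspondences:

```python
import numpy as np

def kabsch(P, Q):
    """Find R, t minimizing sum ||R @ p_i + t - q_i||^2 for matched points.

    P, Q: (N, 3) arrays of corresponded 3D points.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                 # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

rng = np.random.default_rng(1)
P = rng.random((10, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
Q = P @ R_true.T + t_true                     # ground-truth transform
R, t = kabsch(P, Q)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```

In practice the correspondences come from the learned keypoint matcher rather than being given, and filtering/outlier rejection handles the mismatches this least-squares step cannot.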
Citations: 2
HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving
Pub Date : 2022-12-15 DOI: 10.48550/arXiv.2212.07729
Andrei Zanfir, M. Zanfir, Alexander N. Gorban, Jingwei Ji, Yin Zhou, Drago Anguelov, C. Sminchisescu
Autonomous driving is an exciting new industry, posing important research questions. Within the perception module, 3D human pose estimation is an emerging technology that can enable an autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. While hardware systems and sensors have dramatically improved over the decades, with cars potentially boasting complex LiDAR and vision systems and a growing body of dedicated datasets for this newly available information, not much work has been done to harness these novel signals for the core problem of 3D human pose estimation. Our method, which we coin HUM3DIL (HUMan 3D from Images and LiDAR), efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin. It is a fast and compact model for onboard deployment. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Quantitative experiments on the Waymo Open Dataset support these claims, where we achieve state-of-the-art results on the task of 3D pose estimation.
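The pixel-alignment step mentioned above rests on standard pinhole projection: each LiDAR point in the camera frame is mapped to image coordinates so it can index image features. The intrinsic matrix below is made up for the sketch.

```python
import numpy as np

# Hypothetical pinhole intrinsics (fx, fy, principal point cx, cy).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project_to_pixels(points_cam):
    """points_cam: (N, 3) LiDAR points in the camera frame, z > 0.

    Returns (N, 2) pixel coordinates via perspective projection.
    """
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide

pts = np.array([[0.0, 0.0, 2.0],      # on the optical axis
                [1.0, 0.5, 5.0]])
print(project_to_pixels(pts))
```

Once each 3D point has a pixel location, image features sampled there can be concatenated with the point's geometric features, giving the multi-modal tokens that the refinement stages consume.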
Citations: 8