Inverse reinforcement learning for autonomous navigation via differentiable semantic mapping and planning

Autonomous Robots · Impact factor 3.7 · Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)
Published: 2023-07-06 · DOI: 10.1007/s10514-023-10118-4

Tianyu Wang, Vikas Dhiman, Nikolay Atanasov

Autonomous Robots, vol. 47, no. 6, pp. 809-830

Citations: 3

Abstract

This paper focuses on inverse reinforcement learning for autonomous navigation using distance and semantic category observations. The objective is to infer a cost function that explains demonstrated behavior while relying only on the expert’s observations and state-control trajectory. We develop a map encoder that infers semantic category probabilities from the observation sequence, and a cost encoder, defined as a deep neural network over the semantic features. Since the expert cost is not directly observable, the model parameters can only be optimized by differentiating the error between demonstrated controls and a control policy computed from the cost estimate. We propose a new model of expert behavior that enables error minimization using a closed-form subgradient computed only over a subset of promising states via a motion planning algorithm. Our approach allows the learned behavior to generalize to new environments with new spatial configurations of the semantic categories. We analyze the different components of our model in a minigrid environment. We also demonstrate that our approach learns to follow traffic rules in the autonomous driving CARLA simulator by relying on semantic observations of buildings, sidewalks, and road lanes.
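The following rough, stdlib-only Python sketch shows how the pieces named in the abstract could fit together. It chains a semantic map update, a cost over semantic features, a planner-derived cost-to-go, and a demonstration loss with its subgradient. It is not the authors' implementation, and several parts are assumptions:

- a fixed-increment log-odds update as the inverse observation model
- a linear cost over class probabilities standing in for the deep cost encoder
- Dijkstra on a 4-connected grid standing in for the paper's motion planner
- a Boltzmann (softmax) expert model over Q-values

All names (`update_semantic_map`, `cost_to_go`, `expert_loss_and_grad`, `CLASSES`, the weights) are hypothetical. The sketch illustrates the idea in the abstract: the cost-to-go is a minimum over path costs. A subgradient with respect to the cell costs is therefore nonzero only on cells of the planned paths, and it can be read off the planner's output in closed form.

```python
import heapq
import math

# Hypothetical class set and 4-connected moves, for illustration only.
CLASSES = ("road", "sidewalk", "building")
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def update_semantic_map(scores, observations, increment=2.0):
    """Simplified multi-class log-odds update: add a fixed increment to the
    observed class of each observed cell (assumed inverse observation model)."""
    for cell, k in observations.items():
        scores.setdefault(cell, [0.0] * len(CLASSES))[k] += increment
    return scores


def class_probabilities(cell_scores):
    """Softmax over per-class log-odds; an all-zero (unobserved) cell is uniform."""
    m = max(cell_scores)
    exps = [math.exp(s - m) for s in cell_scores]
    total = sum(exps)
    return [e / total for e in exps]


def cell_costs(scores, shape, weights):
    """Linear cost over class probabilities, standing in for a deep cost
    encoder. Positive weights keep all costs positive, as Dijkstra requires."""
    probs, costs = {}, {}
    for r in range(shape[0]):
        for c in range(shape[1]):
            p = class_probabilities(scores.get((r, c), [0.0] * len(CLASSES)))
            probs[(r, c)] = p
            costs[(r, c)] = sum(w * pk for w, pk in zip(weights, p))
    return probs, costs


def cost_to_go(costs, goal):
    """Dijkstra from the goal: value[x] is the cheapest sum of entered-cell
    costs from x to the goal; succ[x] is the next cell on that path."""
    value, succ = {goal: 0.0}, {goal: None}
    heap = [(0.0, goal)]
    while heap:
        v, cell = heapq.heappop(heap)
        if v > value[cell]:
            continue  # stale heap entry
        for dr, dc in MOVES.values():
            prev = (cell[0] - dr, cell[1] - dc)
            cand = v + costs[cell]  # stepping prev -> cell pays c(cell)
            if prev in costs and cand < value.get(prev, math.inf):
                value[prev], succ[prev] = cand, cell
                heapq.heappush(heap, (cand, prev))
    return value, succ


def path_from(cell, succ):
    """Cells on the planned path from `cell` to the goal, both inclusive."""
    path = []
    while cell is not None:
        path.append(cell)
        cell = succ[cell]
    return path


def expert_loss_and_grad(state, expert_move, probs, costs, value, succ, temp=1.0):
    """Assumed Boltzmann expert: pi(u|x) proportional to exp(-Q(x,u)/temp),
    Q(x,u) = c(x') + value(x'). Returns -log pi(expert_move|x) and its
    subgradient with respect to the linear cost weights."""
    q, paths = {}, {}
    for name, (dr, dc) in MOVES.items():
        nxt = (state[0] + dr, state[1] + dc)
        if nxt in costs:
            q[name] = costs[nxt] + value[nxt]
            paths[name] = path_from(nxt, succ)  # dQ/dc(y) = 1 on this path
    m = min(q.values())
    expq = {u: math.exp(-(qu - m) / temp) for u, qu in q.items()}
    z = sum(expq.values())
    pi = {u: e / z for u, e in expq.items()}
    loss = -math.log(pi[expert_move])
    # dL/dc(y) = (1/temp) * (1[y on path(u_e)] - sum_u pi(u) * 1[y on path(u)])
    grad_c = {}
    for u, path in paths.items():
        coeff = ((1.0 if u == expert_move else 0.0) - pi[u]) / temp
        for y in path:
            grad_c[y] = grad_c.get(y, 0.0) + coeff
    # Chain rule through c(y) = sum_k w_k * p_k(y).
    grad_w = [sum(g * probs[y][k] for y, g in grad_c.items()) for k in range(len(CLASSES))]
    return loss, grad_w


# Usage on a 3x3 toy grid: a building at (1, 1), road at (0, 1) and (1, 0).
shape, goal = (3, 3), (0, 2)
scores = update_semantic_map({}, {(1, 1): 2, (0, 1): 0, (1, 0): 0})
weights = [1.0, 2.0, 10.0]  # hypothetical: buildings are expensive to enter
probs, costs = cell_costs(scores, shape, weights)
value, succ = cost_to_go(costs, goal)
loss, grad = expert_loss_and_grad((1, 0), "up", probs, costs, value, succ)
print(f"loss={loss:.4f} grad={[round(g, 4) for g in grad]}")
```

In this toy grid the planner has no ties. A central finite-difference estimate of the loss with respect to each weight should therefore agree with `grad`. Where ties occur, the returned value is only one valid subgradient.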

Source journal

Autonomous Robots (Engineering & Technology, Robotics)

CiteScore: 7.90

Self-citation rate: 5.70%

Review time: 3 months

Journal description: Autonomous Robots reports on the theory and applications of robotic systems capable of some degree of self-sufficiency. It features papers that include performance data on actual robots in the real world. Coverage includes: control of autonomous robots · real-time vision · autonomous wheeled and tracked vehicles · legged vehicles · computational architectures for autonomous systems · distributed architectures for learning, control and adaptation · studies of autonomous robot systems · sensor fusion · theory of autonomous systems · terrain mapping and recognition · self-calibration and self-repair for robots · self-reproducing intelligent structures · genetic algorithms as models for robot development. The focus is on the ability to move and be self-sufficient, not on whether the system is an imitation of biology. Of course, biological models for robotic systems are of major interest to the journal since living systems are prototypes for autonomous behavior.

Latest articles in this journal

Isolated Kalman filtering: theory and decoupled estimator design
Eigen-factors a bilevel optimization for plane SLAM of 3D point clouds
View: visual imitation learning with waypoints
Safe and stable teleoperation of quadrotor UAVs under haptic shared autonomy
Synthesizing compact behavior trees for probabilistic robotics domains