NAVINACT: Combining Navigation and Imitation Learning for Bootstrapping Reinforcement Learning

Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar
{"title":"NAVINACT: Combining Navigation and Imitation Learning for Bootstrapping Reinforcement Learning","authors":"Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar","doi":"arxiv-2408.04054","DOIUrl":null,"url":null,"abstract":"Reinforcement Learning (RL) has shown remarkable progress in simulation\nenvironments, yet its application to real-world robotic tasks remains limited\ndue to challenges in exploration and generalisation. To address these issues,\nwe introduce NAVINACT, a framework that chooses when the robot should use\nclassical motion planning-based navigation and when it should learn a policy.\nTo further improve the efficiency in exploration, we use imitation data to\nbootstrap the exploration. NAVINACT dynamically switches between two modes of\noperation: navigating to a waypoint using classical techniques when away from\nthe objects and reinforcement learning for fine-grained manipulation control\nwhen about to interact with objects. NAVINACT consists of a multi-head\narchitecture composed of ModeNet for mode classification, NavNet for waypoint\nprediction, and InteractNet for precise manipulation. By combining the\nstrengths of RL and Imitation Learning (IL), NAVINACT improves sample\nefficiency and mitigates distribution shift, ensuring robust task execution. We\nevaluate our approach across multiple challenging simulation environments and\nreal-world tasks, demonstrating superior performance in terms of adaptability,\nefficiency, and generalization compared to existing methods. In both simulated\nand real-world settings, NAVINACT demonstrates robust performance. In\nsimulations, NAVINACT surpasses baseline methods by 10-15\\% in training success\nrates at 30k samples and by 30-40\\% during evaluation phases. In real-world\nscenarios, it demonstrates a 30-40\\% higher success rate on simpler tasks\ncompared to baselines and uniquely succeeds in complex, two-stage manipulation\ntasks. Datasets and supplementary materials can be found on our website:\n{https://raaslab.org/projects/NAVINACT/}.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"56 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.04054","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Reinforcement Learning (RL) has shown remarkable progress in simulation environments, yet its application to real-world robotic tasks remains limited due to challenges in exploration and generalization. To address these issues, we introduce NAVINACT, a framework that chooses when the robot should use classical motion planning-based navigation and when it should learn a policy. To further improve exploration efficiency, we use imitation data to bootstrap the exploration. NAVINACT dynamically switches between two modes of operation: navigating to a waypoint using classical techniques when far from objects, and reinforcement learning for fine-grained manipulation control when about to interact with them. NAVINACT consists of a multi-head architecture composed of ModeNet for mode classification, NavNet for waypoint prediction, and InteractNet for precise manipulation. By combining the strengths of RL and Imitation Learning (IL), NAVINACT improves sample efficiency and mitigates distribution shift, ensuring robust task execution. We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior adaptability, efficiency, and generalization compared to existing methods. In both simulated and real-world settings, NAVINACT demonstrates robust performance. In simulations, NAVINACT surpasses baseline methods by 10-15% in training success rates at 30k samples and by 30-40% during evaluation phases. In real-world scenarios, it achieves a 30-40% higher success rate on simpler tasks compared to baselines and uniquely succeeds in complex, two-stage manipulation tasks. Datasets and supplementary materials can be found on our website: https://raaslab.org/projects/NAVINACT/.
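The mode-switching architecture described above can be made concrete with a short control-loop sketch. The following is a minimal, illustrative Python sketch, assuming hypothetical stand-ins for ModeNet, NavNet, InteractNet, and the classical planner; the class interfaces, the distance threshold, and the 7-DoF action shape are our assumptions for illustration, not the authors' implementation.

# Minimal sketch of a NAVINACT-style control step (illustrative only).
# All classes below are hypothetical stand-ins, not the authors' code.

import numpy as np


class ModeNet:
    """Hypothetical mode classifier: choose navigation or interaction."""

    def predict(self, obs: np.ndarray) -> str:
        # A learned classifier would decide this; here we simply threshold
        # an assumed end-effector-to-object distance feature for illustration.
        return "navigate" if obs[-1] > 0.1 else "interact"


class NavNet:
    """Hypothetical waypoint predictor, used while far from the object."""

    def predict_waypoint(self, obs: np.ndarray) -> np.ndarray:
        # Placeholder: head toward the observed object position.
        return obs[:3]


class InteractNet:
    """Hypothetical RL manipulation policy, bootstrapped from imitation
    data before RL fine-tuning."""

    def act(self, obs: np.ndarray) -> np.ndarray:
        # A trained actor network would produce this; placeholder 7-DoF action.
        return np.zeros(7)


class ClassicalPlanner:
    """Stub for a classical motion planner (e.g., an IK- or sampling-based one)."""

    def plan_to(self, waypoint: np.ndarray) -> np.ndarray:
        # Placeholder action that moves toward the waypoint.
        return np.concatenate([waypoint, np.zeros(4)])


def navinact_step(obs, mode_net, nav_net, interact_net, planner):
    """One control step: classical navigation far from the object,
    learned fine-grained manipulation once close enough to interact."""
    if mode_net.predict(obs) == "navigate":
        return planner.plan_to(nav_net.predict_waypoint(obs))
    return interact_net.act(obs)


if __name__ == "__main__":
    # Toy observation; the last entry is the assumed distance feature.
    obs = np.array([0.4, 0.2, 0.1, 0.0, 0.0, 0.0, 0.5])
    action = navinact_step(obs, ModeNet(), NavNet(), InteractNet(), ClassicalPlanner())
    print(action)

In the real system, ModeNet would be a trained classifier and InteractNet an RL policy warm-started from demonstrations; the sketch only fixes the control flow, not the learned components.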