NAVINACT: Combining Navigation and Imitation Learning for Bootstrapping Reinforcement Learning

Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar
{"title":"NAVINACT:结合导航和模仿学习以引导强化学习","authors":"Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar","doi":"arxiv-2408.04054","DOIUrl":null,"url":null,"abstract":"Reinforcement Learning (RL) has shown remarkable progress in simulation\nenvironments, yet its application to real-world robotic tasks remains limited\ndue to challenges in exploration and generalisation. To address these issues,\nwe introduce NAVINACT, a framework that chooses when the robot should use\nclassical motion planning-based navigation and when it should learn a policy.\nTo further improve the efficiency in exploration, we use imitation data to\nbootstrap the exploration. NAVINACT dynamically switches between two modes of\noperation: navigating to a waypoint using classical techniques when away from\nthe objects and reinforcement learning for fine-grained manipulation control\nwhen about to interact with objects. NAVINACT consists of a multi-head\narchitecture composed of ModeNet for mode classification, NavNet for waypoint\nprediction, and InteractNet for precise manipulation. By combining the\nstrengths of RL and Imitation Learning (IL), NAVINACT improves sample\nefficiency and mitigates distribution shift, ensuring robust task execution. We\nevaluate our approach across multiple challenging simulation environments and\nreal-world tasks, demonstrating superior performance in terms of adaptability,\nefficiency, and generalization compared to existing methods. In both simulated\nand real-world settings, NAVINACT demonstrates robust performance. In\nsimulations, NAVINACT surpasses baseline methods by 10-15\\% in training success\nrates at 30k samples and by 30-40\\% during evaluation phases. In real-world\nscenarios, it demonstrates a 30-40\\% higher success rate on simpler tasks\ncompared to baselines and uniquely succeeds in complex, two-stage manipulation\ntasks. Datasets and supplementary materials can be found on our website:\n{https://raaslab.org/projects/NAVINACT/}.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"56 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"NAVINACT: Combining Navigation and Imitation Learning for Bootstrapping Reinforcement Learning\",\"authors\":\"Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar\",\"doi\":\"arxiv-2408.04054\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement Learning (RL) has shown remarkable progress in simulation\\nenvironments, yet its application to real-world robotic tasks remains limited\\ndue to challenges in exploration and generalisation. To address these issues,\\nwe introduce NAVINACT, a framework that chooses when the robot should use\\nclassical motion planning-based navigation and when it should learn a policy.\\nTo further improve the efficiency in exploration, we use imitation data to\\nbootstrap the exploration. NAVINACT dynamically switches between two modes of\\noperation: navigating to a waypoint using classical techniques when away from\\nthe objects and reinforcement learning for fine-grained manipulation control\\nwhen about to interact with objects. NAVINACT consists of a multi-head\\narchitecture composed of ModeNet for mode classification, NavNet for waypoint\\nprediction, and InteractNet for precise manipulation. 
By combining the\\nstrengths of RL and Imitation Learning (IL), NAVINACT improves sample\\nefficiency and mitigates distribution shift, ensuring robust task execution. We\\nevaluate our approach across multiple challenging simulation environments and\\nreal-world tasks, demonstrating superior performance in terms of adaptability,\\nefficiency, and generalization compared to existing methods. In both simulated\\nand real-world settings, NAVINACT demonstrates robust performance. In\\nsimulations, NAVINACT surpasses baseline methods by 10-15\\\\% in training success\\nrates at 30k samples and by 30-40\\\\% during evaluation phases. In real-world\\nscenarios, it demonstrates a 30-40\\\\% higher success rate on simpler tasks\\ncompared to baselines and uniquely succeeds in complex, two-stage manipulation\\ntasks. Datasets and supplementary materials can be found on our website:\\n{https://raaslab.org/projects/NAVINACT/}.\",\"PeriodicalId\":501479,\"journal\":{\"name\":\"arXiv - CS - Artificial Intelligence\",\"volume\":\"56 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.04054\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.04054","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Reinforcement Learning (RL) has shown remarkable progress in simulation environments, yet its application to real-world robotic tasks remains limited due to challenges in exploration and generalisation. To address these issues, we introduce NAVINACT, a framework that chooses when the robot should use classical motion planning-based navigation and when it should learn a policy. To further improve the efficiency in exploration, we use imitation data to bootstrap the exploration. NAVINACT dynamically switches between two modes of operation: navigating to a waypoint using classical techniques when away from the objects and reinforcement learning for fine-grained manipulation control when about to interact with objects. NAVINACT consists of a multi-head architecture composed of ModeNet for mode classification, NavNet for waypoint prediction, and InteractNet for precise manipulation. By combining the strengths of RL and Imitation Learning (IL), NAVINACT improves sample efficiency and mitigates distribution shift, ensuring robust task execution. We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior performance in terms of adaptability, efficiency, and generalization compared to existing methods. In both simulated and real-world settings, NAVINACT demonstrates robust performance. In simulations, NAVINACT surpasses baseline methods by 10-15% in training success rates at 30k samples and by 30-40% during evaluation phases. In real-world scenarios, it demonstrates a 30-40% higher success rate on simpler tasks compared to baselines and uniquely succeeds in complex, two-stage manipulation tasks. Datasets and supplementary materials can be found on our website: https://raaslab.org/projects/NAVINACT/.
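The abstract describes a three-head architecture that switches between classical waypoint navigation and a learned manipulation policy. The sketch below is a minimal illustration of that switching idea, assuming a simple distance threshold for mode classification; the names ModeNet, NavNet, and InteractNet come from the abstract, but the thresholds, method signatures, and internal logic here are illustrative guesses, not the authors' implementation.

```python
# Illustrative sketch only: the switching logic and all parameters below are
# assumptions inferred from the abstract, not the authors' released code.
import numpy as np


class ModeNet:
    """Toy mode classifier: far from the object -> 'navigate', close -> 'interact'."""

    def __init__(self, interact_radius=0.10):
        self.interact_radius = interact_radius  # assumed threshold, in metres

    def predict(self, ee_pos, obj_pos):
        dist = np.linalg.norm(np.asarray(ee_pos) - np.asarray(obj_pos))
        return "interact" if dist < self.interact_radius else "navigate"


class NavNet:
    """Toy waypoint predictor: step a fixed fraction of the way toward the object."""

    def predict_waypoint(self, ee_pos, obj_pos, step=0.25):
        ee_pos, obj_pos = np.asarray(ee_pos), np.asarray(obj_pos)
        return ee_pos + step * (obj_pos - ee_pos)


class InteractNet:
    """Stand-in for the learned RL policy used for fine-grained manipulation."""

    def __init__(self, action_dim=3):
        self.action_dim = action_dim

    def act(self, observation):
        # A real implementation would query a trained policy network here.
        return np.zeros(self.action_dim)


def select_action(mode_net, nav_net, interact_net, ee_pos, obj_pos, observation):
    """Dispatch to classical waypoint navigation or the RL policy based on the mode."""
    if mode_net.predict(ee_pos, obj_pos) == "navigate":
        waypoint = nav_net.predict_waypoint(ee_pos, obj_pos)
        return "navigate", waypoint - np.asarray(ee_pos)  # move toward the waypoint
    return "interact", interact_net.act(observation)


if __name__ == "__main__":
    mode_net, nav_net, interact_net = ModeNet(), NavNet(), InteractNet()
    ee, obj = np.array([0.0, 0.0, 0.3]), np.array([0.5, 0.2, 0.05])
    print(select_action(mode_net, nav_net, interact_net, ee, obj, observation=None))
```

In the paper's framing, the learned manipulation policy would additionally be bootstrapped with imitation data, for example by pretraining on demonstrations or seeding the RL replay buffer with demonstration transitions before exploration begins; the exact mechanism is not specified in the abstract.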