NAVINACT: Combining Navigation and Imitation Learning for Bootstrapping Reinforcement Learning
Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar
arXiv:2408.04054 [cs.AI], published 2024-08-07
Abstract
Reinforcement Learning (RL) has shown remarkable progress in simulation environments, yet its application to real-world robotic tasks remains limited due to challenges in exploration and generalization. To address these issues, we introduce NAVINACT, a framework that chooses when the robot should use classical motion-planning-based navigation and when it should learn a policy. To further improve exploration efficiency, we use imitation data to bootstrap exploration. NAVINACT dynamically switches between two modes of operation: navigating to a waypoint with classical techniques when away from objects, and reinforcement learning for fine-grained manipulation control when about to interact with objects. NAVINACT consists of a multi-head architecture composed of ModeNet for mode classification, NavNet for waypoint prediction, and InteractNet for precise manipulation. By combining the strengths of RL and Imitation Learning (IL), NAVINACT improves sample efficiency and mitigates distribution shift, ensuring robust task execution. We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior adaptability, efficiency, and generalization compared to existing methods. In both simulated and real-world settings, NAVINACT demonstrates robust performance. In simulations, NAVINACT surpasses baseline methods by 10-15% in training success rates at 30k samples and by 30-40% during evaluation. In real-world scenarios, it achieves a 30-40% higher success rate on simpler tasks compared to baselines and uniquely succeeds in complex, two-stage manipulation tasks. Datasets and supplementary materials can be found on our website: https://raaslab.org/projects/NAVINACT/.
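To make the mode-switching idea in the abstract concrete, the sketch below shows one way a controller could dispatch between a classical planner and a learned policy using three heads named after ModeNet, NavNet, and InteractNet. This is a minimal, hypothetical Python illustration under assumed interfaces: the class, method names, observation keys, and the distance-threshold heuristic are placeholders for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of NAVINACT-style mode switching.
# All names, signatures, and the threshold heuristic are illustrative assumptions.
import numpy as np


class NaviNActController:
    """Dispatches between classical waypoint navigation and a learned policy.

    mode_net     -- ModeNet-like head: obs -> "navigate" or "interact"
    nav_net      -- NavNet-like head: obs -> waypoint (x, y, z)
    interact_net -- InteractNet-like head: obs -> fine-grained manipulation action
                    (in the paper this is the RL policy, bootstrapped with imitation data)
    motion_planner -- classical planner: (obs, waypoint) -> low-level action
    """

    def __init__(self, mode_net, nav_net, interact_net, motion_planner):
        self.mode_net = mode_net
        self.nav_net = nav_net
        self.interact_net = interact_net
        self.motion_planner = motion_planner

    def act(self, obs):
        mode = self.mode_net(obs)
        if mode == "navigate":
            # Far from the object: move toward a predicted waypoint with the
            # classical planner instead of exploring with RL.
            waypoint = self.nav_net(obs)
            return self.motion_planner(obs, waypoint), mode
        # Near the object: hand control to the learned manipulation policy.
        return self.interact_net(obs), mode


if __name__ == "__main__":
    # Toy usage with placeholder heads; trained networks would replace these.
    controller = NaviNActController(
        mode_net=lambda obs: "navigate"
        if np.linalg.norm(obs["gripper_to_object"]) > 0.1
        else "interact",
        nav_net=lambda obs: obs["object_position"],
        interact_net=lambda obs: np.zeros(7),  # placeholder 7-DoF action
        motion_planner=lambda obs, wp: wp - obs["gripper_position"],  # crude step toward waypoint
    )
    obs = {
        "gripper_position": np.array([0.0, 0.0, 0.5]),
        "object_position": np.array([0.4, 0.2, 0.1]),
        "gripper_to_object": np.array([0.4, 0.2, -0.4]),
    }
    action, mode = controller.act(obs)
    print(mode, action)
```

The design point the abstract emphasizes is that RL exploration is confined to the interaction phase, while coarse approach is handled by classical planning, which reduces the exploration burden. The paper further bootstraps the RL component with imitation data; a common (though not necessarily the authors') way to realize that is to pretrain the policy or pre-fill its replay buffer with demonstrations before online RL.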