NPE-DRL: Enhancing Perception Constrained Obstacle Avoidance With Nonexpert Policy Guided Reinforcement Learning

Yuhang Zhang;Chao Yan;Jiaping Xiao;Mir Feroskhan
{"title":"NPE-DRL:用非专家策略引导的强化学习增强感知约束的避障","authors":"Yuhang Zhang;Chao Yan;Jiaping Xiao;Mir Feroskhan","doi":"10.1109/TAI.2024.3464510","DOIUrl":null,"url":null,"abstract":"Obstacle avoidance under constrained visual perception presents a significant challenge, requiring rapid detection and decision-making within partially observable environments, particularly for unmanned aerial vehicles (UAVs) maneuvering agilely in 3-D space. Compared with traditional methods, obstacle avoidance algorithms based on deep reinforcement learning (DRL) offer a better comprehension of the uncertain operational environment in an end-to-end manner, reducing computational complexity, and enhancing flexibility and scalability. However, the inherent trial-and-error learning mechanism of DRL necessitates numerous iterations for policy convergence, leading to sample inefficiency issues. Meanwhile, existing sample-efficient obstacle avoidance approaches that leverage imitation learning often heavily rely on offline expert demonstrations, which are not always feasible in hazardous environments. To address these challenges, we propose a novel obstacle avoidance approach based on nonexpert policy enhanced DRL (NPE-DRL). This approach integrates a fundamental DRL framework with prior knowledge derived from a nonexpert policy-guided imitation learning. During the training phase, the agent starts by online imitating the actions generated by the nonexpert policy during interactions and progressively shifts toward autonomously exploring the environment to generate the optimal policy. Both simulation and physical experiments validate that our approach improves sample efficiency and achieves a better exploration–exploitation balance in both virtual and real-world flights. Additionally, our NPE-DRL-based obstacle avoidance approach shows better adaptability in complex environments characterized by larger scales and denser obstacle configurations, demonstrating a significant improvement in UAVs’ obstacle avoidance capability. Code available at <uri>https://github.com/zzzzzyh111/NonExpert-Guided-Visual-UAV-Navigation-Gazebo</uri>.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 1","pages":"184-198"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"NPE-DRL: Enhancing Perception Constrained Obstacle Avoidance With Nonexpert Policy Guided Reinforcement Learning\",\"authors\":\"Yuhang Zhang;Chao Yan;Jiaping Xiao;Mir Feroskhan\",\"doi\":\"10.1109/TAI.2024.3464510\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Obstacle avoidance under constrained visual perception presents a significant challenge, requiring rapid detection and decision-making within partially observable environments, particularly for unmanned aerial vehicles (UAVs) maneuvering agilely in 3-D space. Compared with traditional methods, obstacle avoidance algorithms based on deep reinforcement learning (DRL) offer a better comprehension of the uncertain operational environment in an end-to-end manner, reducing computational complexity, and enhancing flexibility and scalability. However, the inherent trial-and-error learning mechanism of DRL necessitates numerous iterations for policy convergence, leading to sample inefficiency issues. 
Meanwhile, existing sample-efficient obstacle avoidance approaches that leverage imitation learning often heavily rely on offline expert demonstrations, which are not always feasible in hazardous environments. To address these challenges, we propose a novel obstacle avoidance approach based on nonexpert policy enhanced DRL (NPE-DRL). This approach integrates a fundamental DRL framework with prior knowledge derived from a nonexpert policy-guided imitation learning. During the training phase, the agent starts by online imitating the actions generated by the nonexpert policy during interactions and progressively shifts toward autonomously exploring the environment to generate the optimal policy. Both simulation and physical experiments validate that our approach improves sample efficiency and achieves a better exploration–exploitation balance in both virtual and real-world flights. Additionally, our NPE-DRL-based obstacle avoidance approach shows better adaptability in complex environments characterized by larger scales and denser obstacle configurations, demonstrating a significant improvement in UAVs’ obstacle avoidance capability. Code available at <uri>https://github.com/zzzzzyh111/NonExpert-Guided-Visual-UAV-Navigation-Gazebo</uri>.\",\"PeriodicalId\":73305,\"journal\":{\"name\":\"IEEE transactions on artificial intelligence\",\"volume\":\"6 1\",\"pages\":\"184-198\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on artificial intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10684842/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10684842/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Obstacle avoidance under constrained visual perception presents a significant challenge, requiring rapid detection and decision-making within partially observable environments, particularly for unmanned aerial vehicles (UAVs) maneuvering agilely in 3-D space. Compared with traditional methods, obstacle avoidance algorithms based on deep reinforcement learning (DRL) offer a better comprehension of the uncertain operational environment in an end-to-end manner, reducing computational complexity, and enhancing flexibility and scalability. However, the inherent trial-and-error learning mechanism of DRL necessitates numerous iterations for policy convergence, leading to sample inefficiency issues. Meanwhile, existing sample-efficient obstacle avoidance approaches that leverage imitation learning often heavily rely on offline expert demonstrations, which are not always feasible in hazardous environments. To address these challenges, we propose a novel obstacle avoidance approach based on nonexpert policy enhanced DRL (NPE-DRL). This approach integrates a fundamental DRL framework with prior knowledge derived from a nonexpert policy-guided imitation learning. During the training phase, the agent starts by online imitating the actions generated by the nonexpert policy during interactions and progressively shifts toward autonomously exploring the environment to generate the optimal policy. Both simulation and physical experiments validate that our approach improves sample efficiency and achieves a better exploration–exploitation balance in both virtual and real-world flights. Additionally, our NPE-DRL-based obstacle avoidance approach shows better adaptability in complex environments characterized by larger scales and denser obstacle configurations, demonstrating a significant improvement in UAVs' obstacle avoidance capability. Code available at https://github.com/zzzzzyh111/NonExpert-Guided-Visual-UAV-Navigation-Gazebo.
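The abstract describes training that begins with online imitation of a nonexpert guiding policy and gradually hands control over to the agent's own exploration. The sketch below illustrates one common way to realize such a schedule: an imitation probability that anneals per episode. All names here (`nonexpert_guide`, `select_action`, the heuristic, the decay schedule) are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import random

def nonexpert_guide(ranges):
    # Hypothetical nonexpert policy: a crude reactive heuristic that yaws
    # toward the side with more free space. A stand-in, not the paper's guide.
    half = len(ranges) // 2
    return -1.0 if sum(ranges[:half]) < sum(ranges[half:]) else 1.0

def learned_policy(ranges):
    # Placeholder for the DRL policy network's output (e.g., a yaw command).
    return random.uniform(-1.0, 1.0)

def select_action(ranges, episode, beta0=1.0, decay=0.995):
    """With probability beta (annealed each episode), execute the nonexpert
    guide's action; otherwise act from the learned policy. Early episodes are
    dominated by online imitation, later ones by autonomous exploration."""
    beta = beta0 * decay ** episode
    if random.random() < beta:
        return nonexpert_guide(ranges), True   # guided step (imitation target)
    return learned_policy(ranges), False       # autonomous exploration step

# Toy rollout: the guided fraction shrinks as training progresses.
for episode in (0, 200, 1000):
    obs = [random.uniform(0.5, 5.0) for _ in range(8)]
    guided = sum(select_action(obs, episode)[1] for _ in range(1000))
    print(f"episode {episode}: ~{guided / 10:.0f}% guided steps")
```

Under this assumed schedule, roughly all steps are guided at episode 0, about a third at episode 200, and almost none by episode 1000, which matches the progressive shift from imitation to exploration that the abstract describes.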