Model inductive bias enhanced deep reinforcement learning for robot navigation in crowded environments

Man Chen, Yongjie Huang, Weiwen Wang, Yao Zhang, Lei Xu, Zhisong Pan

Complex & Intelligent Systems, published 2024-07-02. DOI: 10.1007/s40747-024-01493-1
Citations: 0
Abstract
Navigating mobile robots in crowded environments poses a significant challenge and is essential for the coexistence of robots and humans in future intelligent societies. As a pragmatic data-driven approach, deep reinforcement learning (DRL) holds promise for addressing this challenge. However, current DRL-based navigation methods leave room for improvement in understanding agent interactions, designing feedback mechanisms, and making foresighted decisions in dynamic environments. This paper introduces the model inductive bias enhanced deep reinforcement learning (MIBE-DRL) method, drawing inspiration from a fusion of data-driven and model-driven techniques. MIBE-DRL extensively incorporates model inductive bias into the deep reinforcement learning framework, enhancing the efficiency and safety of robot navigation. The proposed approach entails a multi-interaction network featuring three modules designed to comprehensively understand potential agent interactions in dynamic environments. The pedestrian interaction module models interactions among humans, while the temporal and spatial interaction modules consider agent interactions in the temporal and spatial dimensions, respectively. Additionally, the paper constructs a reward system that fully accounts for the robot's direction and position. The system's directional and positional reward functions are built on artificial potential fields (APF) and navigation rules, respectively, providing reasoned evaluations of the robot's motion direction and position during training so that it receives comprehensive feedback. Furthermore, the incorporation of Monte-Carlo tree search (MCTS) facilitates a foresighted action strategy, enabling robots to execute actions with long-term planning considerations. Experimental results demonstrate that integrating model inductive bias significantly enhances the navigation performance of MIBE-DRL.
Compared to state-of-the-art methods, MIBE-DRL achieves the highest success rate in crowded environments and demonstrates advantages in navigation time and maintaining a safe social distance from humans.
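The abstract describes a directional reward built on artificial potential fields: the robot is rewarded for heading along the field that attracts it to the goal and repels it from pedestrians. The paper's actual reward formulas are not given here, so the following is only a minimal illustrative sketch of that general idea; the function name, gain parameters (`k_att`, `k_rep`), and influence radius `d0` are all assumptions, not the authors' implementation.

```python
import numpy as np

def apf_direction_reward(robot_pos, robot_heading, goal, pedestrians,
                         k_att=1.0, k_rep=0.5, d0=1.5):
    """Hypothetical APF-style directional reward: cosine alignment between
    the robot's heading and the negative gradient of a potential field that
    is attractive toward the goal and repulsive near pedestrians."""
    # Attractive component: pulls the robot toward the goal.
    force = k_att * (goal - robot_pos)
    # Repulsive components: push away from pedestrians within radius d0.
    for p in pedestrians:
        diff = robot_pos - p
        d = np.linalg.norm(diff)
        if 1e-6 < d < d0:
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    norm = np.linalg.norm(force)
    if norm < 1e-6:
        return 0.0  # no preferred direction (e.g., at the goal)
    # Reward in [-1, 1]: +1 when heading matches the field direction exactly.
    return float(np.dot(robot_heading, force / norm))
```

Under this sketch, a robot heading straight at the goal with no pedestrian inside the influence radius earns the maximum reward of 1, while heading directly away earns -1, giving the dense directional feedback the abstract attributes to the APF-based term.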
About the journal:
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.