Safe reinforcement learning for high-speed autonomous racing

Benjamin D. Evans, Hendrik W. Jordaan, Herman A. Engelbrecht
{"title":"Safe reinforcement learning for high-speed autonomous racing","authors":"Benjamin D. Evans,&nbsp;Hendrik W. Jordaan,&nbsp;Herman A. Engelbrecht","doi":"10.1016/j.cogr.2023.04.002","DOIUrl":null,"url":null,"abstract":"<div><p>The conventional application of deep reinforcement learning (DRL) to autonomous racing requires the agent to crash during training, thus limiting training to simulation environments. Further, many DRL approaches still exhibit high crash rates after training, making them infeasible for real-world use. This paper addresses the problem of safely training DRL agents for autonomous racing. Firstly, we present a Viability Theory-based supervisor that ensures the vehicle does not crash and remains within the friction limit while maintaining recursive feasibility. Secondly, we use the supervisor to ensure the vehicle does not crash during the training of DRL agents for high-speed racing. The evaluation in the open-source F1Tenth simulator demonstrates that our safety system can ensure the safety of a worst-case scenario planner on four test maps up to speeds of 6 m/s. Training agents to race with the supervisor significantly improves sample efficiency, requiring only 10,000 steps. Our learning formulation leads to learning more conservative, safer policies with slower lap times and a higher success rate, resulting in our method being feasible for physical vehicle racing. Enabling DRL agents to learn to race without ever crashing is a step towards using DRL on physical vehicles.</p></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"3 ","pages":"Pages 107-126"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Robotics","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667241323000125","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

The conventional application of deep reinforcement learning (DRL) to autonomous racing requires the agent to crash during training, thus limiting training to simulation environments. Further, many DRL approaches still exhibit high crash rates after training, making them infeasible for real-world use. This paper addresses the problem of safely training DRL agents for autonomous racing. Firstly, we present a Viability Theory-based supervisor that ensures the vehicle does not crash and remains within the friction limit while maintaining recursive feasibility. Secondly, we use the supervisor to ensure the vehicle does not crash during the training of DRL agents for high-speed racing. The evaluation in the open-source F1Tenth simulator demonstrates that our safety system ensures the safety of a worst-case-scenario planner on four test maps at speeds of up to 6 m/s. Training agents to race with the supervisor significantly improves sample efficiency, requiring only 10,000 steps. Our learning formulation leads to more conservative, safer policies with slower lap times and higher success rates, making our method feasible for racing on physical vehicles. Enabling DRL agents to learn to race without ever crashing is a step towards using DRL on physical vehicles.
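The supervisor described in the abstract follows the standard Viability Theory pattern: the agent's proposed action is only executed if the predicted successor state stays inside the viability kernel (the set of states from which a crash can still be avoided); otherwise the supervisor substitutes a safe action. The sketch below illustrates this intervention loop in Python. It is a minimal illustration, not the authors' implementation: the `env`, `agent`, `in_kernel`, and `safe_actions` interfaces are all hypothetical stand-ins.

```python
def supervised_step(env, agent, in_kernel, safe_actions, dt=0.05):
    """One supervised training step: the agent proposes an action and the
    viability supervisor overrides it if the predicted next state would
    leave the viability kernel (hypothetical interfaces throughout)."""
    state = env.observe()
    action = agent.act(state)

    # Roll the vehicle model one step forward to check the proposed action.
    if not in_kernel(env.predict(state, action, dt)):
        # Unsafe: fall back to the first discrete action whose successor
        # state stays inside the kernel. Recursive feasibility guarantees
        # that at least one such action exists.
        for candidate in safe_actions:
            if in_kernel(env.predict(state, candidate, dt)):
                action = candidate
                break

    reward, done = env.step(action)            # execute the (possibly overridden) action
    agent.store(state, action, reward, done)   # the agent still learns from the transition
    return done
```

Because the supervisor only ever replaces unsafe actions rather than terminating the episode, training proceeds crash-free while the agent continues to collect experience, which is consistent with the sample-efficiency improvement reported in the abstract.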

Source journal: Cognitive Robotics
CiteScore: 8.40
Self-citation rate: 0.00%
Articles published: 0
Latest articles in this journal:
Optimizing Food Sample Handling and Placement Pattern Recognition with YOLO: Advanced Techniques in Robotic Object Detection
Intelligent path planning for cognitive mobile robot based on Dhouib-Matrix-SPP method
YOLOT: Multi-scale and diverse tire sidewall text region detection based on You-Only-Look-Once (YOLOv5)
Scalable and cohesive swarm control based on reinforcement learning
POMDP-based probabilistic decision making for path planning in wheeled mobile robot