Multi-Reward Architecture based Reinforcement Learning for Highway Driving Policies

Wei Yuan, Ming Yang, Yuesheng He, Chunxiang Wang, B. Wang
{"title":"Multi-Reward Architecture based Reinforcement Learning for Highway Driving Policies","authors":"Wei Yuan, Ming Yang, Yuesheng He, Chunxiang Wang, B. Wang","doi":"10.1109/ITSC.2019.8917304","DOIUrl":null,"url":null,"abstract":"A safe and efficient driving policy is essential for the future autonomous highway driving. However, driving policies are hard for modeling because of the diversity of scenes and uncertainties of the interaction with surrounding vehicles. The state-of-the-art deep reinforcement learning method is unable to learn good domain knowledge for highway driving policies using single reward architecture. This paper proposes a Multi-Reward Architecture (MRA) based reinforcement learning for highway driving policies. A single reward function is decomposed to multi-reward functions for better representation of multi-dimensional driving policies. Besides the big penalty for collision, the overall reward is decomposed to three dimensional rewards: the reward for speed, the reward for overtake, and the reward for lane-change. Then, each reward trains a branch of Q-network for corresponding domain knowledge. The Q-network is divided into two parts: low-level network is shared by three branches of high-level networks, which approximate the corresponding Q-value for the different reward functions respectively. The agent car chooses the action based on the sum of Q vectors from three branches. Experiments are conducted in a simulation platform, which performs the highway driving process and the agent car is able to provide the commonly used sensor data: the image and the point cloud. Experiment results show that the proposed method performs better than the DQN method on single reward architecture with three evaluations: higher speed, lower frequency of lane-change, more quantity of overtaking, which is more efficient and safer for the future autonomous highway driving.","PeriodicalId":6717,"journal":{"name":"2019 IEEE Intelligent Transportation Systems Conference (ITSC)","volume":"24 1","pages":"3810-3815"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Intelligent Transportation Systems Conference (ITSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITSC.2019.8917304","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

A safe and efficient driving policy is essential for future autonomous highway driving. However, driving policies are hard to model because of the diversity of scenes and the uncertainty of interactions with surrounding vehicles. State-of-the-art deep reinforcement learning methods with a single-reward architecture fail to learn good domain knowledge for highway driving policies. This paper proposes Multi-Reward Architecture (MRA) based reinforcement learning for highway driving policies. A single reward function is decomposed into multiple reward functions to better represent the multi-dimensional nature of the driving policy. Besides a large penalty for collision, the overall reward is decomposed into three rewards: a reward for speed, a reward for overtaking, and a reward for lane changes. Each reward then trains a branch of the Q-network to capture the corresponding domain knowledge. The Q-network is divided into two parts: a low-level network shared by three high-level branches, each of which approximates the Q-values for its own reward function. The agent car chooses its action based on the sum of the Q-vectors from the three branches. Experiments are conducted in a simulation platform that reproduces the highway driving process, in which the agent car provides commonly used sensor data: images and point clouds. Experimental results show that the proposed method outperforms a DQN with a single-reward architecture on three evaluation metrics: higher speed, lower lane-change frequency, and more overtakes, making it more efficient and safer for future autonomous highway driving.
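To make the described architecture concrete, below is a minimal sketch of an MRA-style Q-network: a shared low-level network feeding three high-level branches (speed, overtake, lane-change), with the action chosen by summing the branch Q-vectors. The layer sizes, state dimension, and action set are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a Multi-Reward Architecture (MRA) Q-network, assuming a
# flat state vector and a small discrete action set (both hypothetical).
import torch
import torch.nn as nn


class MRAQNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden_dim: int = 128):
        super().__init__()
        # Low-level network shared by all reward branches.
        self.shared = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
        )
        # One high-level branch per decomposed reward: speed, overtake, lane-change.
        self.branches = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_actions),
            )
            for name in ("speed", "overtake", "lane_change")
        })

    def forward(self, state: torch.Tensor) -> dict:
        features = self.shared(state)
        # Each branch approximates the Q-values for its own reward function.
        return {name: branch(features) for name, branch in self.branches.items()}

    def act(self, state: torch.Tensor) -> int:
        # The agent picks the action that maximizes the sum of the branch Q-vectors.
        q_per_branch = self.forward(state)
        q_total = sum(q_per_branch.values())
        return int(torch.argmax(q_total, dim=-1).item())


# Usage sketch: a hypothetical 20-dimensional state and 5 highway actions
# (keep lane / accelerate / brake / change left / change right).
net = MRAQNetwork(state_dim=20, num_actions=5)
action = net.act(torch.zeros(1, 20))
```

In training, each branch would be updated with a DQN-style loss against its own decomposed reward while the shared low-level layers receive gradients from all three branches; the collision penalty can be folded into each branch or handled as a separate terminal cost, a design choice the abstract does not pin down.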