Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control

Zihao Sheng, Zilin Huang, Sikai Chen
Journal: arXiv - CS - Artificial Intelligence
Publication date: 2024-08-30
DOI: arxiv-2408.17380
Citations: 0

Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Model-based reinforcement learning (RL) is anticipated to exhibit higher sample efficiency compared to model-free RL by utilizing a virtual environment model. However, it is challenging to obtain sufficiently accurate representations of the environmental dynamics due to uncertainties in complex systems and environments. An inaccurate environment model may degrade the sample efficiency and performance of model-based RL. Furthermore, while model-based RL can improve sample efficiency, it often still requires substantial training time to learn from scratch, potentially limiting its advantages over model-free approaches. To address these challenges, this paper introduces a knowledge-informed model-based residual reinforcement learning framework aimed at enhancing learning efficiency by infusing established expert knowledge into the learning process and avoiding the issue of beginning from zero. Our approach integrates traffic expert knowledge into a virtual environment model, employing the Intelligent Driver Model (IDM) for basic dynamics and neural networks for residual dynamics, thus ensuring adaptability to complex scenarios. We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without the need to learn from scratch. The proposed approach is applied to CAV trajectory control tasks for the dissipation of stop-and-go waves in mixed traffic flow. Experimental results demonstrate that our proposed approach enables the CAV agent to achieve superior performance in trajectory control compared to the baseline agents in terms of sample efficiency, traffic flow smoothness and traffic mobility. The source code and supplementary materials are available at https://github.com/zihaosheng/traffic-expertise-RL/.
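The split between "basic dynamics" and "residual dynamics" described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the linked repository for that). The IDM acceleration formula is the standard one; the parameter defaults, the `max(0, ...)` clamp on the desired gap, and the names `idm_acceleration`, `knowledge_informed_acceleration`, and `residual` are assumptions made for illustration. The `residual` callable stands in for the neural network that would learn the correction to IDM.

```python
import math
from typing import Callable

# Hypothetical signature for the learned correction: (v, gap, dv) -> acceleration.
Residual = Callable[[float, float, float], float]


def idm_acceleration(v: float, gap: float, dv: float,
                     v0: float = 30.0,    # desired speed [m/s] (assumed value)
                     T: float = 1.5,      # desired time headway [s] (assumed value)
                     a_max: float = 1.0,  # maximum acceleration [m/s^2] (assumed value)
                     b: float = 1.5,      # comfortable deceleration [m/s^2] (assumed value)
                     s0: float = 2.0,     # minimum standstill gap [m] (assumed value)
                     delta: float = 4.0) -> float:
    """Standard IDM acceleration for a follower at speed v.

    gap is the bumper-to-bumper distance to the leader [m] and
    dv = v - v_leader is the approach rate [m/s].
    """
    # Desired dynamic gap; the clamp at zero is a common safeguard, not part
    # of the abstract.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def knowledge_informed_acceleration(v: float, gap: float, dv: float,
                                    residual: Residual) -> float:
    """IDM prediction plus a correction term (a neural network in the paper)."""
    return idm_acceleration(v, gap, dv) + residual(v, gap, dv)


def zero_residual(v: float, gap: float, dv: float) -> float:
    """Placeholder residual: with no learned correction the model is pure IDM."""
    return 0.0
```

With `zero_residual` the combined model reduces to plain IDM, which is the sense in which learning does not have to start from zero: the residual only needs to capture what IDM misses.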