Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Authors: Zihao Sheng, Zilin Huang, Sikai Chen
Journal: arXiv - CS - Artificial Intelligence
Publication date: 2024-08-30
DOI: arxiv-2408.17380 (https://doi.org/arxiv-2408.17380)
Citations: 0
Abstract
Model-based reinforcement learning (RL) is expected to be more
sample-efficient than model-free RL because it learns from a virtual environment
model. However, it is challenging to obtain sufficiently accurate
representations of the environmental dynamics due to uncertainties in complex
systems and environments. An inaccurate environment model may degrade the
sample efficiency and performance of model-based RL. Furthermore, while
model-based RL can improve sample efficiency, it often still requires
substantial training time to learn from scratch, potentially limiting its
advantages over model-free approaches. To address these challenges, this paper
introduces a knowledge-informed model-based residual reinforcement learning
framework that improves learning efficiency by infusing established expert
knowledge into the learning process, so that the agent does not have to start from
zero. Our approach integrates traffic expert knowledge into a virtual
environment model, employing the Intelligent Driver Model (IDM) for basic
dynamics and neural networks for residual dynamics, thus ensuring adaptability
to complex scenarios. We propose a novel strategy that combines traditional
control methods with residual RL, facilitating efficient learning and policy
optimization without the need to learn from scratch. The proposed approach is
applied to CAV trajectory control tasks for the dissipation of stop-and-go
waves in mixed traffic flow. Experimental results demonstrate that our proposed
approach enables the CAV agent to achieve superior performance in trajectory
control compared to the baseline agents in terms of sample efficiency, traffic
flow smoothness, and traffic mobility. The source code and supplementary
materials are available at https://github.com/zihaosheng/traffic-expertise-RL/.
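
The two ideas in the abstract, an IDM prior corrected by a learned residual and a conventional controller corrected by a residual policy, can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository above for that). The IDM formula is the standard car-following model, but the parameter values, the function names `predicted_acceleration` and `residual_action`, the additive way the residuals are combined, and the clipping limit are all assumptions made for illustration.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=1.5, s0=2.0, delta=4.0):
    """Standard Intelligent Driver Model (IDM) acceleration in m/s^2.

    v   : speed of the following vehicle (m/s)
    gap : bumper-to-bumper distance to the leader (m), must be > 0
    dv  : approach rate, v - v_leader (m/s)
    The default parameter values are illustrative, not taken from the paper.
    """
    # Desired dynamic gap; the max(0, ...) is a common implementation
    # safeguard and not part of the original formula.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def predicted_acceleration(v, gap, dv, residual_fn):
    """Knowledge-informed dynamics: IDM prior plus a learned correction.

    residual_fn stands in for a trained neural network (hypothetical here);
    it only has to model what IDM gets wrong, not the full dynamics.
    """
    return idm_acceleration(v, gap, dv) + residual_fn(v, gap, dv)


def residual_action(base_controller, residual_policy, state, limit=3.0):
    """Residual RL action: conventional controller output plus an RL correction.

    The sum is clipped to +/- limit (an assumed actuator bound).
    """
    action = base_controller(state) + residual_policy(state)
    return max(-limit, min(limit, action))


if __name__ == "__main__":
    # An untrained residual that outputs zero reduces the model to plain IDM.
    zero_residual = lambda v, gap, dv: 0.0
    print(predicted_acceleration(20.0, 30.0, 2.0, zero_residual))

    # A zero residual policy reduces the agent to the base controller.
    print(residual_action(lambda s: 0.5, lambda s: 0.0, state=None))
```

With a zero residual, both functions fall back to the hand-designed component, which is the sense in which the agent does not have to learn from scratch: training only needs to find the correction on top of a behavior that is already reasonable.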