A Deep On-Policy Learning Agent for Traffic Signal Control of Multiple Intersections

2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). Publication date: 2020-09-20. DOI: 10.1109/ITSC45102.2020.9294471

Chia-Cheng Yen, D. Ghosal, Michael Zhang, C. Chuah


Citations: 6

Abstract



Reinforcement Learning (RL) is being rapidly adopted in many complex environments because it can leverage neural networks to learn good strategies. In traffic signal control (TSC), existing work has focused on off-policy learning (Q-learning) with neural networks, while on-policy learning (SARSA) with neural networks has received limited study. In this work, we propose a deep dueling on-policy learning method (2DSARSA) for coordinated TSC across a network of intersections that maximizes the network throughput and minimizes the average end-to-end delay. To describe the state of the environment, we propose traffic flow maps (TFMs) that capture head-of-the-line (HOL) sojourn times for traffic lanes and HOL differences for adjacent intersections. We introduce a reward function defined by the power metric, which is the ratio of the network throughput to the average end-to-end delay. Because this metric grows with throughput and shrinks with delay, the reward favors higher throughput and lower delay simultaneously. We show that the proposed 2DSARSA architecture has significantly better learning performance than other RL architectures, including Deep Q-Network (DQN) and Deep SARSA (DSARSA).
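The abstract names three ingredients that a short sketch can make concrete: the power-metric reward, the dueling decomposition of Q-values, and the difference between the on-policy (SARSA) and off-policy (Q-learning) bootstrap targets. The Python below is an illustration under stated assumptions, not the authors' implementation. All function names, the `eps` guard, the discount `gamma`, and the mean-subtracted dueling aggregation (the common textbook form) are assumptions introduced here; the abstract does not specify any of them.

```python
from typing import List, Sequence


def power_reward(throughput: float, avg_delay: float, eps: float = 1e-9) -> float:
    """Power metric: network throughput divided by average end-to-end delay.

    eps is an assumption added here only to avoid division by zero.
    """
    return throughput / max(avg_delay, eps)


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Common dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .)).

    Assumed form; the abstract only says the network is "dueling".
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def sarsa_target(reward: float, next_q: Sequence[float], next_action: int,
                 gamma: float = 0.9) -> float:
    """On-policy target: bootstrap from the action actually chosen next."""
    return reward + gamma * next_q[next_action]


def q_learning_target(reward: float, next_q: Sequence[float],
                      gamma: float = 0.9) -> float:
    """Off-policy target: bootstrap from the greedy (max-value) next action."""
    return reward + gamma * max(next_q)


if __name__ == "__main__":
    # Hypothetical numbers chosen for easy arithmetic, not taken from the paper.
    r = power_reward(throughput=120.0, avg_delay=40.0)
    next_q = dueling_q(value=2.0, advantages=[1.0, 0.0, -1.0])
    print("reward:", r)
    print("next-state Q-values:", next_q)
    print("SARSA target:", sarsa_target(r, next_q, next_action=1, gamma=0.5))
    print("Q-learning target:", q_learning_target(r, next_q, gamma=0.5))
```

The two targets differ only in which next-state Q-value they bootstrap from: SARSA uses the action the behavior policy actually selects (which may be exploratory), whereas Q-learning always uses the maximum. When the selected action is not the greedy one, the SARSA target is lower, which is the sense in which on-policy learning accounts for the policy being followed.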

