Decentralized learning for traffic signal control

2015 7th International Conference on Communication Systems and Networks (COMSNETS) Pub Date : 1900-01-01 DOI:10.1109/COMSNETS.2015.7098712

K. J. Prabuchandran, Hemanth Kumar A.N, S. Bhatnagar


Citations: 15

Abstract

In this paper, we study the problem of finding the optimal order of the phase sequence [14] in a road network in order to manage traffic flow efficiently. We model this problem as a Markov decision process (MDP). Solving the MDP for all junctions in the road network simultaneously is hard, so we propose a decentralized multi-agent reinforcement learning (MARL) algorithm that treats each junction as a separate agent (controller). Each agent optimizes the order of its phase sequence using Q-learning with either ε-greedy or UCB [3] based exploration. Coordination between junctions is achieved through a cost feedback signal received from the neighbouring junctions, which the learning algorithm of each agent uses to update its Q-factors. Simulations in VISSIM on two real road networks show that our algorithms perform significantly better than standard fixed signal timing (FST), saturation balancing (SAT) [14] and round-robin multi-agent reinforcement learning [11].
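
The abstract does not give the update rule, the state encoding, or the way neighbour feedback enters the cost, so the following Python is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a tabular Q-learning agent per junction that minimises cost, a hashable state (for example a discretised queue-length tuple), an action that is the index of the next phase, and a feedback signal formed as the junction's own cost plus the mean of its neighbours' costs. The class name `JunctionAgent`, its methods, and all parameter values are hypothetical.

```python
import math
import random
from collections import defaultdict


class JunctionAgent:
    """Hypothetical per-junction Q-learning controller (costs are minimised)."""

    def __init__(self, n_phases, alpha=0.1, gamma=0.9, epsilon=0.1, ucb_c=1.0, seed=0):
        self.n_phases = n_phases
        self.alpha = alpha        # learning rate (assumed value)
        self.gamma = gamma        # discount factor (assumed value)
        self.epsilon = epsilon    # exploration probability for epsilon-greedy
        self.ucb_c = ucb_c        # exploration weight for UCB
        self.rng = random.Random(seed)
        # Q-factors and visit counts, one list of length n_phases per state.
        self.q = defaultdict(lambda: [0.0] * n_phases)
        self.counts = defaultdict(lambda: [0] * n_phases)

    def select_epsilon_greedy(self, state):
        """With probability epsilon pick a random phase, else the lowest-cost one."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_phases)
        q = self.q[state]
        return min(range(self.n_phases), key=lambda a: q[a])

    def select_ucb(self, state):
        """UCB-style choice: try every phase once, then favour rarely tried phases."""
        counts = self.counts[state]
        for action, n in enumerate(counts):
            if n == 0:
                return action
        q = self.q[state]
        log_total = math.log(sum(counts))
        # Costs are minimised, so the exploration bonus is subtracted.
        return min(
            range(self.n_phases),
            key=lambda a: q[a] - self.ucb_c * math.sqrt(log_total / counts[a]),
        )

    def update(self, state, action, own_cost, neighbour_costs, next_state):
        """One Q-learning step using own cost plus mean neighbour cost (assumption)."""
        feedback = own_cost
        if neighbour_costs:
            feedback += sum(neighbour_costs) / len(neighbour_costs)
        target = feedback + self.gamma * min(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
        self.counts[state][action] += 1
```

In a simulation loop, each junction would call one of the two selection methods to pick its next phase, observe its own cost and the costs reported by adjacent junctions, and then call `update`. Because each agent only needs its neighbours' costs, no junction has to solve the MDP over the whole network.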



