XLight: An interpretable multi-agent reinforcement learning approach for traffic signal control

IF 7.5 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Expert Systems with Applications Pub Date : 2025-05-10 Epub Date: 2025-02-16 DOI:10.1016/j.eswa.2025.126938

Sibin Cai , Jie Fang , Mengyun Xu

{"title":"XLight: An interpretable multi-agent reinforcement learning approach for traffic signal control","authors":"Sibin Cai , Jie Fang , Mengyun Xu","doi":"10.1016/j.eswa.2025.126938","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, deep reinforcement learning (DRL)-based traffic signal control (TSC) methods have garnered significant attention among researchers, achieving substantial progress. However, current research often focuses on performance improvement, neglecting interpretability. DRL-based TSC methods often face challenges in interpretability. This limitation poses significant obstacles to practical deployment, given the liability and regulatory constraints faced by governmental authorities responsible for traffic management and control. On the other hand, interpretable RL-based TSC methods offer greater flexibility to meet specific requirements. For instance, prioritizing the clearance of vehicles in a particular movement can be easily achieved by assigning higher weights to the state variables associated with that movement. To address this issue, we propose <strong><em>Xlight</em></strong>, an interpretable multi-agent reinforcement learning (MARL) approach for TSC, which enhances interpretability in three key aspects: (a) meticulously designing and selecting the state space, action space, and reward function. Especially, we propose an interpretable reward function for network-wide TSC and prove that maximizing this reward is equivalent to minimizing the average travel time (ATT) in the road network; (b) introducing more practical regulatable (i.e., interpretable) functions as TSC controllers; and (c) employing maximum entropy policy optimization, which simultaneously enhances interpretability and improves transferability. Next, to better align with practical applications of network-wide TSC, we propose several interpretable MARL-based methods. Among these, Multi-Agent Regulatable Soft Actor-Critic (MARSAC) not only possesses interpretability but also achieves superior performance. Finally, comprehensive experiments conducted across various TSC scenarios, including isolated intersection, synthetic network-wide intersections, and real-world network-wide intersections, demonstrate the effectiveness. For example, in terms of the ATT metric, our proposed method achieves improvements of 9.55%, 34.17%, 3.98%, and 42.93% compared to the Actuated Traffic Signal Control (ATSC) across a synthetic road network and 3 real-world road networks. Furthermore, in the synthetic network, our method demonstrates improvements of 4.04% and 3.21% in the Safety Score and Fuel Consumption metrics, respectively, when compared to the ATSC.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"273 ","pages":"Article 126938"},"PeriodicalIF":7.5000,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417425005603","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/16 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Recently, deep reinforcement learning (DRL)-based traffic signal control (TSC) methods have garnered significant attention among researchers, achieving substantial progress. However, current research often focuses on performance improvement, neglecting interpretability. DRL-based TSC methods often face challenges in interpretability. This limitation poses significant obstacles to practical deployment, given the liability and regulatory constraints faced by governmental authorities responsible for traffic management and control. On the other hand, interpretable RL-based TSC methods offer greater flexibility to meet specific requirements. For instance, prioritizing the clearance of vehicles in a particular movement can be easily achieved by assigning higher weights to the state variables associated with that movement. To address this issue, we propose Xlight, an interpretable multi-agent reinforcement learning (MARL) approach for TSC, which enhances interpretability in three key aspects: (a) meticulously designing and selecting the state space, action space, and reward function. Especially, we propose an interpretable reward function for network-wide TSC and prove that maximizing this reward is equivalent to minimizing the average travel time (ATT) in the road network; (b) introducing more practical regulatable (i.e., interpretable) functions as TSC controllers; and (c) employing maximum entropy policy optimization, which simultaneously enhances interpretability and improves transferability. Next, to better align with practical applications of network-wide TSC, we propose several interpretable MARL-based methods. Among these, Multi-Agent Regulatable Soft Actor-Critic (MARSAC) not only possesses interpretability but also achieves superior performance. Finally, comprehensive experiments conducted across various TSC scenarios, including isolated intersection, synthetic network-wide intersections, and real-world network-wide intersections, demonstrate the effectiveness. For example, in terms of the ATT metric, our proposed method achieves improvements of 9.55%, 34.17%, 3.98%, and 42.93% compared to the Actuated Traffic Signal Control (ATSC) across a synthetic road network and 3 real-world road networks. Furthermore, in the synthetic network, our method demonstrates improvements of 4.04% and 3.21% in the Safety Score and Fuel Consumption metrics, respectively, when compared to the ATSC.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

一种用于交通信号控制的可解释多智能体强化学习方法

近年来，基于深度强化学习（DRL）的交通信号控制（TSC）方法受到了研究人员的广泛关注，并取得了长足的进展。然而，目前的研究往往侧重于性能的提高，而忽视了可解释性。基于drl的TSC方法在可解释性方面经常面临挑战。鉴于负责交通管理和控制的政府当局所面临的责任和监管限制，这一限制对实际部署构成了重大障碍。另一方面，可解释的基于rl的TSC方法提供了更大的灵活性，以满足特定的需求。例如，通过给与该运动相关的状态变量分配更高的权重，可以很容易地实现特定运动中车辆间隙的优先级。为了解决这个问题，我们提出了一种用于TSC的可解释多智能体强化学习（MARL）方法Xlight，它在三个关键方面增强了可解释性：(a)精心设计和选择状态空间、动作空间和奖励函数。特别是，我们提出了一个可解释的路网TSC奖励函数，并证明了最大化该奖励等于最小化路网中的平均旅行时间（ATT）；(b)引入更实用的可调节（即可解释）功能作为TSC控制器；(c)采用最大熵策略优化，同时增强了可解释性和可移植性。接下来，为了更好地配合全网TSC的实际应用，我们提出了几种可解释的基于marl的方法。其中，多智能体可调节软行为评价（MARSAC）不仅具有可解释性，而且性能优越。最后，在各种TSC场景下进行了综合实验，包括孤立路口、合成全网路口和真实的全网路口，验证了该方法的有效性。例如，在ATT指标方面，我们提出的方法与自动交通信号控制（ATSC）相比，在合成路网和3个真实路网中实现了9.55%,34.17%，3.98%和42.93%的改进。此外，在合成网络中，与ATSC相比，我们的方法在安全得分和燃油消耗指标上分别提高了4.04%和3.21%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Expert Systems with Applications 工程技术-工程：电子与电气

CiteScore

13.80

自引率

10.60%

发文量

2045

审稿时长

8.7 months

期刊介绍： Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.