XLight: An interpretable multi-agent reinforcement learning approach for traffic signal control

IF 7.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Expert Systems with Applications | Pub Date: 2025-05-10 | Epub Date: 2025-02-16 | DOI: 10.1016/j.eswa.2025.126938
Sibin Cai, Jie Fang, Mengyun Xu
Expert Systems with Applications, Volume 273, Article 126938. Journal Article. https://www.sciencedirect.com/science/article/pii/S0957417425005603
Citations: 0

Abstract

In recent years, deep reinforcement learning (DRL)-based traffic signal control (TSC) methods have attracted significant attention from researchers and achieved substantial progress. However, current research mostly focuses on performance improvement and neglects interpretability. This limitation poses significant obstacles to practical deployment, given the liability and regulatory constraints faced by the governmental authorities responsible for traffic management and control. Interpretable RL-based TSC methods, by contrast, offer greater flexibility to meet specific requirements. For instance, prioritizing the clearance of vehicles in a particular movement can be achieved simply by assigning higher weights to the state variables associated with that movement. To address the lack of interpretability, we propose XLight, an interpretable multi-agent reinforcement learning (MARL) approach for TSC, which enhances interpretability in three key aspects: (a) carefully designing and selecting the state space, action space, and reward function; in particular, we propose an interpretable reward function for network-wide TSC and prove that maximizing this reward is equivalent to minimizing the average travel time (ATT) in the road network; (b) introducing more practical regulatable (i.e., interpretable) functions as TSC controllers; and (c) employing maximum entropy policy optimization, which simultaneously enhances interpretability and improves transferability. Next, to better align with practical applications of network-wide TSC, we propose several interpretable MARL-based methods. Among these, Multi-Agent Regulatable Soft Actor-Critic (MARSAC) is not only interpretable but also achieves superior performance. Finally, comprehensive experiments across various TSC scenarios, including an isolated intersection, synthetic network-wide intersections, and real-world network-wide intersections, demonstrate the effectiveness of the proposed approach. For example, in terms of the ATT metric, our proposed method achieves improvements of 9.55%, 34.17%, 3.98%, and 42.93% over Actuated Traffic Signal Control (ATSC) across a synthetic road network and three real-world road networks. Furthermore, in the synthetic network, our method improves the Safety Score and Fuel Consumption metrics by 4.04% and 3.21%, respectively, compared to ATSC.
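The abstract's remark that a movement can be prioritized by raising the weights on its state variables can be illustrated with a minimal sketch. This is not the controller from the paper: the phase layout, the use of queue lengths as state variables, and the names `PHASES`, `phase_scores`, and `select_phase` are all assumptions made here for illustration. It only shows how a weighted-sum controller stays readable and how one weight change alters the decision.

```python
from typing import Dict, List

# Hypothetical phase layout (assumption): each phase serves a set of movements.
PHASES: Dict[str, List[str]] = {
    "NS_through": ["N_through", "S_through"],
    "EW_through": ["E_through", "W_through"],
}


def phase_scores(queues: Dict[str, float],
                 weights: Dict[str, float]) -> Dict[str, float]:
    """Score each phase as a weighted sum of its movements' queue lengths.

    Movements without an explicit weight default to 1.0, so every score
    can be read directly as "weight times queue, summed per phase".
    """
    return {
        phase: sum(weights.get(m, 1.0) * queues[m] for m in movements)
        for phase, movements in PHASES.items()
    }


def select_phase(queues: Dict[str, float],
                 weights: Dict[str, float]) -> str:
    """Pick the phase with the highest weighted score."""
    scores = phase_scores(queues, weights)
    return max(scores, key=scores.get)


# Illustrative queue lengths (vehicles per movement), not data from the paper.
queues = {"N_through": 4, "S_through": 3, "E_through": 5, "W_through": 4}

# Uniform weights: NS scores 7, EW scores 9, so EW is served.
print(select_phase(queues, {}))

# Doubling the weight of N_through: NS scores 2*4 + 3 = 11, so NS is served.
print(select_phase(queues, {"N_through": 2.0}))
```

Because the decision is a plain weighted sum, an operator can trace any chosen phase back to the queue lengths and weights that produced it, which is the kind of regulatable behaviour the abstract describes.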
Source journal
Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.