Navigating the non-compliance effects on system optimal route guidance using reinforcement learning
Hyunsoo Yun, Eui-jin Kim, Seung Woo Ham, Dong-Kyu Kim
Transportation Research Part C: Emerging Technologies, published 2024-07-04
DOI: 10.1016/j.trc.2024.104721
https://www.sciencedirect.com/science/article/pii/S0968090X24002420
Citations: 0
Abstract
We consider a scenario where the transportation management center (TMC) guides future autonomous vehicles (AVs) toward optimal routes, aiming to bring the network in line with the system optimal (SO) principle. Achieving this, however, requires a joint decision-making process, and users may not comply with the TMC's route guidance when deviating serves their personal interests. This paper models a future transportation network in a microscopic simulation and introduces a novel concept of mixed equilibrium. In this framework, AVs follow the TMC's SO route guidance, while users can dynamically choose either to comply or to manually override this autonomy based on their own judgment. We initially model a fully compliant scenario, where a centralized Q-network, analogous to a TMC, is trained using reinforcement learning (RL) to minimize total system travel time (TSTT) and provide optimal routes to users. We then extend the problem to a multi-agent reinforcement learning (MARL) setting, where users can comply with or deviate from the TMC's guidance based on their own decision-making. Using neural fictitious self-play (NFSP), we employ a modulating hyperparameter to investigate how varying degrees of non-compliance affect the overall system. Results indicate that our RL approach holds significant potential for addressing the dynamic system optimal assignment problem. Remarkably, the TMC's route guidance retains the essence of SO even when some level of non-compliance is present. However, we also show that dominant user-centric decision-making can lead to system inefficiencies and create disparities among users. Our framework serves as an innovative tool for an AV-dominant future, offering a realistic perspective on network performance that aids in formulating effective traffic management strategies.
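To make the two-stage idea in the abstract concrete, the sketch below illustrates it in a deliberately simplified form: a centralized controller learns a route split that minimizes total system travel time (TSTT), and a compliance parameter mixes that guidance with selfish best responses, loosely echoing the NFSP modulating hyperparameter. This is not the authors' implementation; the two-route network, the BPR-style congestion function, the tabular stand-in for the centralized Q-network, and the names `eta`, `travel_time`, `realized_split`, and `DEMAND` are all hypothetical assumptions made for illustration.

```python
# Illustrative sketch only (not the paper's implementation): a centralized
# controller learns how to split a fixed demand over two routes to minimize
# total system travel time (TSTT); a compliance probability `eta` then mixes
# that guidance with selfish best responses.
import random

T0_A, CAP_A = 10.0, 40.0      # assumed free-flow time and capacity, route A
T0_B, CAP_B = 12.0, 60.0      # assumed free-flow time and capacity, route B
DEMAND = 100                  # vehicles per period (toy value)

def travel_time(volume, t0, cap):
    """BPR-style congestion function with standard coefficients."""
    return t0 * (1.0 + 0.15 * (volume / cap) ** 4)

def tstt(on_a):
    """Total system travel time when `on_a` vehicles use route A."""
    on_b = DEMAND - on_a
    return (on_a * travel_time(on_a, T0_A, CAP_A)
            + on_b * travel_time(on_b, T0_B, CAP_B))

# Tabular stand-in for the centralized Q-network: one action per possible split.
q = {a: 0.0 for a in range(DEMAND + 1)}
alpha, eps = 1.0, 0.2         # alpha = 1 is exact here: the toy reward is deterministic

rng = random.Random(0)
for _ in range(5000):
    a = rng.randrange(DEMAND + 1) if rng.random() < eps else max(q, key=q.get)
    q[a] += alpha * (-tstt(a) - q[a])     # reward = negative TSTT

so_split = max(q, key=q.get)              # learned SO-like assignment

def realized_split(eta, seed=0):
    """Split that emerges when each vehicle complies with probability `eta`."""
    r = random.Random(seed)
    compliant = sum(r.random() < eta for _ in range(DEMAND))
    on_a = round(so_split * compliant / DEMAND)   # compliant share follows the TMC
    on_b = compliant - on_a
    for _ in range(DEMAND - compliant):           # non-compliant vehicles best-respond
        if travel_time(on_a + 1, T0_A, CAP_A) < travel_time(on_b + 1, T0_B, CAP_B):
            on_a += 1
        else:
            on_b += 1
    return on_a

for eta in (1.0, 0.7, 0.3):
    s = realized_split(eta)
    print(f"compliance {eta:.1f}: {s}/{DEMAND - s} split, TSTT {tstt(s):.1f}")
```

In this toy, lowering `eta` tends to pull the realized split away from the learned assignment toward a selfish, user-equilibrium-like loading with higher TSTT, which is the qualitative trade-off the paper quantifies with NFSP in a microscopic simulation.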
About the journal:
Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.