Two-Sided Deep Reinforcement Learning for Dynamic Mobility-on-Demand Management with Mixed Autonomy

Transportation Science · Pub date: 2023-07-01 · DOI: 10.1287/trsc.2022.1188
Impact factor 4.4 · Zone 2 (Engineering & Technology) · JCR Q1 (Operations Research & Management Science)

Jiaohong Xie, Yang Liu, Nan Chen

Citations: 3

Abstract

Autonomous vehicles (AVs) are expected to operate on mobility-on-demand (MoD) platforms because AV technology enables flexible self-relocation and system-optimal coordination. Unlike existing studies, which focus on MoD with a pure AV fleet or a pure conventional vehicle (CV) fleet, we aim to optimize the real-time fleet management of an MoD system with a mixed autonomy of CVs and AVs. We consider a realistic case in which heterogeneous, boundedly rational drivers may determine and learn their own relocation strategies to improve their compensation. In contrast, AVs fully comply with the platform’s operational decisions. To achieve a high level of service with a mixed fleet, we propose that the platform prioritize human drivers in matching decisions when on-demand requests arrive, and dynamically determine the AV relocation tasks and the optimal commission fee to influence drivers’ behavior. However, making efficient real-time fleet management decisions is challenging when the operator’s decision making must anticipate and account for spatiotemporal uncertainty in demand and complex interactions among human drivers and the operator. To tackle these challenges, we develop a two-sided multiagent deep reinforcement learning (DRL) approach in which the operator acts as a supervisor agent on one side and makes centralized decisions for the mixed fleet, and each CV driver acts as an individual agent on the other side and learns to make decentralized decisions noncooperatively. We establish a two-sided multiagent advantage actor-critic algorithm to train the agents on both sides simultaneously. This is the first scalable algorithm developed for mixed fleet management. Furthermore, we formulate a two-head policy network that enables the supervisor agent to make multitask decisions efficiently with a single policy network, which greatly reduces computation time.
The two-sided multiagent DRL approach is demonstrated in a New York City case study using real taxi trip data. Results show that our algorithm makes high-quality decisions quickly and outperforms benchmark policies. The efficiency of the two-head policy network is demonstrated by comparing it with a configuration that uses two separate policy networks. Our fleet management strategy makes both the platform and the drivers better off, especially in scenarios with high demand volume.

History: This paper has been accepted for the Transportation Science Special Issue on Emerging Topics in Transportation Science and Logistics.

Funding: This work was supported by the Singapore Ministry of Education Academic Research [Grant MOE2019-T2-2-165] and the Singapore Ministry of Education [Grant R-266-000-135-114].
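The abstract states that the platform prioritizes human drivers when matching arriving requests, but it does not describe the matching procedure itself. The sketch below illustrates one possible priority rule under loudly stated assumptions: a greedy, request-by-request assignment in which an idle CV within a hypothetical pickup radius `max_distance` is always preferred over an AV, even a closer one. The `Vehicle` class, the `match_requests` function, the integer zone indices, and the distance function are all hypothetical illustrations, not the paper's formulation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Vehicle:
    vid: str      # hypothetical vehicle identifier
    zone: int     # hypothetical zone index used as a location proxy
    is_av: bool   # True for an AV, False for a human-driven CV


def match_requests(
    requests: List[Tuple[str, int]],
    idle_vehicles: List[Vehicle],
    distance: Callable[[int, int], float],
    max_distance: float,
) -> Dict[str, str]:
    """Greedy matching that tries CVs before AVs (illustrative assumption only)."""
    available = list(idle_vehicles)
    assignment: Dict[str, str] = {}
    for req_id, req_zone in requests:
        chosen: Optional[Vehicle] = None
        # Search CVs first; fall back to AVs only if no CV is within range.
        for want_av in (False, True):
            candidates = [
                v for v in available
                if v.is_av == want_av and distance(v.zone, req_zone) <= max_distance
            ]
            if candidates:
                chosen = min(candidates, key=lambda v: distance(v.zone, req_zone))
                break
        if chosen is not None:
            assignment[req_id] = chosen.vid
            available.remove(chosen)  # each vehicle serves at most one request
    return assignment


# Usage: cv1 serves r1 even though av1 is closer; av1 then serves r2.
fleet = [Vehicle("cv1", 0, False), Vehicle("av1", 1, True), Vehicle("cv2", 5, False)]
reqs = [("r1", 1), ("r2", 1), ("r3", 9)]
assignment = match_requests(reqs, fleet, lambda a, b: abs(a - b), max_distance=2)
print(assignment)  # {'r1': 'cv1', 'r2': 'av1'}
```

Here `r3` stays unmatched because the only remaining vehicle, `cv2`, lies outside the assumed radius. A production platform would more likely solve a batch assignment problem; the greedy loop only makes the CV-before-AV ordering concrete.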
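The two-head policy network and the advantage actor-critic training are described only at a high level. A minimal sketch, assuming a single shared hidden layer that feeds one softmax head over candidate AV relocation zones and one softmax head over discretized commission-fee levels, shows how one network can emit both decisions. It also writes out standard one-step advantage actor-critic loss terms, assuming the two heads' actions are conditionally independent so their log-probabilities add. Layer sizes, the tanh activation, the discrete fee levels, and every name (`TwoHeadPolicy`, `joint_log_prob`, `td_advantage`, `actor_loss`, `critic_loss`) are hypothetical; the critic's value estimates are passed in as plain numbers, and no gradient step is shown.

```python
import math
import random
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: List[List[float]], x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class TwoHeadPolicy:
    """Hypothetical two-head policy: a shared trunk feeding an AV-relocation
    head and a commission-fee head. Architecture details are assumptions."""

    def __init__(self, n_features: int, n_hidden: int, n_zones: int,
                 n_fee_levels: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[List[float]]:
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_shared = init(n_hidden, n_features)  # trunk shared by both tasks
        self.w_reloc = init(n_zones, n_hidden)      # head 1: relocation target zone
        self.w_fee = init(n_fee_levels, n_hidden)   # head 2: commission-fee level

    def forward(self, state: List[float]) -> Tuple[List[float], List[float]]:
        hidden = [math.tanh(h) for h in matvec(self.w_shared, state)]
        return softmax(matvec(self.w_reloc, hidden)), softmax(matvec(self.w_fee, hidden))


def joint_log_prob(reloc_probs: List[float], fee_probs: List[float],
                   reloc_action: int, fee_action: int) -> float:
    # Assumed factorization: log pi(a1, a2 | s) = log pi1(a1 | s) + log pi2(a2 | s).
    return math.log(reloc_probs[reloc_action]) + math.log(fee_probs[fee_action])


def td_advantage(reward: float, value_s: float, value_next: float,
                 gamma: float = 0.99, done: bool = False) -> float:
    # One-step advantage estimate: A = r + gamma * V(s') - V(s).
    target = reward + (0.0 if done else gamma * value_next)
    return target - value_s


def actor_loss(log_prob: float, advantage: float) -> float:
    # Policy-gradient surrogate; the advantage is treated as a constant.
    return -log_prob * advantage


def critic_loss(advantage: float) -> float:
    # Squared TD error used to fit the value function.
    return 0.5 * advantage ** 2


policy = TwoHeadPolicy(n_features=4, n_hidden=8, n_zones=5, n_fee_levels=3)
reloc_p, fee_p = policy.forward([0.2, 0.5, 0.1, 0.7])
adv = td_advantage(reward=1.0, value_s=0.5, value_next=0.6)  # 1.0 + 0.99 * 0.6 - 0.5
loss = actor_loss(joint_log_prob(reloc_p, fee_p, 2, 1), adv) + critic_loss(adv)
```

Because the trunk is shared, its features are computed once per decision step and reused by both heads, which is one plausible source of the savings over two separate networks; the abstract reports reduced computation time but does not specify the architecture.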

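Source journal

Transportation Science (Engineering & Technology / Operations Research & Management Science)

CiteScore: 8.30
Self-citation rate: 10.90%
Articles published: 111
Review time: 12 months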

Journal description: Transportation Science, published quarterly by INFORMS, is the flagship journal of the Transportation Science and Logistics Society of INFORMS. As the foremost scientific journal in the cross-disciplinary operational research field of transportation analysis, Transportation Science publishes high-quality original contributions and surveys on phenomena associated with all modes of transportation, present and prospective, mainly covering all levels of planning, design, economic, operational, and social aspects. Transportation Science focuses primarily on fundamental theories, coupled with observational and experimental studies of transportation and logistics phenomena and processes, mathematical models, advanced methodologies, and novel applications in transportation and logistics systems analysis, planning, and design. The journal covers a broad range of topics that include vehicular and human traffic flow theories, models, and their application to traffic operations and management; strategic, tactical, and operational planning of transportation and logistics systems; performance analysis methods and system design and optimization; theories and analysis methods for network and spatial activity interaction, equilibrium, and dynamics; economics of transportation system supply and evaluation; and methodologies for analyzing transportation user behavior and the demand for transportation and logistics services. Transportation Science is international in scope, with editors from nations around the globe. The editorial board reflects the diverse interdisciplinary interests of the transportation science and logistics community, with members who hold primary affiliations in engineering (civil, industrial, and aeronautical), physics, economics, applied mathematics, and business.