Behaviorally-Aware Multi-Agent RL With Dynamic Optimization for Autonomous Driving
Authors: Hamid Taghavifar; Chuan Hu; Chongfeng Wei; Ardashir Mohammadzadeh; Chunwei Zhang
Journal: IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 10672-10683 (impact factor 6.4; JCR Q1, Automation & Control Systems)
Published: 2025-01-08
DOI: 10.1109/TASE.2025.3527327
URL: https://ieeexplore.ieee.org/document/10833800/
Citations: 0
Abstract
This study presents a novel Multi-Agent Reinforcement Learning (MARL) architecture for autonomous vehicle (AV) navigation in complex urban traffic environments. By integrating a Social Value Orientation (SVO) model into a model-free SARSA reinforcement learning framework, our approach balances individual agents’ social preferences with safety and performance objectives. A logistic regression-based risk assessment module evaluates collision probabilities in real time from spatiotemporal features such as distances and velocities. Additionally, a dynamic optimizer adapts the learning rate and exploration strategy of the SARSA algorithm for efficient convergence to optimal policies. Extensive simulation experiments demonstrate that the proposed method enhances safety and efficiency, achieving a 55.6% reduction in collision risk and increasing the average reward per episode by 2.1 compared to traditional SARSA without SVO. Furthermore, the optimized policy reduces average episode length, indicating that the framework provides robust decision-making and adaptability across various traffic scenarios.

Note to Practitioners: The framework proposed in this paper is driven by the demand for comprehensive navigation systems in the rapidly evolving field of connected and autonomous vehicles (CAVs), especially in complex, unpredictable urban environments and mixed traffic scenarios. As AVs attract growing attention, the capacity to navigate effectively among many road users, including other AVs, pedestrians, and human-driven vehicles, is essential. Our framework builds upon the SARSA algorithm to produce an optimal policy for the AV and integrates a dynamic optimization method that represents the concept of risk as the inverse logistic of potential collisions. Distinctive to our proposed model is a finely tuned SVO that captures the nuanced social dynamics between multiple autonomous agents, spanning a continuum from self-interested to entirely cooperative behaviors. This allows AVs to make socially aware, cooperative decisions. The framework contributes to the development of secure, human-centric, and reliable transportation systems. Its multi-agent focus and the incorporation of dynamic optimization emphasize its potential to facilitate a network of AVs that interact with diverse road users, thus improving scalability. The robustness and adaptability of this machine learning-powered solution are crucial for navigating the varied scenarios that characterize urban driving, ensuring that AVs can adapt to changing conditions and make decisions that benefit all road users.
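The components named in the abstract (an SVO-weighted reward, a logistic collision-risk estimate, an adaptive learning rate, and the SARSA update) can be illustrated with a minimal sketch. Nothing below is taken from the paper: the cosine/sine SVO weighting is a formulation common in the SVO literature, the risk weights are placeholders, `decayed` is a hypothetical stand-in for the dynamic optimizer, and the state names, action names, and risk penalty of 5.0 are invented for illustration.

```python
import math
from collections import defaultdict


def svo_reward(r_self, r_other, phi):
    """Blend own and others' reward by SVO angle phi (radians).

    phi = 0 uses only the agent's own reward; phi = pi/2 uses only the others'.
    Assumed formulation, not the paper's.
    """
    return math.cos(phi) * r_self + math.sin(phi) * r_other


def collision_risk(distance, closing_speed, w_dist=-0.5, w_speed=0.8, bias=0.0):
    """Logistic collision-probability estimate in (0, 1).

    The weights are placeholders; a real model would fit them to data.
    """
    z = w_dist * distance + w_speed * closing_speed + bias
    return 1.0 / (1.0 + math.exp(-z))


def decayed(initial, episode, rate=0.01):
    """Hypothetical decay schedule standing in for the dynamic optimizer."""
    return initial / (1.0 + rate * episode)


def sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma):
    """On-policy TD update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


# One illustrative transition with made-up numbers.
q = defaultdict(float)
risk = collision_risk(distance=10.0, closing_speed=2.0)
reward = svo_reward(r_self=1.0, r_other=0.5, phi=math.pi / 4) - 5.0 * risk
alpha = decayed(0.5, episode=100)
new_q = sarsa_update(q, "cruise", "keep_lane", reward, "approach", "brake",
                     alpha, gamma=0.9)
print(f"risk={risk:.3f} reward={reward:.3f} alpha={alpha:.3f} Q={new_q:.3f}")
```

Because SARSA is on-policy, the update uses the action actually chosen in the next state (`a_next`) rather than the greedy maximum. Under the assumptions of this sketch, that is the point where exploration settings and the risk-shaped reward feed into the learned values.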
About the journal:
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.