Value-Enriched Population Synthesis: Integrating a Motivational Layer
Alba Aguilera, Miquel Albertí, Nardine Osman, Georgina Curto
arXiv:2408.09407 (2024-08-18)
In recent years, computational improvements have allowed for more nuanced, data-driven and geographically explicit agent-based simulations. So far, however, simulations have struggled to adequately represent the attributes that motivate the actions of the agents. In fact, existing population synthesis frameworks generate agent profiles limited to socio-demographic attributes. In this paper, we introduce a novel value-enriched population synthesis framework that integrates a motivational layer with the traditional individual and household socio-demographic layers. Our research highlights the significance of extending the profile of agents in synthetic populations by incorporating data on values, ideologies, opinions and vital priorities, which motivate the agents' behaviour. This motivational layer can help us develop a more nuanced decision-making mechanism for the agents in social simulation settings. Our methodology integrates microdata and macrodata within different Bayesian network structures. This contribution makes it possible to generate synthetic populations with integrated value systems that preserve the inherent socio-demographic distributions of the real population in any specific region.
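To make the idea concrete, here is a minimal pure-Python sketch of the core mechanism: fit the conditional probability tables of a (here hand-fixed) Bayesian network on microdata, then forward-sample synthetic agents whose socio-demographic and motivational attributes jointly follow the fitted distributions. The attributes, network structure, and data are invented for illustration; the paper's framework learns richer structures from real micro- and macrodata.

```python
import random
from collections import defaultdict

# Hypothetical microdata: each record is (age_group, education, ideology).
microdata = [
    ("young", "higher", "progressive"),
    ("young", "secondary", "progressive"),
    ("adult", "higher", "moderate"),
    ("adult", "secondary", "conservative"),
    ("senior", "secondary", "conservative"),
] * 200  # repeated to mimic a larger survey

# Fit conditional probability tables for a fixed chain structure:
# age_group -> education -> ideology (a stand-in for the learned network).
def fit_cpt(records, parent_idx, child_idx):
    counts = defaultdict(lambda: defaultdict(int))
    for rec in records:
        counts[rec[parent_idx]][rec[child_idx]] += 1
    return {p: {c: n / sum(ch.values()) for c, n in ch.items()}
            for p, ch in counts.items()}

age_counts = defaultdict(int)
for rec in microdata:
    age_counts[rec[0]] += 1
total = sum(age_counts.values())
p_age = {k: v / total for k, v in age_counts.items()}

p_edu_given_age = fit_cpt(microdata, 0, 1)
p_ideo_given_edu = fit_cpt(microdata, 1, 2)

def draw(dist):
    return random.choices(list(dist), weights=dist.values())[0]

# Forward-sample a synthetic agent: socio-demographics plus a motivational
# attribute, preserving the joint distribution encoded by the network.
def sample_agent():
    age = draw(p_age)
    edu = draw(p_edu_given_age[age])
    ideo = draw(p_ideo_given_edu[edu])
    return {"age_group": age, "education": edu, "ideology": ideo}

synthetic_population = [sample_agent() for _ in range(10_000)]
```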
{"title":"Value-Enriched Population Synthesis: Integrating a Motivational Layer","authors":"Alba Aguilera, Miquel Albertí, Nardine Osman, Georgina Curto","doi":"arxiv-2408.09407","DOIUrl":"https://doi.org/arxiv-2408.09407","url":null,"abstract":"In recent years, computational improvements have allowed for more nuanced,\u0000data-driven and geographically explicit agent-based simulations. So far,\u0000simulations have struggled to adequately represent the attributes that motivate\u0000the actions of the agents. In fact, existing population synthesis frameworks\u0000generate agent profiles limited to socio-demographic attributes. In this paper,\u0000we introduce a novel value-enriched population synthesis framework that\u0000integrates a motivational layer with the traditional individual and household\u0000socio-demographic layers. Our research highlights the significance of extending\u0000the profile of agents in synthetic populations by incorporating data on values,\u0000ideologies, opinions and vital priorities, which motivate the agents'\u0000behaviour. This motivational layer can help us develop a more nuanced\u0000decision-making mechanism for the agents in social simulation settings. Our\u0000methodology integrates microdata and macrodata within different Bayesian\u0000network structures. This contribution allows to generate synthetic populations\u0000with integrated value systems that preserve the inherent socio-demographic\u0000distributions of the real population in any specific region.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"79 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beyond Local Views: Global State Inference with Diffusion Models for Cooperative Multi-Agent Reinforcement Learning
Zhiwei Xu, Hangyu Mao, Nianmin Zhang, Xin Xin, Pengjie Ren, Dapeng Li, Bin Zhang, Guoliang Fan, Zhumin Chen, Changwei Wang, Jiangjin Yin
arXiv:2408.09501 (2024-08-18)
In partially observable multi-agent systems, agents typically only have access to local observations. This severely hinders their ability to make precise decisions, particularly during decentralized execution. To alleviate this problem and inspired by image outpainting, we propose State Inference with Diffusion Models (SIDIFF), which uses diffusion models to reconstruct the original global state based solely on local observations. SIDIFF consists of a state generator and a state extractor, which allow agents to choose suitable actions by considering both the reconstructed global state and local observations. In addition, SIDIFF can be effortlessly incorporated into current multi-agent reinforcement learning algorithms to improve their performance. Finally, we evaluated SIDIFF on different experimental platforms, including Multi-Agent Battle City (MABC), a novel and flexible multi-agent reinforcement learning environment we developed. SIDIFF achieved desirable results and outperformed other popular algorithms.
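The reconstruction step can be pictured as a conditional denoising diffusion process. The PyTorch sketch below shows one plausible shape of such a state generator: a denoiser conditioned on the local observation, run through a simplified DDPM reverse loop. All dimensions, the network, and the noise schedule are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the paper's actual architecture differs.
OBS_DIM, STATE_DIM, T = 32, 128, 50

class ConditionalDenoiser(nn.Module):
    """Predicts the noise added to the global state, conditioned on the
    agent's local observation (the 'outpainting' analogy: local view in,
    full state out)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + OBS_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, STATE_DIM),
        )

    def forward(self, noisy_state, obs, t):
        t_feat = t.float().unsqueeze(-1) / T  # scalar timestep feature
        return self.net(torch.cat([noisy_state, obs, t_feat], dim=-1))

betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def reconstruct_state(model, obs):
    """Simplified DDPM reverse process: start from noise and iteratively
    denoise toward a global-state estimate consistent with `obs`."""
    x = torch.randn(obs.shape[0], STATE_DIM)
    for t in reversed(range(T)):
        t_batch = torch.full((obs.shape[0],), t)
        eps = model(x, obs, t_batch)
        x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x += torch.sqrt(betas[t]) * torch.randn_like(x)
    return x
```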
{"title":"Beyond Local Views: Global State Inference with Diffusion Models for Cooperative Multi-Agent Reinforcement Learning","authors":"Zhiwei Xu, Hangyu Mao, Nianmin Zhang, Xin Xin, Pengjie Ren, Dapeng Li, Bin Zhang, Guoliang Fan, Zhumin Chen, Changwei Wang, Jiangjin Yin","doi":"arxiv-2408.09501","DOIUrl":"https://doi.org/arxiv-2408.09501","url":null,"abstract":"In partially observable multi-agent systems, agents typically only have\u0000access to local observations. This severely hinders their ability to make\u0000precise decisions, particularly during decentralized execution. To alleviate\u0000this problem and inspired by image outpainting, we propose State Inference with\u0000Diffusion Models (SIDIFF), which uses diffusion models to reconstruct the\u0000original global state based solely on local observations. SIDIFF consists of a\u0000state generator and a state extractor, which allow agents to choose suitable\u0000actions by considering both the reconstructed global state and local\u0000observations. In addition, SIDIFF can be effortlessly incorporated into current\u0000multi-agent reinforcement learning algorithms to improve their performance.\u0000Finally, we evaluated SIDIFF on different experimental platforms, including\u0000Multi-Agent Battle City (MABC), a novel and flexible multi-agent reinforcement\u0000learning environment we developed. SIDIFF achieved desirable results and\u0000outperformed other popular algorithms.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint-perturbation simultaneous pseudo-gradient
Carlos Martin, Tuomas Sandholm
arXiv:2408.09306 (2024-08-17)
We study the problem of computing an approximate Nash equilibrium of a game with a continuous strategy space, without access to gradients of the utility function. Such games arise, for example, when players' strategies are represented by the parameters of a neural network. Lack of access to gradients is common in reinforcement learning settings, where the environment is treated as a black box, as well as in equilibrium finding for mechanisms such as auctions, where the mechanism's payoffs are discontinuous in the players' actions. To tackle this problem, we turn to zeroth-order optimization techniques that combine pseudo-gradients with equilibrium-finding dynamics. Specifically, we introduce a new technique that requires a number of utility function evaluations per iteration that is constant rather than linear in the number of players. It achieves this by performing a single joint perturbation on all players' strategies, rather than perturbing each one individually. This yields a dramatic improvement for many-player games, especially when the utility function is expensive to compute in terms of wall time, memory, money, or other resources. We evaluate our approach on various games, including auctions, which have important real-world applications. Our approach yields a significant reduction in the run time required to reach an approximate Nash equilibrium.
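A minimal sketch of the joint-perturbation idea illustrates why the evaluation count is constant in the number of players: every sample perturbs all strategies at once and costs two utility-function calls in total, regardless of the player count. The function names and the two-point Gaussian-smoothing form of the estimator are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def joint_perturbation_pseudogradient(utilities, strategies, delta=1e-2, samples=32):
    """Estimate every player's pseudo-gradient with two utility-function
    evaluations per sample, independent of the number of players.

    utilities(strats) -> array of shape (N,): one payoff per player.
    strategies: list of N parameter vectors (possibly different sizes).
    """
    grads = [np.zeros_like(s) for s in strategies]
    for _ in range(samples):
        # One joint perturbation direction across all players' parameters.
        u = [np.random.randn(*s.shape) for s in strategies]
        up = utilities([s + delta * d for s, d in zip(strategies, u)])
        dn = utilities([s - delta * d for s, d in zip(strategies, u)])
        for i, d in enumerate(u):
            # Player i's payoff difference, projected onto their slice of
            # the joint direction (two-point SPSA-style estimator).
            grads[i] += (up[i] - dn[i]) / (2 * delta) * d
    return [g / samples for g in grads]

# Equilibrium-finding dynamics then use the estimates, e.g. simultaneous
# ascent x_i <- x_i + lr * grad_i (the paper pairs the estimator with
# more sophisticated dynamics).
```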
{"title":"Joint-perturbation simultaneous pseudo-gradient","authors":"Carlos Martin, Tuomas Sandholm","doi":"arxiv-2408.09306","DOIUrl":"https://doi.org/arxiv-2408.09306","url":null,"abstract":"We study the problem of computing an approximate Nash equilibrium of a game\u0000whose strategy space is continuous without access to gradients of the utility\u0000function. Such games arise, for example, when players' strategies are\u0000represented by the parameters of a neural network. Lack of access to gradients\u0000is common in reinforcement learning settings, where the environment is treated\u0000as a black box, as well as equilibrium finding in mechanisms such as auctions,\u0000where the mechanism's payoffs are discontinuous in the players' actions. To\u0000tackle this problem, we turn to zeroth-order optimization techniques that\u0000combine pseudo-gradients with equilibrium-finding dynamics. Specifically, we\u0000introduce a new technique that requires a number of utility function\u0000evaluations per iteration that is constant rather than linear in the number of\u0000players. It achieves this by performing a single joint perturbation on all\u0000players' strategies, rather than perturbing each one individually. This yields\u0000a dramatic improvement for many-player games, especially when the utility\u0000function is expensive to compute in terms of wall time, memory, money, or other\u0000resources. We evaluate our approach on various games, including auctions, which\u0000have important real-world applications. Our approach yields a significant\u0000reduction in the run time required to reach an approximate Nash equilibrium.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multilevel Graph Reinforcement Learning for Consistent Cognitive Decision-making in Heterogeneous Mixed Autonomy
Xin Gao, Zhaoyang Ma, Xueyuan Li, Xiaoqiang Meng, Zirui Li
arXiv:2408.08516 (2024-08-16)
In the realm of heterogeneous mixed autonomy, vehicles experience dynamic spatial correlations and nonlinear temporal interactions in a complex, non-Euclidean space. These complexities pose significant challenges to traditional decision-making frameworks. Addressing this, we propose a hierarchical reinforcement learning framework integrated with multilevel graph representations, which effectively comprehends and models the spatiotemporal interactions among vehicles navigating through uncertain traffic conditions with varying decision-making systems. Rooted in multilevel graph representation theory, our approach encapsulates spatiotemporal relationships inherent in non-Euclidean spaces. A weighted graph represents spatiotemporal features between nodes, addressing the degree imbalance inherent in dynamic graphs. We integrate asynchronous parallel hierarchical reinforcement learning with a multilevel graph representation and a multi-head attention mechanism, which enables connected autonomous vehicles (CAVs) to exhibit capabilities akin to human cognition, facilitating consistent decision-making across various critical dimensions. The proposed decision-making strategy is validated in challenging environments characterized by high density, randomness, and dynamism on highway roads. We assess the performance of our framework through ablation studies, comparative analyses, and spatiotemporal trajectory evaluations. This study presents a quantitative analysis of decision-making mechanisms mirroring human cognitive functions in the realm of heterogeneous mixed autonomy, promoting the development of multi-dimensional decision-making strategies and a sophisticated distribution of attentional resources.
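As an illustration of one ingredient, the snippet below builds a distance-based vehicle graph and applies multi-head attention restricted to connected vehicles, using PyTorch's nn.MultiheadAttention. The vehicle count, feature size, and 40 m connectivity radius are invented; the paper's multilevel graph representation is considerably richer.

```python
import torch
import torch.nn as nn

# Hypothetical setup: 8 vehicles, each with a 16-d feature vector
# aggregated from the multilevel graph representation.
num_vehicles, feat_dim, num_heads = 8, 16, 4

attn = nn.MultiheadAttention(embed_dim=feat_dim, num_heads=num_heads,
                             batch_first=True)

node_feats = torch.randn(1, num_vehicles, feat_dim)  # (batch, nodes, feat)

# Dynamic graph from spatial proximity: connect vehicles within 40 m.
pos = torch.rand(num_vehicles, 2) * 100.0  # hypothetical x-y coordinates (m)
adj = torch.cdist(pos, pos) < 40.0         # includes self-connections

# Boolean attention mask: True blocks attention between unconnected vehicles.
attn_mask = ~adj

fused, weights = attn(node_feats, node_feats, node_feats,
                      attn_mask=attn_mask)
# `fused` holds interaction-aware vehicle features; `weights` shows how
# attentional resources are distributed over neighboring vehicles.
```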
{"title":"Multilevel Graph Reinforcement Learning for Consistent Cognitive Decision-making in Heterogeneous Mixed Autonomy","authors":"Xin Gao, Zhaoyang Ma, Xueyuan Li, Xiaoqiang Meng, Zirui Li","doi":"arxiv-2408.08516","DOIUrl":"https://doi.org/arxiv-2408.08516","url":null,"abstract":"In the realm of heterogeneous mixed autonomy, vehicles experience dynamic\u0000spatial correlations and nonlinear temporal interactions in a complex,\u0000non-Euclidean space. These complexities pose significant challenges to\u0000traditional decision-making frameworks. Addressing this, we propose a\u0000hierarchical reinforcement learning framework integrated with multilevel graph\u0000representations, which effectively comprehends and models the spatiotemporal\u0000interactions among vehicles navigating through uncertain traffic conditions\u0000with varying decision-making systems. Rooted in multilevel graph representation\u0000theory, our approach encapsulates spatiotemporal relationships inherent in\u0000non-Euclidean spaces. A weighted graph represents spatiotemporal features\u0000between nodes, addressing the degree imbalance inherent in dynamic graphs. We\u0000integrate asynchronous parallel hierarchical reinforcement learning with a\u0000multilevel graph representation and a multi-head attention mechanism, which\u0000enables connected autonomous vehicles (CAVs) to exhibit capabilities akin to\u0000human cognition, facilitating consistent decision-making across various\u0000critical dimensions. The proposed decision-making strategy is validated in\u0000challenging environments characterized by high density, randomness, and\u0000dynamism on highway roads. We assess the performance of our framework through\u0000ablation studies, comparative analyses, and spatiotemporal trajectory\u0000evaluations. This study presents a quantitative analysis of decision-making\u0000mechanisms mirroring human cognitive functions in the realm of heterogeneous\u0000mixed autonomy, promoting the development of multi-dimensional decision-making\u0000strategies and a sophisticated distribution of attentional resources.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ASGM-KG: Unveiling Alluvial Gold Mining Through Knowledge Graphs
Debashis Gupta, Aditi Golder, Luis Fernendez, Miles Silman, Greg Lersen, Fan Yang, Bob Plemmons, Sarra Alqahtani, Paul Victor Pauca
arXiv:2408.08972 (2024-08-16)
Artisanal and Small-Scale Gold Mining (ASGM) is a low-cost yet highly destructive mining practice, leading to environmental disasters across the world's tropical watersheds. The topic of ASGM spans multiple domains of research and information, including natural and social systems, and knowledge is often atomized across a diversity of media and documents. We therefore introduce a knowledge graph (ASGM-KG) that consolidates and provides crucial information about ASGM practices and their environmental effects. The current version of ASGM-KG consists of 1,899 triples extracted using a large language model (LLM) from documents and reports published by both non-governmental and governmental organizations. These documents were carefully selected by a group of tropical ecologists with expertise in ASGM. The knowledge graph was validated using two methods. First, a small team of ASGM experts reviewed and labeled triples as factual or non-factual. Second, we devised and applied an automated factual reduction framework that relies on a search engine and an LLM to label triples. Our framework performs as well as five baselines on a publicly available knowledge graph and achieves over 90% accuracy on ASGM-KG, as validated by domain experts. ASGM-KG demonstrates an advancement in knowledge aggregation and representation for complex, interdisciplinary environmental crises such as ASGM.
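The following sketch shows one plausible shape of the data structures and validation loop described above: triples as simple records, and a checker that retrieves evidence via a search engine and asks an LLM to judge factuality. The example triples and the search/judge interfaces are hypothetical stand-ins, not the actual ASGM-KG content or pipeline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

# A few illustrative triples in the spirit of ASGM-KG (invented examples,
# not entries from the actual graph).
kg = {
    Triple("ASGM", "releases", "mercury"),
    Triple("mercury", "contaminates", "tropical_watersheds"),
    Triple("ASGM", "occurs_in", "Madre_de_Dios"),
}

def validate(triple, search, judge):
    """Sketch of the automated validation loop: retrieve evidence with a
    search engine, then ask an LLM to label the triple as factual or not.
    `search` and `judge` are caller-supplied stand-ins for real services."""
    query = f"{triple.subject} {triple.predicate} {triple.obj}"
    evidence = search(query)            # -> list of text snippets
    verdict = judge(triple, evidence)   # -> "factual" / "non-factual"
    return verdict == "factual"
```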
{"title":"ASGM-KG: Unveiling Alluvial Gold Mining Through Knowledge Graphs","authors":"Debashis Gupta, Aditi Golder, Luis Fernendez, Miles Silman, Greg Lersen, Fan Yang, Bob Plemmons, Sarra Alqahtani, Paul Victor Pauca","doi":"arxiv-2408.08972","DOIUrl":"https://doi.org/arxiv-2408.08972","url":null,"abstract":"Artisanal and Small-Scale Gold Mining (ASGM) is a low-cost yet highly\u0000destructive mining practice, leading to environmental disasters across the\u0000world's tropical watersheds. The topic of ASGM spans multiple domains of\u0000research and information, including natural and social systems, and knowledge\u0000is often atomized across a diversity of media and documents. We therefore\u0000introduce a knowledge graph (ASGM-KG) that consolidates and provides crucial\u0000information about ASGM practices and their environmental effects. The current\u0000version of ASGM-KG consists of 1,899 triples extracted using a large language\u0000model (LLM) from documents and reports published by both non-governmental and\u0000governmental organizations. These documents were carefully selected by a group\u0000of tropical ecologists with expertise in ASGM. This knowledge graph was\u0000validated using two methods. First, a small team of ASGM experts reviewed and\u0000labeled triples as factual or non-factual. Second, we devised and applied an\u0000automated factual reduction framework that relies on a search engine and an LLM\u0000for labeling triples. Our framework performs as well as five baselines on a\u0000publicly available knowledge graph and achieves over 90 accuracy on our ASGM-KG\u0000validated by domain experts. ASGM-KG demonstrates an advancement in knowledge\u0000aggregation and representation for complex, interdisciplinary environmental\u0000crises such as ASGM.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AgentSimulator: An Agent-based Approach for Data-driven Business Process Simulation
Lukas Kirchdorfer, Robert Blümel, Timotheus Kampik, Han van der Aa, Heiner Stuckenschmidt
arXiv:2408.08571 (2024-08-16)
Business process simulation (BPS) is a versatile technique for estimating process performance across various scenarios. Traditionally, BPS approaches employ a control-flow-first perspective by enriching a process model with simulation parameters. Although such approaches can mimic the behavior of centrally orchestrated processes, such as those supported by workflow systems, current control-flow-first approaches cannot faithfully capture the dynamics of real-world processes that involve distinct resource behavior and decentralized decision-making. Recognizing this issue, this paper introduces AgentSimulator, a resource-first BPS approach that discovers a multi-agent system from an event log, modeling distinct resource behaviors and interaction patterns to simulate the underlying process. Our experiments show that AgentSimulator achieves state-of-the-art simulation accuracy with significantly lower computation times than existing approaches while providing high interpretability and adaptability to different types of process-execution scenarios.
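A resource-first discovery step might look like the pandas sketch below: derive each resource's activity profile and the handover-of-work pattern between resources from a standard event log. The log columns and toy data are illustrative; AgentSimulator's actual discovery procedure is more involved.

```python
import pandas as pd

# Hypothetical event log with the usual case/activity/resource/timestamp
# columns from process mining.
log = pd.DataFrame({
    "case_id":   [1, 1, 1, 2, 2],
    "activity":  ["register", "check", "approve", "register", "approve"],
    "resource":  ["alice", "bob", "carol", "alice", "carol"],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 10:00", "2024-01-01 11:30",
        "2024-01-02 09:15", "2024-01-02 12:00"]),
})

log = log.sort_values(["case_id", "timestamp"])

# Each resource becomes an agent; its activity distribution is one
# ingredient of the agent's behavior model.
activity_profile = log.groupby("resource")["activity"].value_counts(normalize=True)

# Interaction patterns: who hands work over to whom within a case.
log["next_resource"] = log.groupby("case_id")["resource"].shift(-1)
handovers = (log.dropna(subset=["next_resource"])
                .groupby(["resource", "next_resource"]).size())
```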
{"title":"AgentSimulator: An Agent-based Approach for Data-driven Business Process Simulation","authors":"Lukas Kirchdorfer, Robert Blümel, Timotheus Kampik, Han van der Aa, Heiner Stuckenschmidt","doi":"arxiv-2408.08571","DOIUrl":"https://doi.org/arxiv-2408.08571","url":null,"abstract":"Business process simulation (BPS) is a versatile technique for estimating\u0000process performance across various scenarios. Traditionally, BPS approaches\u0000employ a control-flow-first perspective by enriching a process model with\u0000simulation parameters. Although such approaches can mimic the behavior of\u0000centrally orchestrated processes, such as those supported by workflow systems,\u0000current control-flow-first approaches cannot faithfully capture the dynamics of\u0000real-world processes that involve distinct resource behavior and decentralized\u0000decision-making. Recognizing this issue, this paper introduces AgentSimulator,\u0000a resource-first BPS approach that discovers a multi-agent system from an event\u0000log, modeling distinct resource behaviors and interaction patterns to simulate\u0000the underlying process. Our experiments show that AgentSimulator achieves\u0000state-of-the-art simulation accuracy with significantly lower computation times\u0000than existing approaches while providing high interpretability and adaptability\u0000to different types of process-execution scenarios.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players
Pragnya Alatur, Anas Barakat, Niao He
arXiv:2408.08075 (2024-08-15)
Markov Potential Games (MPGs) form an important sub-class of Markov games, which are a common framework for modeling multi-agent reinforcement learning problems. In particular, MPGs include as a special case the identical-interest setting where all agents share the same reward function. Scaling the performance of Nash equilibrium learning algorithms to a large number of agents is crucial for multi-agent systems. To address this important challenge, we focus on the independent learning setting, where agents only have access to their local information when updating their own policy. In prior work on MPGs, the iteration complexity for obtaining $\epsilon$-Nash regret scales linearly with the number of agents $N$. In this work, we investigate the iteration complexity of an independent policy mirror descent (PMD) algorithm for MPGs. We show that PMD with KL regularization, also known as natural policy gradient, enjoys a better $\sqrt{N}$ dependence on the number of agents, improving over PMD with Euclidean regularization and prior work. Furthermore, the iteration complexity is also independent of the sizes of the agents' action spaces.
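For intuition, one KL-regularized PMD step in the tabular softmax case reduces to the multiplicative-weights update pi'(a|s) proportional to pi(a|s) exp(eta * Q_i(s,a)), which each agent can apply independently from its own local Q-estimates. A minimal NumPy sketch of that update, as a stand-in for the paper's algorithm:

```python
import numpy as np

def pmd_kl_update(policy, q_values, eta=0.1):
    """One KL-regularized policy mirror descent step for a single agent:
    pi'(a|s) ∝ pi(a|s) * exp(eta * Q_i(s,a)). Under independent learning,
    each agent runs this using only its own local Q-estimates.

    policy:   (S, A) array of per-state action probabilities.
    q_values: (S, A) array of the agent's Q-estimates.
    """
    logits = np.log(policy) + eta * q_values
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)
```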
{"title":"Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players","authors":"Pragnya Alatur, Anas Barakat, Niao He","doi":"arxiv-2408.08075","DOIUrl":"https://doi.org/arxiv-2408.08075","url":null,"abstract":"Markov Potential Games (MPGs) form an important sub-class of Markov games,\u0000which are a common framework to model multi-agent reinforcement learning\u0000problems. In particular, MPGs include as a special case the identical-interest\u0000setting where all the agents share the same reward function. Scaling the\u0000performance of Nash equilibrium learning algorithms to a large number of agents\u0000is crucial for multi-agent systems. To address this important challenge, we\u0000focus on the independent learning setting where agents can only have access to\u0000their local information to update their own policy. In prior work on MPGs, the\u0000iteration complexity for obtaining $epsilon$-Nash regret scales linearly with\u0000the number of agents $N$. In this work, we investigate the iteration complexity\u0000of an independent policy mirror descent (PMD) algorithm for MPGs. We show that\u0000PMD with KL regularization, also known as natural policy gradient, enjoys a\u0000better $sqrt{N}$ dependence on the number of agents, improving over PMD with\u0000Euclidean regularization and prior work. Furthermore, the iteration complexity\u0000is also independent of the sizes of the agents' action spaces.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EmBARDiment: an Embodied AI Agent for Productivity in XR
Riccardo Bovo, Steven Abreu, Karan Ahuja, Eric J Gonzalez, Li-Te Cheng, Mar Gonzalez-Franco
arXiv:2408.08158 (2024-08-15)
XR devices running chat-bots powered by Large Language Models (LLMs) have tremendous potential as always-on agents that can enable much better productivity scenarios. However, screen-based chat-bots do not take advantage of the full suite of natural inputs available in XR, including inward-facing sensor data; instead, they over-rely on explicit voice or text prompts, sometimes paired with multi-modal data included as part of the query. We propose a solution that leverages an attention framework to derive context implicitly from user actions, eye-gaze, and contextual memory within the XR environment. This minimizes the need for engineered explicit prompts, fostering grounded and intuitive interactions that glean user insights for the chat-bot. Our user studies demonstrate the immediate feasibility and transformative potential of our approach to streamline user interaction in XR with chat-bots, while offering insights for the design of future XR-embodied LLM agents.
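One way to realize such implicit context is sketched below: a short contextual memory of gaze targets that is folded into the chat-bot prompt, so the user's attention grounds the query without explicit prompting. All class names, window sizes, and the prompt format are invented for illustration and are not the paper's implementation.

```python
from collections import deque
from dataclasses import dataclass
import time

@dataclass
class GazeEvent:
    target: str      # label of the XR object the user looked at
    timestamp: float

class ContextualMemory:
    """Minimal sketch: keep a short window of gaze targets and fold them
    into the chat-bot prompt, so the user never has to spell out what
    'this' or 'that' refers to."""
    def __init__(self, window_s=30.0, maxlen=20):
        self.window_s = window_s
        self.events = deque(maxlen=maxlen)

    def record(self, target):
        self.events.append(GazeEvent(target, time.time()))

    def implicit_context(self):
        cutoff = time.time() - self.window_s
        recent = [e.target for e in self.events if e.timestamp >= cutoff]
        return "Recently attended objects: " + ", ".join(recent)

    def build_prompt(self, user_utterance):
        # The implicit context replaces much of the engineered prompt.
        return f"{self.implicit_context()}\nUser: {user_utterance}"
```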
{"title":"EmBARDiment: an Embodied AI Agent for Productivity in XR","authors":"Riccardo Bovo, Steven Abreu, Karan Ahuja, Eric J Gonzalez, Li-Te Cheng, Mar Gonzalez-Franco","doi":"arxiv-2408.08158","DOIUrl":"https://doi.org/arxiv-2408.08158","url":null,"abstract":"XR devices running chat-bots powered by Large Language Models (LLMs) have\u0000tremendous potential as always-on agents that can enable much better\u0000productivity scenarios. However, screen based chat-bots do not take advantage\u0000of the the full-suite of natural inputs available in XR, including inward\u0000facing sensor data, instead they over-rely on explicit voice or text prompts,\u0000sometimes paired with multi-modal data dropped as part of the query. We propose\u0000a solution that leverages an attention framework that derives context\u0000implicitly from user actions, eye-gaze, and contextual memory within the XR\u0000environment. This minimizes the need for engineered explicit prompts, fostering\u0000grounded and intuitive interactions that glean user insights for the chat-bot.\u0000Our user studies demonstrate the imminent feasibility and transformative\u0000potential of our approach to streamline user interaction in XR with chat-bots,\u0000while offering insights for the design of future XR-embodied LLM agents.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bridging Training and Execution via Dynamic Directed Graph-Based Communication in Cooperative Multi-Agent Systems
Zhuohui Zhang, Bin He, Bin Cheng, Gang Li
arXiv:2408.07397 (2024-08-14)
Multi-agent systems must learn to communicate and to understand interactions between agents in order to achieve cooperative goals in partially observed tasks. However, existing approaches lack a dynamic directed communication mechanism and rely on global states, thus diminishing the role of communication in centralized training. We therefore propose the transformer-based graph coarsening network (TGCNet), a novel multi-agent reinforcement learning (MARL) algorithm. TGCNet learns the topological structure of a dynamic directed graph to represent the communication policy and integrates graph coarsening networks to approximate the representation of the global state during training. It also utilizes a transformer decoder for feature extraction during execution. Experiments on multiple cooperative MARL benchmarks demonstrate state-of-the-art performance compared to popular MARL algorithms. Further ablation studies validate the effectiveness of our dynamic directed graph communication mechanism and graph coarsening networks.
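For the graph-coarsening component, a DiffPool-style soft pooling step (used here as a generic stand-in, since the abstract does not specify TGCNet's exact coarsening operator) can be written in a few lines of PyTorch:

```python
import torch

def coarsen(node_feats, adj, assign):
    """One graph-coarsening step: soft-assign N nodes to K clusters,
    then pool both features and adjacency onto the coarse graph.

    node_feats: (N, F) node features
    adj:        (N, N) adjacency matrix
    assign:     (N, K) soft cluster assignments (rows sum to 1)
    """
    coarse_feats = assign.T @ node_feats   # (K, F) pooled features
    coarse_adj = assign.T @ adj @ assign   # (K, K) pooled connectivity
    return coarse_feats, coarse_adj

# Toy usage with random data.
N, F, K = 6, 8, 2
x = torch.randn(N, F)
a = (torch.rand(N, N) < 0.5).float()
s = torch.softmax(torch.randn(N, K), dim=1)
cx, ca = coarsen(x, a, s)
```

Stacking such steps compresses agent-local information toward an approximate global-state representation, which is the role the coarsening networks play during centralized training.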
{"title":"Bridging Training and Execution via Dynamic Directed Graph-Based Communication in Cooperative Multi-Agent Systems","authors":"Zhuohui Zhang, Bin He, Bin Cheng, Gang Li","doi":"arxiv-2408.07397","DOIUrl":"https://doi.org/arxiv-2408.07397","url":null,"abstract":"Multi-agent systems must learn to communicate and understand interactions\u0000between agents to achieve cooperative goals in partially observed tasks.\u0000However, existing approaches lack a dynamic directed communication mechanism\u0000and rely on global states, thus diminishing the role of communication in\u0000centralized training. Thus, we propose the transformer-based graph coarsening\u0000network (TGCNet), a novel multi-agent reinforcement learning (MARL) algorithm.\u0000TGCNet learns the topological structure of a dynamic directed graph to\u0000represent the communication policy and integrates graph coarsening networks to\u0000approximate the representation of global state during training. It also\u0000utilizes the transformer decoder for feature extraction during execution.\u0000Experiments on multiple cooperative MARL benchmarks demonstrate\u0000state-of-the-art performance compared to popular MARL algorithms. Further\u0000ablation studies validate the effectiveness of our dynamic directed graph\u0000communication mechanism and graph coarsening networks.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Nested Graph Reinforcement Learning-based Decision-making Strategy for Eco-platooning
Xin Gao, Xueyuan Li, Hao Liu, Ao Li, Zhaoyang Ma, Zirui Li
arXiv:2408.07578 (2024-08-14)
Platooning technology is renowned for its precise vehicle control, traffic flow optimization, and energy efficiency enhancement. However, in large-scale mixed platoons, vehicle heterogeneity and unpredictable traffic conditions lead to virtual bottlenecks, which reduce traffic throughput and increase energy consumption within the platoon. To address these challenges, we introduce a decision-making strategy based on nested graph reinforcement learning. This strategy improves collaborative decision-making, ensuring energy efficiency and alleviating congestion. We propose a theory of nested traffic graph representation that maps dynamic interactions between vehicles and platoons in non-Euclidean spaces. By incorporating a spatio-temporal weighted graph into a multi-head attention mechanism, we further enhance the model's capacity to process both local and global data. Additionally, we have developed a nested graph reinforcement learning framework to enhance the self-iterative learning capabilities of platooning. Using the I-24 dataset, we designed and conducted comparative algorithm experiments, generalizability testing, and penetration-rate ablation experiments, thereby validating the proposed strategy's effectiveness. Compared to the baseline, our strategy increases throughput by 10% and decreases energy use by 9%. Notably, increasing the penetration rate of CAVs significantly enhances traffic throughput, though it also increases energy consumption.
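As an illustration of a spatio-temporal weighted graph over vehicles, the sketch below assigns edge weights that decay with both spatial gap and speed difference, so tightly coupled vehicles (close together and moving alike) interact strongly. The Gaussian kernels and scale parameters are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def weighted_adjacency(positions, speeds, sigma_d=25.0, sigma_v=5.0):
    """Edge weight between vehicles i and j combines a spatial kernel
    (longitudinal gap) with a temporal-dynamics kernel (speed difference).

    positions, speeds: (N,) arrays of position (m) and speed (m/s).
    """
    dpos = np.abs(positions[:, None] - positions[None, :])
    dvel = np.abs(speeds[:, None] - speeds[None, :])
    w = np.exp(-(dpos / sigma_d) ** 2) * np.exp(-(dvel / sigma_v) ** 2)
    np.fill_diagonal(w, 0.0)  # no self-loops
    return w

# Example: five vehicles on a highway segment; the resulting weights could
# feed a multi-head attention mechanism as in the proposed strategy.
pos = np.array([0.0, 30.0, 55.0, 200.0, 240.0])
vel = np.array([28.0, 27.5, 29.0, 22.0, 22.5])
A = weighted_adjacency(pos, vel)
```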
{"title":"A Nested Graph Reinforcement Learning-based Decision-making Strategy for Eco-platooning","authors":"Xin Gao, Xueyuan Li, Hao Liu, Ao Li, Zhaoyang Ma, Zirui Li","doi":"arxiv-2408.07578","DOIUrl":"https://doi.org/arxiv-2408.07578","url":null,"abstract":"Platooning technology is renowned for its precise vehicle control, traffic\u0000flow optimization, and energy efficiency enhancement. However, in large-scale\u0000mixed platoons, vehicle heterogeneity and unpredictable traffic conditions lead\u0000to virtual bottlenecks. These bottlenecks result in reduced traffic throughput\u0000and increased energy consumption within the platoon. To address these\u0000challenges, we introduce a decision-making strategy based on nested graph\u0000reinforcement learning. This strategy improves collaborative decision-making,\u0000ensuring energy efficiency and alleviating congestion. We propose a theory of\u0000nested traffic graph representation that maps dynamic interactions between\u0000vehicles and platoons in non-Euclidean spaces. By incorporating spatio-temporal\u0000weighted graph into a multi-head attention mechanism, we further enhance the\u0000model's capacity to process both local and global data. Additionally, we have\u0000developed a nested graph reinforcement learning framework to enhance the\u0000self-iterative learning capabilities of platooning. Using the I-24 dataset, we\u0000designed and conducted comparative algorithm experiments, generalizability\u0000testing, and permeability ablation experiments, thereby validating the proposed\u0000strategy's effectiveness. Compared to the baseline, our strategy increases\u0000throughput by 10% and decreases energy use by 9%. Specifically, increasing the\u0000penetration rate of CAVs significantly enhances traffic throughput, though it\u0000also increases energy consumption.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"70 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}