Xiaoyang Yu, Youfang Lin, Shuo Wang, Kai Lv, Sheng Han
In a multi-agent system (MAS), action semantics describe the different ways agents' actions influence other entities, and can be used to divide agents into groups in a physically heterogeneous MAS. Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents without distinguishing between their action semantics. This common practice reduces cooperation and coordination between agents in complex situations. However, fully independent agent parameters dramatically increase the computational cost and training difficulty. To exploit different action semantics while maintaining a suitable parameter-sharing structure, we introduce the Unified Action Space (UAS). The UAS is the union of all agent actions with different semantics. All agents first compute their unified representation in the UAS, and then generate their heterogeneous action policies using different available-action masks. To further improve the training of the extra UAS parameters, we introduce a Cross-Group Inverse (CGI) loss that predicts the policies of agents in other groups from trajectory information. As a general method for the physically heterogeneous MARL problem, we add the UAS to both value-based and policy-based MARL algorithms and propose two practical algorithms: U-QMIX and U-MAPPO. Experimental results in the SMAC environment demonstrate the effectiveness of both U-QMIX and U-MAPPO compared with several state-of-the-art MARL methods.
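The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the action names, group names, masks, and logits below are hypothetical, and the shared network is stood in for by a fixed list of logits over the unified action space.

```python
import math

# Hypothetical unified action space: the union of all groups' actions.
UNIFIED_ACTIONS = ["noop", "move", "attack", "heal"]

# Hypothetical available-action masks per agent group (1 = allowed).
GROUP_MASKS = {
    "marine": [1, 1, 1, 0],   # can attack, cannot heal
    "medivac": [1, 1, 0, 1],  # can heal, cannot attack
}


def masked_policy(logits, mask):
    """Softmax over the unified logits, restricted to the allowed actions."""
    allowed = [logit for logit, m in zip(logits, mask) if m]
    if not allowed:
        raise ValueError("mask must allow at least one action")
    peak = max(allowed)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) if m else 0.0
               for logit, m in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


# Stand-in for the output of one shared network over the unified space.
shared_logits = [0.1, 0.5, 2.0, 1.5]
marine_policy = masked_policy(shared_logits, GROUP_MASKS["marine"])
medivac_policy = masked_policy(shared_logits, GROUP_MASKS["medivac"])
```

Both groups read the same logits, so parameters stay shared, while the mask gives each group zero probability on actions it cannot take.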
{"title":"Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space","authors":"Xiaoyang Yu, Youfang Lin, Shuo Wang, Kai Lv, Sheng Han","doi":"arxiv-2408.07395","DOIUrl":"https://doi.org/arxiv-2408.07395","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"19 1"},"publicationDate":"2024-08-14","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142190436"}
Generative Flow Networks (GFlowNets) aim to generate diverse trajectories from a distribution in which the final states of the trajectories are proportional to the reward, serving as a powerful alternative to reinforcement learning for exploratory control tasks. However, the individual-flow matching constraint in GFlowNets limits their application to multi-agent systems, especially continuous joint-control problems. In this paper, we propose a novel Multi-Agent generative Continuous Flow Networks (MACFN) method to enable multiple agents to perform cooperative exploration for various compositional continuous objects. Technically, MACFN trains decentralized individual-flow-based policies in a centralized global-flow-based matching fashion. During centralized training, MACFN introduces a continuous flow decomposition network to deduce the flow contribution of each agent in the presence of only global rewards. Agents can then select actions based solely on their assigned local flow in a decentralized way, forming a joint policy distribution proportional to the rewards. To guarantee the expressiveness of continuous flow decomposition, we theoretically derive a consistency condition on the decomposition network. Experimental results demonstrate that the proposed method outperforms state-of-the-art counterparts and shows better exploration capability. Our code is available at https://github.com/isluoshuang/MACFN.
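To make the phrase "final states proportional to the reward" concrete, here is a small sketch of the basic GFlowNet idea on a discrete tree with a single agent. It is not MACFN, which is continuous and multi-agent; the tree, state names, and rewards are hypothetical, and the flows are computed exactly rather than learned.

```python
# Hypothetical tree: root -> {a, b}; a -> {x, y}; b -> {z}.
CHILDREN = {"root": ["a", "b"], "a": ["x", "y"], "b": ["z"]}
# Hypothetical rewards at the terminal states.
REWARD = {"x": 1.0, "y": 3.0, "z": 4.0}


def state_flow(state):
    """Flow through a state: its reward if terminal, else the sum over children."""
    if state in REWARD:
        return REWARD[state]
    return sum(state_flow(child) for child in CHILDREN[state])


def forward_policy(state):
    """Probability of each child, proportional to the flow it carries."""
    total = state_flow(state)
    return {child: state_flow(child) / total for child in CHILDREN[state]}


def terminal_probabilities(state="root", prob=1.0):
    """Probability of ending in each terminal state under the forward policy."""
    if state in REWARD:
        return {state: prob}
    result = {}
    for child, p in forward_policy(state).items():
        result.update(terminal_probabilities(child, prob * p))
    return result


probabilities = terminal_probabilities()
```

Because inflow equals outflow at every intermediate state, following the forward policy reaches each terminal state with probability equal to its reward divided by the total reward.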
{"title":"Multi-Agent Continuous Control with Generative Flow Networks","authors":"Shuang Luo, Yinchuan Li, Shunyu Liu, Xu Zhang, Yunfeng Shao, Chao Wu","doi":"arxiv-2408.06920","DOIUrl":"https://doi.org/arxiv-2408.06920","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"29 1"},"publicationDate":"2024-08-13","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142224742"}
With the growing demand for offline PDF chatbots in automotive industrial production environments, optimizing the deployment of large language models (LLMs) in local, low-performance settings has become increasingly important. This study focuses on enhancing Retrieval-Augmented Generation (RAG) techniques for processing complex automotive industry documents using locally deployed Ollama models. Based on the Langchain framework, we propose a multi-dimensional optimization approach for Ollama's local RAG implementation. Our method addresses key challenges in automotive document processing, including multi-column layouts and technical specifications. We introduce improvements in PDF processing, retrieval mechanisms, and context compression, tailored to the unique characteristics of automotive industry documents. Additionally, we design custom classes supporting embedding pipelines and an agent supporting self-RAG based on LangGraph best practices. To evaluate our approach, we constructed a proprietary dataset comprising typical automotive industry documents, including technical reports and corporate regulations. We compared our optimized RAG model and self-RAG agent against a naive RAG baseline across three datasets: our automotive industry dataset, QReCC, and CoQA. Results demonstrate significant improvements in context precision, context recall, answer relevancy, and faithfulness, with particularly notable performance on the automotive industry dataset. Our optimization scheme provides an effective solution for deploying local RAG systems in the automotive sector, addressing the specific needs of PDF chatbots in industrial production environments. This research has important implications for advancing information processing and intelligent production in the automotive industry.
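The retrieval step at the core of a naive RAG baseline can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline and not the LangChain or Ollama API: the bag-of-words `embed` function stands in for a real embedding model, and the document chunks are invented examples.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical chunks extracted from PDF documents.
chunks = [
    "torque specification for wheel bolts is listed in table 4",
    "the corporate travel policy requires manager approval",
    "brake fluid must be replaced every two years",
]
top = retrieve("wheel bolts torque specification", chunks, k=1)
prompt = build_prompt("wheel bolts torque specification", chunks, k=1)
```

The prompt would then be passed to the locally deployed model; improvements such as layout-aware PDF parsing or context compression would change how `chunks` is produced and filtered, not this overall shape.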
{"title":"Optimizing RAG Techniques for Automotive Industry PDF Chatbots: A Case Study with Locally Deployed Ollama Models","authors":"Fei Liu, Zejun Kang, Xing Han","doi":"arxiv-2408.05933","DOIUrl":"https://doi.org/arxiv-2408.05933","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"113 1"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142190434"}
In multi-agent cooperative tasks, heterogeneous agents are common. Compared to cooperation among homogeneous agents, collaboration among heterogeneous agents requires considering the best-suited sub-tasks for each agent. However, the operation of multi-agent systems often involves a large amount of complex interaction information, making it more challenging to learn heterogeneous strategies. Related multi-agent reinforcement learning methods sometimes use grouping mechanisms to form smaller cooperative groups or leverage prior domain knowledge to learn strategies for different roles. In contrast, we argue that agents should learn deeper role features without relying on additional information. Therefore, we propose QTypeMix, which divides the value decomposition process into homogeneous and heterogeneous stages. QTypeMix learns to extract type features from local historical observations through the TE loss. In addition, we introduce advanced network structures containing attention mechanisms and hypernetworks to enhance the representation capability and carry out the value decomposition process. Tests of the proposed method on 14 maps from SMAC and SMACv2 show that QTypeMix achieves state-of-the-art performance in tasks of varying difficulty.
{"title":"QTypeMix: Enhancing Multi-Agent Cooperative Strategies through Heterogeneous and Homogeneous Value Decomposition","authors":"Songchen Fu, Shaojing Zhao, Ta Li, YongHong Yan","doi":"arxiv-2408.07098","DOIUrl":"https://doi.org/arxiv-2408.07098","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"31 1"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142190432"}
This article describes a novel game structure for autonomously optimizing decentralized manufacturing systems with multi-objective optimization challenges, namely Distributed Stackelberg Strategies in State-Based Potential Games (DS2-SbPG). DS2-SbPG integrates potential games and Stackelberg games, combining the cooperative trade-off capabilities of potential games with the multi-objective optimization handling of Stackelberg games. Notably, all training procedures are still conducted in a fully distributed manner. DS2-SbPG offers a promising solution for finding optimal trade-offs between objectives by eliminating the complexities of setting up combined objective optimization functions for individual players in self-learning domains, particularly in real-world industrial settings with diverse and numerous objectives between the sub-systems. We further prove that DS2-SbPG constitutes a dynamic potential game, which yields corresponding convergence guarantees. Experimental validation conducted on a laboratory-scale testbed highlights the efficacy of DS2-SbPG and its two variants: DS2-SbPG for single-leader-follower settings and Stack DS2-SbPG for multi-leader-follower settings. The results show significant reductions in power consumption and improvements in overall performance, which signals the potential of DS2-SbPG in real-world applications.
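For readers unfamiliar with Stackelberg games, the leader-follower logic can be sketched as backward induction over small discrete action sets. This is a generic textbook illustration, not DS2-SbPG itself: the action sets and the two utility functions are hypothetical, and no learning or state-based potential function is involved.

```python
def follower_best_response(leader_action, follower_actions, follower_utility):
    """The follower observes the leader's action and maximizes its own utility."""
    return max(follower_actions,
               key=lambda f: follower_utility(leader_action, f))


def stackelberg_solution(leader_actions, follower_actions,
                         leader_utility, follower_utility):
    """The leader picks the action that is best given the follower's response."""
    best = None
    for a in leader_actions:
        f = follower_best_response(a, follower_actions, follower_utility)
        value = leader_utility(a, f)
        if best is None or value > best[2]:
            best = (a, f, value)
    return best


# Hypothetical objectives: the leader wants a speed close to 2 and pays for
# the follower's power use; the follower wants to match the leader's speed.
def leader_utility(a, f):
    return -(a - 2) ** 2 - 0.5 * f


def follower_utility(a, f):
    return -(f - a) ** 2


solution = stackelberg_solution([0, 1, 2], [0, 1, 2],
                                leader_utility, follower_utility)
```

Each player keeps its own objective; the ordering of decisions, rather than a hand-tuned combined objective function, determines the trade-off.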
{"title":"Distributed Stackelberg Strategies in State-based Potential Games for Autonomous Decentralized Learning Manufacturing Systems","authors":"Steve Yuwono, Dorothea Schwung, Andreas Schwung","doi":"arxiv-2408.06397","DOIUrl":"https://doi.org/arxiv-2408.06397","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"8 1"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142190433"}
Remote education has proliferated in the twenty-first century, giving rise to intelligent tutoring systems. In particular, research has found multi-armed bandit (MAB) intelligent tutors to be notably effective at navigating the exploration-exploitation trade-off in student problem recommendations. Prior literature, however, significantly lacks open-sourced MAB intelligent tutors, which impedes potential applications of these educational MAB recommendation systems. In this paper, we combine recent literature on MAB intelligent tutoring techniques into an open-sourced and easily deployable hierarchical MAB algorithm, capable of progressing students concurrently through concepts and problems, determining ideal recommended problem difficulties, and assessing latent memory decay. We evaluate our algorithm using simulated groups of 500 students, utilizing Bayesian Knowledge Tracing to estimate students' content mastery. Results suggest that our algorithm, when made difficulty-agnostic, significantly boosts student success, and that the further addition of problem-difficulty adaptation notably improves this metric.
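The Bayesian Knowledge Tracing estimate mentioned above follows a standard two-step update: a Bayes step on the observed answer, then a learning transition. The sketch below uses the textbook formulation; the slip, guess, and learn probabilities are illustrative defaults, not values taken from the paper.

```python
def bkt_update(p_mastery, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One Bayesian Knowledge Tracing step for a single skill.

    p_mastery: prior probability that the student has mastered the skill.
    correct:   whether the student answered the current problem correctly.
    """
    if correct:
        # A mastered student answers correctly unless they slip.
        evidence = p_mastery * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_mastery) * p_guess)
    else:
        # A mastered student answers incorrectly only by slipping.
        evidence = p_mastery * p_slip
        posterior = evidence / (evidence + (1 - p_mastery) * (1 - p_guess))
    # The student may also learn the skill while working on the problem.
    return posterior + (1 - posterior) * p_learn


after_correct = bkt_update(0.5, correct=True)
after_incorrect = bkt_update(0.5, correct=False)
```

A bandit tutor could use an estimate like this as the signal for deciding when to move a student to the next concept or problem difficulty.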
{"title":"Hierarchical Multi-Armed Bandits for the Concurrent Intelligent Tutoring of Concepts and Problems of Varying Difficulty Levels","authors":"Blake Castleman, Uzay Macar, Ansaf Salleb-Aouissi","doi":"arxiv-2408.07208","DOIUrl":"https://doi.org/arxiv-2408.07208","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"43 1"},"publicationDate":"2024-08-10","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142190441"}
António Góis, Mehrnaz Mofakhami, Fernando P. Santos, Simon Lacoste-Julien, Gauthier Gidel
Predictions often influence the reality which they aim to predict, an effect known as performativity. Existing work focuses on accuracy maximization under this effect, but model deployment may have important unintended impacts, especially in multiagent scenarios. In this work, we investigate performative prediction in a concrete game-theoretic setting where social welfare is an alternative objective to accuracy maximization. We explore a collective risk dilemma scenario where maximising accuracy can negatively impact social welfare, when predicting collective behaviours. By assuming knowledge of a Bayesian agent behavior model, we then show how to achieve better trade-offs and use them for mechanism design.
{"title":"Performative Prediction on Games and Mechanism Design","authors":"António Góis, Mehrnaz Mofakhami, Fernando P. Santos, Simon Lacoste-Julien, Gauthier Gidel","doi":"arxiv-2408.05146","DOIUrl":"https://doi.org/arxiv-2408.05146","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"119 1"},"publicationDate":"2024-08-09","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"141949333"}
Puneet Jain, Chaitanya Dwivedi, Vigynesh Bhatt, Nick Smith, Michael A Goodrich
A hub-based colony consists of multiple agents who share a common nest site called the hub. Agents perform tasks away from the hub, such as foraging for food or gathering information about future nest sites. Modeling hub-based colonies is challenging because the size of the collective state space grows rapidly as the number of agents grows. This paper presents a graph-based representation of the colony that can be combined with graph-based encoders to create low-dimensional representations of collective state that scale to many agents for a best-of-N colony problem. We demonstrate through two experiments how the information in the low-dimensional embedding can be used. First, we show how the information in the tensor can be used to cluster collective states by the probability of choosing the best site for a very small problem. Second, we show how structured collective trajectories emerge when a graph encoder is used to learn the low-dimensional embedding, and that these trajectories contain information that can be used to predict swarm performance.
{"title":"Performance Prediction of Hub-Based Swarms","authors":"Puneet Jain, Chaitanya Dwivedi, Vigynesh Bhatt, Nick Smith, Michael A Goodrich","doi":"arxiv-2408.04822","DOIUrl":"https://doi.org/arxiv-2408.04822","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"30 1"},"publicationDate":"2024-08-09","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"141949330"}
Altruistic cooperation is costly yet socially desirable. As a result, agents struggle to learn cooperative policies through independent reinforcement learning (RL). Indirect reciprocity, where agents consider their interaction partner's reputation, has been shown to stabilise cooperation in homogeneous, idealised populations. However, more realistic settings comprise heterogeneous agents with different characteristics and group-based social identities. We study cooperation when agents are stratified into two such groups, and allow reputation updates and actions to depend on group information. We consider two modelling approaches: evolutionary game theory, where we comprehensively search for social norms (i.e., rules to assign reputations) leading to cooperation and fairness; and RL, where we consider how the stochastic dynamics of policy learning affect the analytically identified equilibria. We observe that a defecting majority leads the minority group to defect, but not the inverse. Moreover, changing the norms that judge in-group and out-group interactions can steer a system towards either fair or unfair cooperation. This is made clearer when moving beyond equilibrium analysis to independent RL agents, where convergence to fair cooperation occurs with a narrower set of norms. Our results highlight that, in heterogeneous populations with reputations, carefully defining interaction norms is fundamental to tackling the dilemmas of both cooperation and fairness.
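A social norm in this setting is simply a rule that maps an observed interaction to a new reputation for the donor. The sketch below shows one well-known norm from the indirect reciprocity literature, stern judging, plus a group-dependent variant. The variant's out-group rule is a hypothetical example chosen for illustration and is not one of the norms identified in the paper.

```python
GOOD, BAD = "good", "bad"
COOPERATE, DEFECT = "cooperate", "defect"


def stern_judging(action, recipient_reputation):
    """Good if the donor helps a good recipient or refuses to help a bad one."""
    justified = (action == COOPERATE) == (recipient_reputation == GOOD)
    return GOOD if justified else BAD


def group_dependent_norm(action, recipient_reputation, same_group):
    """Judge in-group and out-group interactions with different rules."""
    if same_group:
        return stern_judging(action, recipient_reputation)
    # Hypothetical out-group rule: only cooperation earns a good reputation,
    # regardless of the recipient's standing.
    return GOOD if action == COOPERATE else BAD


in_group_verdict = group_dependent_norm(DEFECT, BAD, same_group=True)
out_group_verdict = group_dependent_norm(DEFECT, BAD, same_group=False)
```

The same action toward the same kind of recipient is judged differently depending on group membership, which is the kind of degree of freedom that a comprehensive search over norms explores.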
{"title":"Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity","authors":"Martin Smit, Fernando P. Santos","doi":"arxiv-2408.04549","DOIUrl":"https://doi.org/arxiv-2408.04549","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"13 1"},"publicationDate":"2024-08-08","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"141949331"}
Philipp Altmann, Julian Schönberger, Steffen Illium, Maximilian Zorn, Fabian Ritz, Tom Haider, Simon Burton, Thomas Gabor
Emergent effects can arise in multi-agent systems (MAS) where execution is decentralized and relies on local information. These effects may range from minor deviations in behavior to catastrophic system failures. To formally define these effects, we identify misalignments between the global inherent specification (the true specification) and its local approximation (such as the configuration of different reward components or observations). Using established safety terminology, we develop a framework to understand these emergent effects. To showcase the resulting implications, we use two broadly configurable exemplary gridworld scenarios, where insufficient specification leads to unintended behavioral deviations when behavior is derived independently. Recognizing that a global adaptation might not always be feasible, we propose adjusting the underlying parameterizations to mitigate these issues, thereby improving the system's alignment and reducing the risk of emergent failures.
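The gap between a global specification and its local approximation can be made concrete with a toy example. The sketch below is purely hypothetical and is not one of the paper's gridworld scenarios: two agents cross a 3x3 grid, each choosing a start delay that maximizes its own local return, while the global specification additionally requires that they never share a cell. The `wait_bonus` parameter is an invented stand-in for "adjusting the underlying parameterization" of one agent's local reward.

```python
A_START, A_GOAL = (1, 0), (1, 2)   # agent A crosses the middle row
B_START, B_GOAL = (0, 1), (2, 1)   # agent B crosses the middle column


def trajectory(start, goal, delay):
    """Wait `delay` steps at the start, then step straight towards the goal."""
    (r, c), (gr, gc) = start, goal
    cells = [(r, c)] * (delay + 1)
    while (r, c) != (gr, gc):
        r += (gr > r) - (gr < r)   # move one row towards the goal, if needed
        c += (gc > c) - (gc < c)   # move one column towards the goal, if needed
        cells.append((r, c))
    return cells


def local_return(traj, delay, wait_bonus):
    """Local approximation: pay 1 per time step, optionally reward waiting."""
    return -(len(traj) - 1) + wait_bonus * delay


def best_delay(start, goal, wait_bonus, delays=(0, 1)):
    """Each agent picks its delay independently, using only its local return."""
    return max(delays,
               key=lambda d: local_return(trajectory(start, goal, d), d, wait_bonus))


def satisfies_global_spec(traj_a, traj_b):
    """Global specification: the agents never occupy the same cell at once."""
    horizon = max(len(traj_a), len(traj_b))
    pad = lambda t: t + [t[-1]] * (horizon - len(t))   # agents rest at their goal
    return all(a != b for a, b in zip(pad(traj_a), pad(traj_b)))


def run(wait_bonus_b):
    """Derive both behaviors independently and check the global specification."""
    a = trajectory(A_START, A_GOAL, best_delay(A_START, A_GOAL, 0.0))
    b = trajectory(B_START, B_GOAL, best_delay(B_START, B_GOAL, wait_bonus_b))
    return satisfies_global_spec(a, b)


if __name__ == "__main__":
    print("time cost only:", run(0.0))
    print("reparameterized reward for B:", run(2.0))
```

With only the time cost, both agents leave immediately and meet in the center cell, so locally optimal behavior violates the global specification. Raising `wait_bonus` for one agent changes its local optimum to a delayed start, which restores the specification without any global coordination step.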
{"title":"Emergence in Multi-Agent Systems: A Safety Perspective","authors":"Philipp Altmann, Julian Schönberger, Steffen Illium, Maximilian Zorn, Fabian Ritz, Tom Haider, Simon Burton, Thomas Gabor","doi":"arxiv-2408.04514","DOIUrl":"https://doi.org/arxiv-2408.04514","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"58 1"},"publicationDate":"2024-08-08","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"141949332"}