arXiv - CS - Multiagent Systems: Latest Papers, Page 8

Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space

arXiv - CS - Multiagent Systems

Pub Date : 2024-08-14 DOI: arxiv-2408.07395

Xiaoyang Yu, Youfang Lin, Shuo Wang, Kai Lv, Sheng Han

In a multi-agent system (MAS), action semantics indicates the different influences of agents' actions toward other entities, and can be used to divide agents into groups in a physically heterogeneous MAS. Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents without careful discrimination of different action semantics. This common implementation decreases the cooperation and coordination between agents in complex situations. However, fully independent agent parameters dramatically increase the computational cost and training difficulty. In order to benefit from the usage of different action semantics while also maintaining a proper parameter-sharing structure, we introduce the Unified Action Space (UAS) to fulfill the requirement. The UAS is the union set of all agent actions with different semantics. All agents first calculate their unified representation in the UAS, and then generate their heterogeneous action policies using different available-action-masks. To further improve the training of extra UAS parameters, we introduce a Cross-Group Inverse (CGI) loss to predict other groups' agent policies with the trajectory information. As a universal method for solving the physically heterogeneous MARL problem, we implement the UAS adding to both value-based and policy-based MARL algorithms, and propose two practical algorithms: U-QMIX and U-MAPPO. Experimental results in the SMAC environment prove the effectiveness of both U-QMIX and U-MAPPO compared with several state-of-the-art MARL methods.
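The masking step described in the abstract can be illustrated with a minimal sketch. It assumes a shared network has already produced one logit per action in the union set; the action names, unit groups, and mask values below are hypothetical and are not taken from the paper. It shows only the general idea of deriving group-specific policies from shared outputs, not U-QMIX or U-MAPPO themselves.

```python
import math

# Hypothetical unified action space: the union of all groups' actions.
UNIFIED_ACTIONS = ["noop", "move", "attack", "heal"]

# Hypothetical per-group availability masks (1 = the group has this action).
GROUP_MASKS = {
    "marine": [1, 1, 1, 0],   # can attack, cannot heal
    "medivac": [1, 1, 0, 1],  # can heal, cannot attack
}


def masked_policy(logits, mask):
    """Softmax over unified-action logits, restricted to unmasked actions."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("mask must allow at least one action")
    # Subtract the largest available logit for numerical stability.
    peak = max(l for l, m in zip(logits, mask) if m)
    weights = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


# The same shared logits yield different policies for different groups.
shared_logits = [0.0, 1.0, 2.0, 2.0]
marine_policy = masked_policy(shared_logits, GROUP_MASKS["marine"])
medivac_policy = masked_policy(shared_logits, GROUP_MASKS["medivac"])
```

Masked actions receive probability exactly zero, so each group samples only from actions it can physically execute while all groups share the parameters that produced the logits.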



Citations: 0

Multi-Agent Continuous Control with Generative Flow Networks

arXiv - CS - Multiagent Systems

Pub Date : 2024-08-13 DOI: arxiv-2408.06920

Shuang Luo, Yinchuan Li, Shunyu Liu, Xu Zhang, Yunfeng Shao, Chao Wu

Generative Flow Networks (GFlowNets) aim to generate diverse trajectories from a distribution in which the final states of the trajectories are proportional to the reward, serving as a powerful alternative to reinforcement learning for exploratory control tasks. However, the individual-flow matching constraint in GFlowNets limits their applications for multi-agent systems, especially continuous joint-control problems. In this paper, we propose a novel Multi-Agent generative Continuous Flow Networks (MACFN) method to enable multiple agents to perform cooperative exploration for various compositional continuous objects. Technically, MACFN trains decentralized individual-flow-based policies in a centralized global-flow-based matching fashion. During centralized training, MACFN introduces a continuous flow decomposition network to deduce the flow contributions of each agent in the presence of only global rewards. Then agents can deliver actions solely based on their assigned local flow in a decentralized way, forming a joint policy distribution proportional to the rewards. To guarantee the expressiveness of continuous flow decomposition, we theoretically derive a consistency condition on the decomposition network. Experimental results demonstrate that the proposed method yields results superior to the state-of-the-art counterparts and better exploration capability. Our code is available at https://github.com/isluoshuang/MACFN.



Citations: 0

Optimizing RAG Techniques for Automotive Industry PDF Chatbots: A Case Study with Locally Deployed Ollama Models

arXiv - CS - Multiagent Systems

Pub Date : 2024-08-12 DOI: arxiv-2408.05933

Fei Liu, Zejun Kang, Xing Han

With the growing demand for offline PDF chatbots in automotive industrial production environments, optimizing the deployment of large language models (LLMs) in local, low-performance settings has become increasingly important. This study focuses on enhancing Retrieval-Augmented Generation (RAG) techniques for processing complex automotive industry documents using locally deployed Ollama models. Based on the Langchain framework, we propose a multi-dimensional optimization approach for Ollama's local RAG implementation. Our method addresses key challenges in automotive document processing, including multi-column layouts and technical specifications. We introduce improvements in PDF processing, retrieval mechanisms, and context compression, tailored to the unique characteristics of automotive industry documents. Additionally, we design custom classes supporting embedding pipelines and an agent supporting self-RAG based on LangGraph best practices. To evaluate our approach, we constructed a proprietary dataset comprising typical automotive industry documents, including technical reports and corporate regulations. We compared our optimized RAG model and self-RAG agent against a naive RAG baseline across three datasets: our automotive industry dataset, QReCC, and CoQA. Results demonstrate significant improvements in context precision, context recall, answer relevancy, and faithfulness, with particularly notable performance on the automotive industry dataset. Our optimization scheme provides an effective solution for deploying local RAG systems in the automotive sector, addressing the specific needs of PDF chatbots in industrial production environments. This research has important implications for advancing information processing and intelligent production in the automotive industry.



Citations: 0

QTypeMix: Enhancing Multi-Agent Cooperative Strategies through Heterogeneous and Homogeneous Value Decomposition

arXiv - CS - Multiagent Systems

Pub Date : 2024-08-12 DOI: arxiv-2408.07098

Songchen Fu, Shaojing Zhao, Ta Li, YongHong Yan

In multi-agent cooperative tasks, the presence of heterogeneous agents is familiar. Compared to cooperation among homogeneous agents, collaboration requires considering the best-suited sub-tasks for each agent. However, the operation of multi-agent systems often involves a large amount of complex interaction information, making it more challenging to learn heterogeneous strategies. Related multi-agent reinforcement learning methods sometimes use grouping mechanisms to form smaller cooperative groups or leverage prior domain knowledge to learn strategies for different roles. In contrast, agents should learn deeper role features without relying on additional information. Therefore, we propose QTypeMix, which divides the value decomposition process into homogeneous and heterogeneous stages. QTypeMix learns to extract type features from local historical observations through the TE loss. In addition, we introduce advanced network structures containing attention mechanisms and hypernets to enhance the representation capability and achieve the value decomposition process. The results of testing the proposed method on 14 maps from SMAC and SMACv2 show that QTypeMix achieves state-of-the-art performance in tasks of varying difficulty.



Citations: 0

Distributed Stackelberg Strategies in State-based Potential Games for Autonomous Decentralized Learning Manufacturing Systems

arXiv - CS - Multiagent Systems

Pub Date : 2024-08-12 DOI: arxiv-2408.06397

Steve Yuwono, Dorothea Schwung, Andreas Schwung

This article describes a novel game structure for autonomously optimizing decentralized manufacturing systems with multi-objective optimization challenges, namely Distributed Stackelberg Strategies in State-Based Potential Games (DS2-SbPG). DS2-SbPG integrates potential games and Stackelberg games, which improves the cooperative trade-off capabilities of potential games and the multi-objective optimization handling by Stackelberg games. Notably, all training procedures remain conducted in a fully distributed manner. DS2-SbPG offers a promising solution to finding optimal trade-offs between objectives by eliminating the complexities of setting up combined objective optimization functions for individual players in self-learning domains, particularly in real-world industrial settings with diverse and numerous objectives between the sub-systems. We further prove that DS2-SbPG constitutes a dynamic potential game that results in corresponding converge guarantees. Experimental validation conducted on a laboratory-scale testbed highlights the efficacy of DS2-SbPG and its two variants, such as DS2-SbPG for single-leader-follower and Stack DS2-SbPG for multi-leader-follower. The results show significant reductions in power consumption and improvements in overall performance, which signals the potential of DS2-SbPG in real-world applications.
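For readers unfamiliar with the leader-follower structure that the abstract builds on, the following is a minimal sketch of a classic single-leader, single-follower Stackelberg solution in a small matrix game. It is a textbook illustration, not DS2-SbPG: the payoff matrices and the function name `stackelberg` are hypothetical, and the state-based potential game and learning components of the paper are not modelled.

```python
def stackelberg(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action, leader_value).

    The leader commits first; the follower then best-responds to the
    observed leader action. Payoffs are indexed [leader_action][follower_action].
    Ties are broken in favour of the lowest action index.
    """
    best = None
    for i, follower_row in enumerate(follower_payoff):
        # Follower's best response to leader action i.
        j = max(range(len(follower_row)), key=lambda k: follower_row[k])
        value = leader_payoff[i][j]
        if best is None or value > best[2]:
            best = (i, j, value)
    return best


# Hypothetical payoffs for illustration only.
leader_payoff = [[2, 4], [1, 3]]
follower_payoff = [[1, 0], [0, 2]]
solution = stackelberg(leader_payoff, follower_payoff)
```

In this example leader action 0 is dominant under simultaneous play and yields the leader a payoff of 2, whereas committing first to action 1 steers the follower to action 1 and yields 3, which is the kind of advantage a leader-follower ordering can provide.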



Citations: 0

Hierarchical Multi-Armed Bandits for the Concurrent Intelligent Tutoring of Concepts and Problems of Varying Difficulty Levels

arXiv - CS - Multiagent Systems

Pub Date : 2024-08-10 DOI: arxiv-2408.07208

Blake Castleman, Uzay Macar, Ansaf Salleb-Aouissi

Remote education has proliferated in the twenty-first century, yielding rise to intelligent tutoring systems. In particular, research has found multi-armed bandit (MAB) intelligent tutors to have notable abilities in traversing the exploration-exploitation trade-off landscape for student problem recommendations. Prior literature, however, contains a significant lack of open-sourced MAB intelligent tutors, which impedes potential applications of these educational MAB recommendation systems. In this paper, we combine recent literature on MAB intelligent tutoring techniques into an open-sourced and simply deployable hierarchical MAB algorithm, capable of progressing students concurrently through concepts and problems, determining ideal recommended problem difficulties, and assessing latent memory decay. We evaluate our algorithm using simulated groups of 500 students, utilizing Bayesian Knowledge Tracing to estimate students' content mastery. Results suggest that our algorithm, when turned difficulty-agnostic, significantly boosts student success, and that the further addition of problem-difficulty adaptation notably improves this metric.
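The abstract mentions Bayesian Knowledge Tracing (BKT) as the mastery estimator. As a minimal sketch, the standard BKT update for a single skill looks as follows; the parameter values are hypothetical defaults chosen for illustration, and the paper's bandit hierarchy and memory-decay handling are not reproduced here.

```python
def bkt_update(p_mastery, correct, p_learn=0.2, p_guess=0.25, p_slip=0.1):
    """One step of standard Bayesian Knowledge Tracing for a single skill.

    p_mastery: prior probability that the student has mastered the skill.
    correct:   whether the observed answer was correct.
    p_learn, p_guess, p_slip: hypothetical example parameters.
    """
    if correct:
        # Mastered and did not slip, versus unmastered and guessed.
        numerator = p_mastery * (1 - p_slip)
        denominator = numerator + (1 - p_mastery) * p_guess
    else:
        # Mastered but slipped, versus unmastered and failed to guess.
        numerator = p_mastery * p_slip
        denominator = numerator + (1 - p_mastery) * (1 - p_guess)
    posterior = numerator / denominator
    # Chance of learning the skill during this practice opportunity.
    return posterior + (1 - posterior) * p_learn


after_correct = bkt_update(0.5, correct=True)
after_incorrect = bkt_update(0.5, correct=False)
```

A tutor could feed the updated mastery estimate back into its problem-selection policy, for example by favouring concepts whose estimate is still low; that coupling is an assumption about usage rather than a description of the authors' algorithm.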



Citations: 0

Performative Prediction on Games and Mechanism Design

arXiv - CS - Multiagent Systems

Pub Date : 2024-08-09 DOI: arxiv-2408.05146

António Góis, Mehrnaz Mofakhami, Fernando P. Santos, Simon Lacoste-Julien, Gauthier Gidel

Predictions often influence the reality which they aim to predict, an effect known as performativity. Existing work focuses on accuracy maximization under this effect, but model deployment may have important unintended impacts, especially in multiagent scenarios. In this work, we investigate performative prediction in a concrete game-theoretic setting where social welfare is an alternative objective to accuracy maximization. We explore a collective risk dilemma scenario where maximising accuracy can negatively impact social welfare, when predicting collective behaviours. By assuming knowledge of a Bayesian agent behavior model, we then show how to achieve better trade-offs and use them for mechanism design.


Citations: 0

Performance Prediction of Hub-Based Swarms

arXiv - CS - Multiagent Systems

Pub Date : 2024-08-09 DOI: arxiv-2408.04822

Puneet Jain, Chaitanya Dwivedi, Vigynesh Bhatt, Nick Smith, Michael A Goodrich

A hub-based colony consists of multiple agents who share a common nest site called the hub. Agents perform tasks away from the hub like foraging for food or gathering information about future nest sites. Modeling hub-based colonies is challenging because the size of the collective state space grows rapidly as the number of agents grows. This paper presents a graph-based representation of the colony that can be combined with graph-based encoders to create low-dimensional representations of collective state that can scale to many agents for a best-of-N colony problem. We demonstrate how the information in the low-dimensional embedding can be used with two experiments. First, we show how the information in the tensor can be used to cluster collective states by the probability of choosing the best site for a very small problem. Second, we show how structured collective trajectories emerge when a graph encoder is used to learn the low-dimensional embedding, and these trajectories have information that can be used to predict swarm performance.


Citations: 0

Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity

arXiv - CS - Multiagent Systems

Pub Date : 2024-08-08 DOI: arxiv-2408.04549

Martin Smit, Fernando P. Santos

Altruistic cooperation is costly yet socially desirable. As a result, agents struggle to learn cooperative policies through independent reinforcement learning (RL). Indirect reciprocity, where agents consider their interaction partner's reputation, has been shown to stabilise cooperation in homogeneous, idealised populations. However, more realistic settings are comprised of heterogeneous agents with different characteristics and group-based social identities. We study cooperation when agents are stratified into two such groups, and allow reputation updates and actions to depend on group information. We consider two modelling approaches: evolutionary game theory, where we comprehensively search for social norms (i.e., rules to assign reputations) leading to cooperation and fairness; and RL, where we consider how the stochastic dynamics of policy learning affects the analytically identified equilibria. We observe that a defecting majority leads the minority group to defect, but not the inverse. Moreover, changing the norms that judge in and out-group interactions can steer a system towards either fair or unfair cooperation. This is made clearer when moving beyond equilibrium analysis to independent RL agents, where convergence to fair cooperation occurs with a narrower set of norms. Our results highlight that, in heterogeneous populations with reputations, carefully defining interaction norms is fundamental to tackle both dilemmas of cooperation and of fairness.
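The notion of a social norm as a rule that assigns reputations, with possibly different rules for in-group and out-group interactions, can be sketched as a lookup table. The two norms below, stern judging and image scoring, are well-known examples from the indirect reciprocity literature; pairing them as in-group and out-group norms, and the function name `judge`, are assumptions made purely for illustration and are not the norms identified in the paper.

```python
GOOD, BAD = "G", "B"
COOPERATE, DEFECT = "C", "D"

# A norm maps (donor's action, recipient's reputation) to the donor's new reputation.
STERN_JUDGING = {
    (COOPERATE, GOOD): GOOD,
    (DEFECT, GOOD): BAD,
    (COOPERATE, BAD): BAD,   # helping a bad agent is judged bad
    (DEFECT, BAD): GOOD,     # punishing a bad agent is judged good
}
IMAGE_SCORING = {
    (COOPERATE, GOOD): GOOD,
    (DEFECT, GOOD): BAD,
    (COOPERATE, BAD): GOOD,  # only the action matters
    (DEFECT, BAD): BAD,
}


def judge(action, recipient_reputation, same_group, in_norm, out_norm):
    """Assign the donor's new reputation using a group-dependent norm."""
    norm = in_norm if same_group else out_norm
    return norm[(action, recipient_reputation)]


# The same act is judged differently depending on group membership.
in_group_verdict = judge(DEFECT, BAD, True, STERN_JUDGING, IMAGE_SCORING)
out_group_verdict = judge(DEFECT, BAD, False, STERN_JUDGING, IMAGE_SCORING)
```

Here defecting against a badly reputed partner earns a good reputation inside the group but a bad one across groups, which shows how the choice of in-group and out-group norms changes the incentives agents face.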



Citations: 0

Emergence in Multi-Agent Systems: A Safety Perspective

arXiv - CS - Multiagent Systems

Pub Date : 2024-08-08 DOI: arxiv-2408.04514

Philipp Altmann, Julian Schönberger, Steffen Illium, Maximilian Zorn, Fabian Ritz, Tom Haider, Simon Burton, Thomas Gabor

Emergent effects can arise in multi-agent systems (MAS) where execution is decentralized and reliant on local information. These effects may range from minor deviations in behavior to catastrophic system failures. To formally define these effects, we identify misalignments between the global inherent specification (the true specification) and its local approximation (such as the configuration of different reward components or observations). Using established safety terminology, we develop a framework to understand these emergent effects. To showcase the resulting implications, we use two broadly configurable exemplary gridworld scenarios, where insufficient specification leads to unintended behavior deviations when derived independently. Recognizing that a global adaptation might not always be feasible, we propose adjusting the underlying parameterizations to mitigate these issues, thereby improving the system's alignment and reducing the risk of emergent failures.


Citations: 0