
arXiv - CS - Multiagent Systems: Latest Publications

Managing multiple agents by automatically adjusting incentives
Pub Date: 2024-09-03 DOI: arxiv-2409.02960
Shunichi Akatsuka, Yaemi Teramoto, Aaron Courville
In the coming years, AI agents will be used to make increasingly complex decisions, including in situations involving many different groups of people. One big challenge is that an AI agent tends to act in its own interest, unlike humans, who often consider what will be best for everyone in the long run. In this paper, we explore a method for getting self-interested agents to work towards goals that benefit society as a whole. We propose adding a manager agent that mediates agent interactions by assigning incentives to certain actions. We tested our method on a supply-chain management problem and showed that this framework (1) increases the raw reward by 22.2%, (2) increases the agents' reward by 23.8%, and (3) increases the manager's reward by 20.1%.
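A minimal sketch can make the incentive-adjustment idea concrete. In the hypothetical Python toy below, a manager raises the incentive on an assumed socially optimal action whenever a greedy agent deviates from it; the action names, payoffs, and update rule are illustrative assumptions, not the paper's method.

```python
import random

# Hypothetical toy: a manager nudges a self-interested agent toward a
# socially beneficial action by adjusting per-action incentives.

ACTIONS = ["order_small", "order_large"]
SOCIAL_OPTIMUM = "order_large"  # assumed to maximize system-wide reward

def agent_choice(private_payoff, incentive):
    # Each agent greedily maximizes private payoff plus manager incentive.
    return max(ACTIONS, key=lambda a: private_payoff[a] + incentive[a])

incentive = {a: 0.0 for a in ACTIONS}
for step in range(100):
    # Private payoffs favor the selfish action on average.
    payoff = {"order_small": random.uniform(0.5, 1.0),
              "order_large": random.uniform(0.0, 0.8)}
    if agent_choice(payoff, incentive) != SOCIAL_OPTIMUM:
        incentive[SOCIAL_OPTIMUM] += 0.05   # strengthen the nudge
    else:
        incentive[SOCIAL_OPTIMUM] *= 0.99   # decay once it suffices

print(f"learned incentive: {incentive}")
```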
Citations: 0
AIvril: AI-Driven RTL Generation With Verification In-The-Loop
Pub Date: 2024-09-03 DOI: arxiv-2409.11411
Mubashir ul Islam, Humza Sami, Pierre-Emmanuel Gaillardon, Valerio Tenace
Large Language Models (LLMs) are computational models capable of performing complex natural language processing tasks. Leveraging these capabilities, LLMs hold the potential to transform the entire hardware design stack, with predictions suggesting that front-end and back-end tasks could be fully automated in the near future. Currently, LLMs show great promise in streamlining Register Transfer Level (RTL) generation, enhancing efficiency, and accelerating innovation. However, their probabilistic nature makes them prone to inaccuracies, a significant drawback in RTL design, where reliability and precision are essential. To address these challenges, this paper introduces AIvril, an advanced framework designed to enhance the accuracy and reliability of RTL-aware LLMs. AIvril employs a multi-agent, LLM-agnostic system for automatic syntax correction and functional verification, significantly reducing, and in many cases completely eliminating, instances of erroneous code generation. Experimental results on the VerilogEval-Human dataset show that our framework improves code quality by nearly 2x compared to previous works, while achieving an 88.46% success rate in meeting verification objectives. This represents a critical step toward automating and optimizing hardware design workflows, offering a more dependable methodology for AI-driven RTL design.
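The "verification in-the-loop" pattern can be sketched as a retry loop that feeds tool output back into generation. In this sketch, generate_rtl, run_linter, and run_testbench are hypothetical stubs standing in for an LLM call and EDA tools; only the loop structure reflects the idea above, not AIvril's implementation.

```python
import random

# Stubs that simulate an LLM and EDA tools; replace with real calls.
def generate_rtl(spec, feedback=""):
    return f"// RTL draft for {spec} (feedback used: {bool(feedback)})"

def run_linter(code):
    ok = random.random() > 0.3               # pretend 70% of drafts parse
    return ok, "" if ok else "line 3: missing semicolon"

def run_testbench(code):
    ok = random.random() > 0.5               # pretend half pass the tests
    return ok, "" if ok else "FAIL: counter overflows"

def verified_generation(spec, max_iters=5):
    feedback = ""
    for _ in range(max_iters):
        code = generate_rtl(spec, feedback)
        ok_syntax, lint_log = run_linter(code)    # syntax-correction agent
        if not ok_syntax:
            feedback = f"Fix syntax errors: {lint_log}"
            continue
        ok_func, test_log = run_testbench(code)   # functional-verification agent
        if ok_func:
            return code
        feedback = f"Tests failed: {test_log}"
    return None  # give up after max_iters attempts

print(verified_generation("8-bit counter"))
```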
Citations: 0
Evolution of Social Norms in LLM Agents using Natural Language
Pub Date: 2024-09-02 DOI: arxiv-2409.00993
Ilya Horiguchi, Takahide Yoshida, Takashi Ikegami
Recent advancements in Large Language Models (LLMs) have spurred a surge of interest in leveraging these models for game-theoretical simulations, where LLMs act as individual agents engaging in social interactions. This study explores the potential for LLM agents to spontaneously generate and adhere to normative strategies through natural language discourse, building upon the foundational work of Axelrod's metanorm games. Our experiments demonstrate that through dialogue, LLM agents can form complex social norms, such as metanorms (norms enforcing the punishment of those who do not punish cheating), purely through natural language interaction. The results affirm the effectiveness of using LLM agents for simulating social interactions and understanding the emergence and evolution of complex strategies and norms through natural language. Future work may extend these findings by incorporating a wider range of scenarios and agent characteristics, aiming to uncover more nuanced mechanisms behind social norm formation.
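For readers unfamiliar with the metanorm setting, a toy Axelrod-style simulation is sketched below. The payoff values follow the classic norms game, and the whole setup (no LLM dialogue, no "seen" probability) is a simplified illustration rather than this paper's experiment.

```python
import random

# Toy norms/metanorms round: agents may defect, punish observed defectors,
# and, under the metanorm, punish those who fail to punish.

random.seed(0)
N, ROUNDS = 20, 50
agents = [{"boldness": random.random(), "vengefulness": random.random()}
          for _ in range(N)]
payoff = [0.0] * N

for _ in range(ROUNDS):
    for i, a in enumerate(agents):
        if random.random() >= a["boldness"]:
            continue                      # i cooperates this round
        payoff[i] += 3                    # temptation payoff for defecting
        for j, b in enumerate(agents):
            if j == i:
                continue
            if random.random() < b["vengefulness"]:
                payoff[i] -= 9            # j punishes the defector
                payoff[j] -= 2            # punishing is costly
            else:
                # Metanorm: others may punish j for failing to punish i.
                for k, c in enumerate(agents):
                    if k not in (i, j) and random.random() < c["vengefulness"]:
                        payoff[j] -= 9
                        payoff[k] -= 2

print(f"best: {max(payoff):.0f}, worst: {min(payoff):.0f}")
```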
Citations: 0
Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques
Pub Date: 2024-09-01 DOI: arxiv-2409.00717
Natalia Zhang, Xinqi Wang, Qiwen Cui, Runlong Zhou, Sham M. Kakade, Simon S. Du
We initiate the study of Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations. We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games, a problem marked by the challenge of sparse feedback signals. Our theory establishes the upper complexity bounds for Nash equilibrium in effective MARLHF, demonstrating that single-policy coverage is inadequate and highlighting the importance of unilateral dataset coverage. These theoretical insights are verified through comprehensive experiments. To enhance the practical performance, we further introduce two algorithmic techniques. (1) We propose a Mean Squared Error (MSE) regularization along the time axis to achieve a more uniform reward distribution and improve reward learning outcomes. (2) We utilize imitation learning to approximate the reference policy, ensuring stability and effectiveness in training. Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
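One plausible reading of technique (1) is sketched below: a Bradley-Terry-style preference loss over trajectory returns plus an MSE penalty that keeps each step's predicted reward near the trajectory mean. The exact loss form and the weight lam are assumptions for illustration, not the paper's stated objective.

```python
import numpy as np

def preference_loss(r_pref, r_rej, lam=0.1):
    """r_pref, r_rej: predicted per-step rewards of two trajectories,
    where r_pref belongs to the human-preferred one."""
    # Bradley-Terry style preference term on trajectory returns.
    logits = r_pref.sum() - r_rej.sum()
    pref = -np.log(1.0 / (1.0 + np.exp(-logits)))
    # MSE regularization along the time axis: keep each step's reward
    # close to the trajectory mean, spreading credit more uniformly.
    reg = ((r_pref - r_pref.mean()) ** 2).mean() + \
          ((r_rej - r_rej.mean()) ** 2).mean()
    return pref + lam * reg

rng = np.random.default_rng(0)
r_a = rng.normal(0.1, 1.0, size=100)   # preferred trajectory rewards
r_b = rng.normal(0.0, 1.0, size=100)   # rejected trajectory rewards
print(preference_loss(r_a, r_b))
```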
Citations: 0
A Learnable Agent Collaboration Network Framework for Personalized Multimodal AI Search Engine
Pub Date: 2024-09-01 DOI: arxiv-2409.00636
Yunxiao Shi, Min Xu, Haimin Zhang, Xing Zi, Qiang Wu
Large language models (LLMs) and retrieval-augmented generation (RAG) techniques have revolutionized traditional information access, enabling AI agents to search and summarize information on behalf of users during dynamic dialogues. Despite their potential, current AI search engines exhibit considerable room for improvement in several critical areas. These areas include the support for multimodal information, the delivery of personalized responses, the capability to logically answer complex questions, and the facilitation of more flexible interactions. This paper proposes a novel AI Search Engine framework called the Agent Collaboration Network (ACN). The ACN framework consists of multiple specialized agents working collaboratively, each with distinct roles such as Account Manager, Solution Strategist, Information Manager, and Content Creator. This framework integrates mechanisms for picture content understanding, user profile tracking, and online evolution, enhancing the AI search engine's response quality, personalization, and interactivity. A highlight of the ACN is the introduction of a Reflective Forward Optimization method (RFO), which supports online synergistic adjustment among agents. This feature endows the ACN with online learning capabilities, ensuring that the system has strong interactive flexibility and can promptly adapt to user feedback. This learning method may also serve as an optimization approach for agent-based systems, potentially influencing other domains of agent applications.
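A schematic of the role-specialized pipeline might look as follows; the role prompts and the call_llm helper are hypothetical placeholders, and the real ACN adds picture understanding, profile tracking, and RFO-based online adjustment on top of this skeleton.

```python
# Minimal role-pipeline sketch in the spirit of the ACN description.

def call_llm(role_prompt, payload):
    # Stand-in for a model call; a real system would invoke an LLM here.
    return f"[{role_prompt.split(':')[0]}] -> {payload}"

ROLES = [
    "Account Manager: extract the user's intent and profile",
    "Solution Strategist: plan the retrieval and answer strategy",
    "Information Manager: gather and filter multimodal evidence",
    "Content Creator: compose the final personalized answer",
]

def answer(query):
    payload = query
    for role in ROLES:          # each agent refines the previous output
        payload = call_llm(role, payload)
    return payload

print(answer("best hiking trails near Sydney with photos"))
```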
Citations: 0
Accelerating Hybrid Agent-Based Models and Fuzzy Cognitive Maps: How to Combine Agents who Think Alike?
Pub Date: 2024-09-01 DOI: arxiv-2409.00824
Philippe J. Giabbanelli, Jack T. Beerman
While Agent-Based Models can create detailed artificial societies based on individual differences and local context, they can be computationally intensive. Modelers may offset these costs through a parsimonious use of the model, for example by using smaller population sizes (which limits analyses in sub-populations), running fewer what-if scenarios, or accepting more uncertainty by performing fewer simulations. Alternatively, researchers may accelerate simulations via hardware solutions (e.g., GPU parallelism) or approximation approaches that trade accuracy against compute time. In this paper, we present an approximation that combines agents who 'think alike', thus reducing the population size and the compute time. Our innovation relies on representing agent behaviors as networks of rules (Fuzzy Cognitive Maps) and empirically evaluating different measures of distance between these networks. Then, we form groups of think-alike agents via community detection and simplify them to a representative agent. Case studies show that our simplifications retain accuracy.
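The compression step can be illustrated with a small sketch: treat each agent's Fuzzy Cognitive Map as a weighted adjacency matrix, measure pairwise distances, and merge nearby agents. The Frobenius distance, the threshold, and the greedy grouping below are simplified stand-ins for the paper's evaluated distance measures and community detection.

```python
import numpy as np

# Each agent's behaviour: a weight matrix over shared concepts (an FCM).
rng = np.random.default_rng(0)
n_agents, n_concepts = 30, 5
fcms = rng.uniform(-1, 1, size=(n_agents, n_concepts, n_concepts))

def fcm_distance(a, b):
    return np.linalg.norm(a - b)   # Frobenius norm between rule matrices

threshold, groups = 4.0, []        # groups: (representative, members)
for i in range(n_agents):
    for rep, members in groups:
        if fcm_distance(fcms[i], fcms[rep]) < threshold:
            members.append(i)      # i 'thinks alike' with this representative
            break
    else:
        groups.append((i, [i]))    # i starts a new group

print(f"{n_agents} agents reduced to {len(groups)} representatives")
```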
Citations: 0
Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis
Pub Date: 2024-08-30 DOI: arxiv-2408.17180
Chiu-Chou Lin, Yu-Wei Shih, Kuei-Ting Kuo, Yu-Cheng Chen, Chien-Hua Chen, Wei-Chen Chiu, I-Chen Wu
How can balance be quantified in game settings? This question is crucial for game designers, especially in player-versus-player (PvP) games, where analyzing the strength relations among predefined team compositions, such as hero combinations in multiplayer online battle arena (MOBA) games or decks in card games, is essential for enhancing gameplay and achieving balance. We have developed two advanced measures that extend beyond the simplistic win rate to quantify balance in zero-sum competitive scenarios. These measures are derived from win value estimations, which employ strength rating approximations via the Bradley-Terry model and counter relationship approximations via vector quantization, significantly reducing the computational complexity associated with traditional win value estimations. Throughout the learning process of these models, we identify useful categories of compositions and pinpoint their counter relationships, aligning with the experiences of human players without requiring specific game knowledge. Our methodology hinges on a simple technique to enhance codebook utilization in discrete representation with a deterministic vector quantization process for an extremely small state space. Our framework has been validated in popular online games, including Age of Empires II, Hearthstone, Brawl Stars, and League of Legends. The accuracy of the observed strength relations in these games is comparable to traditional pairwise win value predictions, while also offering a more manageable complexity for analysis. Ultimately, our findings contribute to a deeper understanding of PvP game dynamics and present a methodology that significantly improves game balance evaluation and design.
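The strength-rating component rests on the standard Bradley-Terry model, which can be fit by gradient ascent on pairwise win counts, as in the sketch below; the synthetic data and the simple estimator are illustrative, not the paper's approximation scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
n_comps = 8
true_strength = rng.normal(size=n_comps)

# Simulate pairwise match outcomes from the true strengths.
wins = np.zeros((n_comps, n_comps))
for i in range(n_comps):
    for j in range(n_comps):
        if i != j:
            p_win = 1 / (1 + np.exp(-(true_strength[i] - true_strength[j])))
            wins[i, j] = rng.binomial(50, p_win)

# Fit ratings by gradient ascent on the Bradley-Terry log-likelihood.
theta = np.zeros(n_comps)
for _ in range(500):
    p = 1 / (1 + np.exp(-(theta[:, None] - theta[None, :])))
    grad = (wins - (wins + wins.T) * p).sum(axis=1)
    theta += 0.001 * grad
theta -= theta.mean()   # ratings are identifiable only up to a constant

print(np.corrcoef(theta, true_strength)[0, 1])   # should be close to 1
```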
Citations: 0
MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale
Pub Date: 2024-08-29 DOI: arxiv-2409.00134
Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik
Multi-agent pathfinding (MAPF) is a challenging computational problem that typically requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally is NP-hard, yet efficient solutions are critical for numerous applications, including automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Following current trends in machine learning, we have created a foundation model for MAPF problems called MAPF-GPT. Using imitation learning, we have trained a policy on a set of pre-collected sub-optimal expert trajectories that can generate actions under partial observability without additional heuristics, reward functions, or communication with other agents. The resulting MAPF-GPT model demonstrates zero-shot learning abilities when solving MAPF problem instances that were not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is efficient in terms of computation (in the inference mode).
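At its core, the training recipe is behavior cloning on expert trajectories. The sketch below shows that step with a small MLP and random stand-in data; the observation encoding, transformer architecture, and real expert trajectories of MAPF-GPT are replaced by assumptions here.

```python
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 64, 5            # e.g. stay/up/down/left/right

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 128), nn.ReLU(),
    nn.Linear(128, N_ACTIONS),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-ins for pre-collected (observation, expert action) pairs.
obs = torch.randn(1024, OBS_DIM)
expert_actions = torch.randint(0, N_ACTIONS, (1024,))

for epoch in range(10):
    logits = policy(obs)
    loss = loss_fn(logits, expert_actions)   # imitate the expert labels
    opt.zero_grad()
    loss.backward()
    opt.step()

# Decentralized execution: each agent greedily picks its own action
# from its local observation, with no inter-agent communication.
with torch.no_grad():
    action = policy(torch.randn(1, OBS_DIM)).argmax(dim=-1)
print(action.item())
```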
Citations: 0
Iterative Graph Alignment
Pub Date: 2024-08-29 DOI: arxiv-2408.16667
Fangyuan Yu, Hardeep Singh Arora, Matt Johnson
By compressing diverse narratives, LLMs go beyond memorization, achieving intelligence by capturing generalizable causal relationships. However, they suffer from local 'representation gaps' due to insufficient training data diversity, limiting their real-world utility, especially in tasks requiring strict alignment to rules. Traditional alignment methods relying on heavy human annotations are inefficient and unscalable. Recent self-alignment techniques also fall short, as they often depend on self-selection based prompting and memorization-based learning. To address these issues, we introduce Iterative Graph Alignment (IGA), an annotation-free rule-based alignment algorithm. A teacher model (VLM) employs Iterative Graph Prompting (IGP) to create logical graphs and reference answers. The student model (LLM) identifies local knowledge gaps by attempting to align its responses with these references, collaborating with helper models to generate diverse answers. These aligned responses are then used for iterative supervised fine-tuning (SFT). Our evaluations across five rule-based scenarios demonstrate IGP's effectiveness, with a 73.12% alignment improvement in Claude Sonnet 3.5, and Llama3-8B-Instruct achieving an 86.20% improvement, outperforming Claude Sonnet 3.5 in rule-based alignment.
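The teacher/student loop described above can be outlined as follows; every helper is a hypothetical stub for a model call and the stopping rule is an assumption; only the identify-gaps-then-SFT structure comes from the abstract.

```python
def teacher_reference(prompt):
    # Stand-in for the VLM teacher using Iterative Graph Prompting.
    return "reference answer derived from a logical graph"

def student_answer(prompt):
    # Stand-in for the current student LLM.
    return "student answer"

def answers_align(a, b):
    # Stand-in for an alignment check (e.g. a judge model).
    return a == b

def fine_tune(pairs):
    # Stand-in for a supervised fine-tuning (SFT) step on aligned pairs.
    print(f"SFT on {len(pairs)} prompt/reference pairs")

prompts = ["rule-based scenario 1", "rule-based scenario 2"]
for iteration in range(3):
    references = {p: teacher_reference(p) for p in prompts}
    # A gap is a prompt where the student disagrees with the teacher.
    gaps = [(p, r) for p, r in references.items()
            if not answers_align(student_answer(p), r)]
    if not gaps:
        break           # no remaining local knowledge gaps
    fine_tune(gaps)
```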
Citations: 0
Different Facets for Different Experts: A Framework for Streamlining The Integration of Qualitative Insights into ABM Development
Pub Date: 2024-08-28 DOI: arxiv-2408.15725
Vivek Nallur, Pedram Aghaei, Graham Finlay
A key problem in agent-based simulation is that integrating qualitative insights from multiple discipline experts is extremely hard. In most simulations, agent capabilities and corresponding behaviour need to be programmed into the agent. We report on the architecture of a tool that disconnects the programmed functions of the agent from the acquisition of capability and displayed behaviour. This allows multiple different domain experts to represent qualitative insights without the need for code to be changed. It also allows a continuous integration (or even change) of qualitative behaviour processes as more insights are gained. The consequent behaviour observed in the model is both more faithful to the expert's insight and able to be contrasted against other models representing other insights.
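One way to picture the decoupling is a data-driven rulebook: the agent's code only looks up behaviour supplied by experts, so new insights require no code changes. The rule format below is an assumption for illustration, not the tool's actual schema.

```python
class Agent:
    def __init__(self, name, rulebook):
        self.name, self.rulebook = name, rulebook

    def act(self, situation):
        # Behaviour comes from the expert-provided rulebook, not from code.
        return self.rulebook.get(situation, "default: do nothing")

# Two experts encode different qualitative insights for the same agent type;
# swapping rulebooks changes displayed behaviour without touching Agent.
economist_rules = {"price_rise": "cut consumption"}
sociologist_rules = {"price_rise": "consult neighbours first"}

for rules in (economist_rules, sociologist_rules):
    print(Agent("household", rules).act("price_rise"))
```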
Citations: 0