Synthesis of Reward Machines for Multi-Agent Equilibrium Design (Full Version)

Muhammad Najib, Giuseppe Perelli
{"title":"Synthesis of Reward Machines for Multi-Agent Equilibrium Design (Full Version)","authors":"Muhammad Najib, Giuseppe Perelli","doi":"arxiv-2408.10074","DOIUrl":null,"url":null,"abstract":"Mechanism design is a well-established game-theoretic paradigm for designing\ngames to achieve desired outcomes. This paper addresses a closely related but\ndistinct concept, equilibrium design. Unlike mechanism design, the designer's\nauthority in equilibrium design is more constrained; she can only modify the\nincentive structures in a given game to achieve certain outcomes without the\nability to create the game from scratch. We study the problem of equilibrium\ndesign using dynamic incentive structures, known as reward machines. We use\nweighted concurrent game structures for the game model, with goals (for the\nplayers and the designer) defined as mean-payoff objectives. We show how reward\nmachines can be used to represent dynamic incentives that allocate rewards in a\nmanner that optimises the designer's goal. We also introduce the main decision\nproblem within our framework, the payoff improvement problem. This problem\nessentially asks whether there exists a dynamic incentive (represented by some\nreward machine) that can improve the designer's payoff by more than a given\nthreshold value. We present two variants of the problem: strong and weak. We\ndemonstrate that both can be solved in polynomial time using a Turing machine\nequipped with an NP oracle. Furthermore, we also establish that these variants\nare either NP-hard or coNP-hard. Finally, we show how to synthesise the\ncorresponding reward machine if it exists.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.10074","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Mechanism design is a well-established game-theoretic paradigm for designing games to achieve desired outcomes. This paper addresses a closely related but distinct concept, equilibrium design. Unlike mechanism design, the designer's authority in equilibrium design is more constrained; she can only modify the incentive structures in a given game to achieve certain outcomes, without the ability to create the game from scratch. We study the problem of equilibrium design using dynamic incentive structures, known as reward machines. We use weighted concurrent game structures for the game model, with goals (for the players and the designer) defined as mean-payoff objectives. We show how reward machines can be used to represent dynamic incentives that allocate rewards in a manner that optimises the designer's goal. We also introduce the main decision problem within our framework, the payoff improvement problem. This problem essentially asks whether there exists a dynamic incentive (represented by some reward machine) that can improve the designer's payoff by more than a given threshold value. We present two variants of the problem: strong and weak. We demonstrate that both can be solved in polynomial time using a Turing machine equipped with an NP oracle. Furthermore, we establish that these variants are either NP-hard or coNP-hard. Finally, we show how to synthesise the corresponding reward machine if it exists.
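
To make the central objects concrete, below is a minimal illustrative sketch (not taken from the paper) of a reward machine acting as a dynamic incentive over a play of a weighted game. The label set, state names, and the finite-prefix approximation of the mean payoff mp(pi) = liminf_{n -> inf} (1/n) * sum_{i<n} w_i are assumptions made for exposition; the paper works with infinite runs and equilibrium constraints that this sketch omits.

```python
# Illustrative sketch only (not from the paper): a reward machine modelled as
# a finite-state transducer. On each step it reads a label emitted by the
# game, moves to a new state, and outputs an extra reward on top of the
# game's own weights.

from dataclasses import dataclass
from typing import Dict, List, Tuple

Label = str
State = str

@dataclass
class RewardMachine:
    initial: State
    # (state, label) -> (next state, reward emitted on this transition)
    delta: Dict[Tuple[State, Label], Tuple[State, float]]

    def run(self, trace: List[Label]) -> List[float]:
        """Return the sequence of rewards emitted along a finite trace."""
        state, rewards = self.initial, []
        for lab in trace:
            state, r = self.delta[(state, lab)]
            rewards.append(r)
        return rewards

def mean_payoff(weights: List[float]) -> float:
    """Finite-prefix approximation of mp(pi) = liminf_n (1/n) * sum_{i<n} w_i."""
    return sum(weights) / len(weights)

# Hypothetical example: pay a bonus only the FIRST time the designer's
# target label "g" is observed, and nothing on later visits -- a
# history-dependent incentive that a static per-state reward cannot express.
rm = RewardMachine(
    initial="q0",
    delta={
        ("q0", "g"): ("q1", 5.0),   # first visit to g: one-off bonus
        ("q0", "-"): ("q0", 0.0),
        ("q1", "g"): ("q1", 0.0),   # subsequent visits: no bonus
        ("q1", "-"): ("q1", 0.0),
    },
)

trace = ["-", "g", "-", "g"]
base_weights = [1.0, 1.0, 1.0, 1.0]        # weights from the game structure
bonus = rm.run(trace)                       # [0.0, 5.0, 0.0, 0.0]
total = [w + b for w, b in zip(base_weights, bonus)]
print(mean_payoff(base_weights), mean_payoff(total))  # 1.0 vs 2.25
```

In this toy setting, the (strong or weak) payoff improvement problem would ask whether some such machine raises the designer's mean payoff, in the equilibria of the modified game, by more than a given threshold; the example only illustrates why reward machines can express incentives that static reward allocations cannot.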