Synthesis of Reward Machines for Multi-Agent Equilibrium Design (Full Version)

Muhammad Najib, Giuseppe Perelli
{"title":"Synthesis of Reward Machines for Multi-Agent Equilibrium Design (Full Version)","authors":"Muhammad Najib, Giuseppe Perelli","doi":"arxiv-2408.10074","DOIUrl":null,"url":null,"abstract":"Mechanism design is a well-established game-theoretic paradigm for designing\ngames to achieve desired outcomes. This paper addresses a closely related but\ndistinct concept, equilibrium design. Unlike mechanism design, the designer's\nauthority in equilibrium design is more constrained; she can only modify the\nincentive structures in a given game to achieve certain outcomes without the\nability to create the game from scratch. We study the problem of equilibrium\ndesign using dynamic incentive structures, known as reward machines. We use\nweighted concurrent game structures for the game model, with goals (for the\nplayers and the designer) defined as mean-payoff objectives. We show how reward\nmachines can be used to represent dynamic incentives that allocate rewards in a\nmanner that optimises the designer's goal. We also introduce the main decision\nproblem within our framework, the payoff improvement problem. This problem\nessentially asks whether there exists a dynamic incentive (represented by some\nreward machine) that can improve the designer's payoff by more than a given\nthreshold value. We present two variants of the problem: strong and weak. We\ndemonstrate that both can be solved in polynomial time using a Turing machine\nequipped with an NP oracle. Furthermore, we also establish that these variants\nare either NP-hard or coNP-hard. Finally, we show how to synthesise the\ncorresponding reward machine if it exists.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.10074","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Mechanism design is a well-established game-theoretic paradigm for designing games to achieve desired outcomes. This paper addresses a closely related but distinct concept, equilibrium design. Unlike mechanism design, the designer's authority in equilibrium design is more constrained; she can only modify the incentive structures in a given game to achieve certain outcomes, without the ability to create the game from scratch. We study the problem of equilibrium design using dynamic incentive structures, known as reward machines. We use weighted concurrent game structures for the game model, with goals (for the players and the designer) defined as mean-payoff objectives. We show how reward machines can be used to represent dynamic incentives that allocate rewards in a manner that optimises the designer's goal. We also introduce the main decision problem within our framework, the payoff improvement problem. This problem essentially asks whether there exists a dynamic incentive (represented by some reward machine) that can improve the designer's payoff by more than a given threshold value. We present two variants of the problem: strong and weak. We demonstrate that both can be solved in polynomial time using a Turing machine equipped with an NP oracle. Furthermore, we establish that these variants are either NP-hard or coNP-hard. Finally, we show how to synthesise the corresponding reward machine if it exists.
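
To make the central objects concrete, below is a minimal illustrative sketch (not taken from the paper) of a reward machine acting as a dynamic incentive over a play of a weighted game. The label set, state names, and the finite-prefix approximation of the mean payoff mp(pi) = liminf_{n -> inf} (1/n) * sum_{i<n} w_i are assumptions made for exposition; the paper works with infinite runs and equilibrium constraints that this sketch omits.

```python
# Illustrative sketch only (not from the paper): a reward machine modelled as
# a finite-state transducer. On each step it reads a label emitted by the
# game, moves to a new state, and outputs an extra reward on top of the
# game's own weights.

from dataclasses import dataclass
from typing import Dict, List, Tuple

Label = str
State = str

@dataclass
class RewardMachine:
    initial: State
    # (state, label) -> (next state, reward emitted on this transition)
    delta: Dict[Tuple[State, Label], Tuple[State, float]]

    def run(self, trace: List[Label]) -> List[float]:
        """Return the sequence of rewards emitted along a finite trace."""
        state, rewards = self.initial, []
        for lab in trace:
            state, r = self.delta[(state, lab)]
            rewards.append(r)
        return rewards

def mean_payoff(weights: List[float]) -> float:
    """Finite-prefix approximation of mp(pi) = liminf_n (1/n) * sum_{i<n} w_i."""
    return sum(weights) / len(weights)

# Hypothetical example: pay a bonus only the FIRST time the designer's
# target label "g" is observed, and nothing on later visits -- a
# history-dependent incentive that a static per-state reward cannot express.
rm = RewardMachine(
    initial="q0",
    delta={
        ("q0", "g"): ("q1", 5.0),   # first visit to g: one-off bonus
        ("q0", "-"): ("q0", 0.0),
        ("q1", "g"): ("q1", 0.0),   # subsequent visits: no bonus
        ("q1", "-"): ("q1", 0.0),
    },
)

trace = ["-", "g", "-", "g"]
base_weights = [1.0, 1.0, 1.0, 1.0]        # weights from the game structure
bonus = rm.run(trace)                       # [0.0, 5.0, 0.0, 0.0]
total = [w + b for w, b in zip(base_weights, bonus)]
print(mean_payoff(base_weights), mean_payoff(total))  # 1.0 vs 2.25
```

In this toy setting, the (strong or weak) payoff improvement problem would ask whether some such machine raises the designer's mean payoff, in the equilibria of the modified game, by more than a given threshold; the example only illustrates why reward machines can express incentives that static reward allocations cannot.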