Adaptive Agents and Multi-Agent Systems: Latest Literature, Page 5

A Novel Demand Response Model and Method for Peak Reduction in Smart Grids - PowerTAC

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-24 DOI: 10.48550/arXiv.2302.12520

Sanjay Chandlekar, Arthik Boroju, Shweta Jain, Sujit Gujar

One of the most widely used peak reduction methods in smart grids is demand response, where one analyzes the shift in customers' (agents') usage patterns in response to signals from the distribution company. Often, these signals take the form of incentives offered to agents. This work studies the effect of incentives on the probability that agents accept such offers in a real-world smart grid simulator, PowerTAC. We first show that there exists a function that gives the probability of an agent reducing its load as a function of the discount offered to it. We call it the reduction probability (RP). The RP function is further parametrized by the rate of reduction (RR), which can differ for each agent. We provide an optimal algorithm, MJS--ExpResponse, that outputs the discount for each agent by maximizing the expected reduction under a budget constraint. When RRs are unknown, we propose a Multi-Armed Bandit (MAB) based online algorithm, namely MJSUCB--ExpResponse, to learn RRs. Experimentally, we show that it exhibits sublinear regret. Finally, we showcase the efficacy of the proposed algorithm in mitigating demand peaks in a real-world smart grid system using the PowerTAC simulator as a test bed.
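The budgeted-discount idea in this abstract can be illustrated with a small sketch. It assumes an exponential reduction probability, RP(c) = 1 - exp(-RR * c), a form suggested only by the name "ExpResponse" and not stated in the abstract. The function names and the unit-sized discount steps are hypothetical, and this is not the authors' MJS--ExpResponse algorithm. Under the assumed concave RP, giving each unit of budget to the agent with the largest marginal gain maximizes the expected number of reductions over unit allocations.

```python
import heapq
import math


def reduction_probability(rate: float, discount: float) -> float:
    """Assumed RP form (not from the source): 1 - exp(-rate * discount)."""
    return 1.0 - math.exp(-rate * discount)


def expected_reduction(rates, alloc):
    """Expected number of agents that reduce load under an allocation."""
    return sum(reduction_probability(r, c) for r, c in zip(rates, alloc))


def allocate_discounts(rates: list[float], budget: int) -> list[int]:
    """Greedily hand out `budget` unit discounts by largest marginal gain."""
    alloc = [0] * len(rates)

    def gain(i: int) -> float:
        # Increase in agent i's reduction probability from one more unit.
        return (reduction_probability(rates[i], alloc[i] + 1)
                - reduction_probability(rates[i], alloc[i]))

    # heapq is a min-heap, so gains are negated; ties go to the lower index.
    heap = [(-gain(i), i) for i in range(len(rates))]
    heapq.heapify(heap)
    for _ in range(budget):
        if not heap:
            break
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (-gain(i), i))
    return alloc
```

For example, `allocate_discounts([1.0, 0.1], 3)` returns `[2, 1]`: the responsive agent receives two units before the marginal gain of a third unit drops below the first unit for the less responsive agent.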

Citations: 0

Towards Computationally Efficient Responsibility Attribution in Decentralized Partially Observable MDPs

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-24 DOI: 10.48550/arXiv.2302.12676

Stelios Triantafyllou, Goran Radanovic

Responsibility attribution is a key concept of accountable multi-agent decision making. Given a sequence of actions, responsibility attribution mechanisms quantify the impact of each participating agent on the final outcome. One such popular mechanism is based on actual causality, and it assigns (causal) responsibility based on the actions that were found to be pivotal for the considered outcome. However, the inherent problem of pinpointing actual causes and consequently determining the exact responsibility assignment has been shown to be computationally intractable. In this paper, we aim to provide a practical algorithmic solution to the problem of responsibility attribution under a computational budget. We first formalize the problem in the framework of Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) augmented by a specific class of Structural Causal Models (SCMs). Under this framework, we introduce a Monte Carlo Tree Search (MCTS) type of method which efficiently approximates the agents' degrees of responsibility. This method utilizes the structure of a novel search tree and a pruning technique, both tailored to the problem of responsibility attribution. Other novel components of our method are (a) a child selection policy based on linear scalarization and (b) a backpropagation procedure that accounts for a minimality condition that is typically used to define actual causality. We experimentally evaluate the efficacy of our algorithm through a simulation-based test-bed, which includes three team-based card games.

Citations: 2

GANterfactual-RL: Understanding Reinforcement Learning Agents' Strategies through Visual Counterfactual Explanations

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-24 DOI: 10.48550/arXiv.2302.12689

Tobias Huber, Maximilian Demmler, Silvan Mertes, Matthew Lyle Olson, Elisabeth André

Counterfactual explanations are a common tool to explain artificial intelligence models. For Reinforcement Learning (RL) agents, they answer "Why not?" or "What if?" questions by illustrating what minimal change to a state is needed such that an agent chooses a different action. Generating counterfactual explanations for RL agents with visual input is especially challenging because of their large state spaces and because their decisions are part of an overarching policy, which includes long-term decision-making. However, research focusing on counterfactual explanations, specifically for RL agents with visual input, is scarce and does not go beyond identifying defective agents. It is unclear whether counterfactual explanations are still helpful for more complex tasks like analyzing the learned strategies of different agents or choosing a fitting agent for a specific task. We propose a novel but simple method to generate counterfactual explanations for RL agents by formulating the problem as a domain transfer problem which allows the use of adversarial learning techniques like StarGAN. Our method is fully model-agnostic and we demonstrate that it outperforms the only previous method in several computational metrics. Furthermore, we show in a user study that our method performs best when analyzing which strategies different agents pursue.

Citations: 3

Strategyproof Social Decision Schemes on Super Condorcet Domains

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-23 DOI: 10.48550/arXiv.2302.12140

F. Brandt, Patrick Lederer, Sascha Tausch

One of the central economic paradigms in multi-agent systems is that agents should not be better off by acting dishonestly. In the context of collective decision-making, this axiom is known as strategyproofness and turns out to be rather prohibitive, even when allowing for randomization. In particular, Gibbard's random dictatorship theorem shows that only rather unattractive social decision schemes (SDSs) satisfy strategyproofness on the full domain of preferences. In this paper, we obtain more positive results by investigating strategyproof SDSs on the Condorcet domain, which consists of all preference profiles that admit a Condorcet winner. In more detail, we show that, if the number of voters $n$ is odd, every strategyproof and non-imposing SDS on the Condorcet domain can be represented as a mixture of dictatorial SDSs and the Condorcet rule (which chooses the Condorcet winner with probability $1$). Moreover, we prove that the Condorcet domain is a maximal connected domain that allows for attractive strategyproof SDSs if $n$ is odd as only random dictatorships are strategyproof and non-imposing on any sufficiently connected superset of it. We also derive analogous results for even $n$ by slightly extending the Condorcet domain. Finally, we also characterize the set of group-strategyproof and non-imposing SDSs on the Condorcet domain and its supersets. These characterizations strengthen Gibbard's random dictatorship theorem and establish that the Condorcet domain is essentially a maximal domain that allows for attractive strategyproof SDSs.
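To make the objects in this abstract concrete, the sketch below computes a Condorcet winner from strict rankings and then forms a lottery that mixes the Condorcet rule with a random dictatorship. The function names are hypothetical, and the uniform weighting over dictators is an assumption chosen for simplicity; the abstract allows any mixture of dictatorial SDSs.

```python
from fractions import Fraction


def condorcet_winner(profile):
    """Return the candidate that beats every other in a pairwise majority, or None."""
    candidates = profile[0]
    for x in candidates:
        if all(
            sum(r.index(x) < r.index(y) for r in profile) * 2 > len(profile)
            for y in candidates if y != x
        ):
            return x
    return None


def mixed_sds(profile, weight_condorcet):
    """Lottery: Condorcet rule with prob. weight_condorcet, else a uniformly random dictator."""
    winner = condorcet_winner(profile)
    if winner is None:
        raise ValueError("profile is outside the Condorcet domain")
    lottery = {c: Fraction(0) for c in profile[0]}
    lottery[winner] += weight_condorcet
    # Each voter is the dictator with equal probability (an assumption).
    share = (1 - weight_condorcet) / len(profile)
    for ranking in profile:
        lottery[ranking[0]] += share
    return lottery
```

With the profile `[("a","b","c"), ("b","a","c"), ("c","a","b")]` and a Condorcet weight of `Fraction(1, 2)`, candidate `a` is the Condorcet winner and receives probability 2/3, while `b` and `c` receive 1/6 each.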

Citations: 1

Targeted Search Control in AlphaZero for Effective Policy Improvement

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-23 DOI: 10.48550/arXiv.2302.12359

Alexandre Trudeau, Michael H. Bowling

AlphaZero is a self-play reinforcement learning algorithm that achieves superhuman play in chess, shogi, and Go via policy iteration. To be an effective policy improvement operator, AlphaZero's search requires accurate value estimates for the states appearing in its search tree. AlphaZero trains upon self-play matches beginning from the initial state of a game and only samples actions over the first few moves, limiting its exploration of states deeper in the game tree. We introduce Go-Exploit, a novel search control strategy for AlphaZero. Go-Exploit samples the start state of its self-play trajectories from an archive of states of interest. Beginning self-play trajectories from varied starting states enables Go-Exploit to more effectively explore the game tree and to learn a value function that generalizes better. Producing shorter self-play trajectories allows Go-Exploit to train upon more independent value targets, improving value training. Finally, the exploration inherent in Go-Exploit reduces its need for exploratory actions, enabling it to train under more exploitative policies. In the games of Connect Four and 9x9 Go, we show that Go-Exploit learns with a greater sample efficiency than standard AlphaZero, resulting in stronger performance against reference opponents and in head-to-head play. We also compare Go-Exploit to KataGo, a more sample efficient reimplementation of AlphaZero, and demonstrate that Go-Exploit has a more effective search control strategy. Furthermore, Go-Exploit's sample efficiency improves when KataGo's other innovations are incorporated.
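The start-state sampling described in this abstract might look roughly like the following sketch. Everything here is a hypothetical illustration: the bounded archive, the probability `p_initial` of restarting from the initial state, and the callback names `step` and `is_terminal` are assumptions, and no search or network training is modeled.

```python
import random
from collections import deque


class StartStateArchive:
    """Bounded archive of previously visited states (hypothetical design)."""

    def __init__(self, initial_state, capacity=1000, seed=0):
        self.initial_state = initial_state
        self.states = deque([initial_state], maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state):
        # Oldest states are evicted automatically once capacity is reached.
        self.states.append(state)

    def sample_start(self, p_initial=0.2):
        """Return the initial state with prob. p_initial, else a uniform archive sample."""
        if self.rng.random() < p_initial:
            return self.initial_state
        return self.rng.choice(list(self.states))


def self_play_episode(archive, step, is_terminal, max_moves=50):
    """Play one trajectory from a sampled start state and archive the states visited."""
    state = archive.sample_start()
    trajectory = [state]
    for _ in range(max_moves):
        if is_terminal(state):
            break
        state = step(state)
        archive.add(state)
        trajectory.append(state)
    return trajectory
```

Because episodes may begin deep in the game, trajectories are shorter on average than ones that always start from the initial position, which is the property the abstract links to more independent value targets.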

Citations: 1

Characterizations of Sequential Valuation Rules

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-23 DOI: 10.48550/arXiv.2302.11890

Chris Dong, Patrick Lederer

Approval-based committee (ABC) voting rules elect a fixed size subset of the candidates, a so-called committee, based on the voters' approval ballots over the candidates. While these rules have recently attracted significant attention, axiomatic characterizations are largely missing so far. We address this problem by characterizing ABC voting rules within the broad and intuitive class of sequential valuation rules. These rules compute the winning committees by sequentially adding candidates that increase the score of the chosen committee the most. In more detail, we first characterize almost the full class of sequential valuation rules based on mild standard conditions and a new axiom called consistent committee monotonicity. This axiom postulates that the winning committees of size k can be derived from those of size k-1 by only adding candidates and that these new candidates are chosen consistently. By requiring additional conditions, we derive from this result also a characterization of the prominent class of sequential Thiele rules. Finally, we refine our results to characterize three well-known ABC voting rules, namely sequential approval voting, sequential proportional approval voting, and sequential Chamberlin-Courant approval voting.
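The three rules named at the end of this abstract can be sketched as sequential Thiele rules that differ only in their weight sequence. The weights used here are the standard ones: 1, 1, 1, ... for approval voting; 1, 1/2, 1/3, ... for proportional approval voting; and 1, 0, 0, ... for Chamberlin-Courant. The function names and the tie-breaking by candidate order are assumptions made for illustration; the paper's rules return sets of winning committees.

```python
from fractions import Fraction


def sequential_thiele(ballots, candidates, k, weight):
    """Build a size-k committee greedily.

    weight(j) is the score a voter contributes for their j-th approved
    committee member; each round adds the candidate with the largest gain.
    """
    committee = []
    for _ in range(k):
        def marginal(c):
            return sum(
                weight(len(ballot & set(committee)) + 1)
                for ballot in ballots if c in ballot
            )
        remaining = [c for c in candidates if c not in committee]
        # max() returns the first maximizer, so ties go to the earlier candidate.
        committee.append(max(remaining, key=marginal))
    return committee


def av(j):
    """Sequential approval voting: every approved member counts fully."""
    return Fraction(1)


def pav(j):
    """Sequential proportional approval voting: harmonic weights."""
    return Fraction(1, j)


def cc(j):
    """Sequential Chamberlin-Courant: only the first approved member counts."""
    return Fraction(1 if j == 1 else 0)
```

With three voters approving `{a, b}` and two approving `{c}`, a committee of size 2 is `[a, b]` under `av` but `[a, c]` under both `pav` and `cc`, since the latter two discount voters who are already represented.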

Citations: 1

Decentralized core-periphery structure in social networks accelerates cultural innovation in agent-based model

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-23 DOI: 10.48550/arXiv.2302.12121

Jesse Milzman, Cody Moser

Previous investigations into creative and innovation networks have suggested that innovation often occurs at the boundary between the network's core and periphery. In this work, we investigate the effect of global core-periphery network structure on the speed and quality of cultural innovation. Drawing on differing notions of core-periphery structure from [arXiv:1808.07801] and [doi:10.1016/S0378-8733(99)00019-2], we distinguish decentralized core-periphery, centralized core-periphery, and affinity network structure. We generate networks of these three classes from stochastic block models (SBMs), and use them to run an agent-based model (ABM) of collective cultural innovation, in which agents can only directly interact with their network neighbors. In order to discover the highest-scoring innovation, agents must discover and combine the highest innovations from two completely parallel technology trees. We find that decentralized core-periphery networks outperform the others by finding the final crossover innovation more quickly on average. We hypothesize that decentralized core-periphery network structure accelerates collective problem-solving by shielding peripheral nodes from the local optima known by the core community at any given time. We then build upon the "Two Truths" hypothesis regarding community structure in spectral graph embeddings, first articulated in [arXiv:1808.07801], which suggests that the adjacency spectral embedding (ASE) captures core-periphery structure, while the Laplacian spectral embedding (LSE) captures affinity. We find that, for core-periphery networks, ASE-based resampling best recreates networks with similar performance on the innovation SBM, compared to LSE-based resampling. Since the Two Truths hypothesis suggests that ASE captures core-periphery structure, this result further supports our hypothesis.
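As a minimal sketch of how networks can be drawn from a stochastic block model, the function below samples an undirected graph in which each pair of nodes is linked with a probability that depends only on the two nodes' blocks. The function name and the example probability matrices are hypothetical and are not the parameters used in the paper.

```python
import random
from itertools import combinations


def sample_sbm(block_of, block_probs, seed=0):
    """Sample an undirected SBM graph as a set of edges (u, v) with u < v.

    block_of[u] is node u's block index; an edge (u, v) appears with
    probability block_probs[block_of[u]][block_of[v]].
    """
    rng = random.Random(seed)
    edges = set()
    for u, v in combinations(range(len(block_of)), 2):
        if rng.random() < block_probs[block_of[u]][block_of[v]]:
            edges.add((u, v))
    return edges


# Illustrative block matrices (assumed values): block 0 is the core.
CORE_PERIPHERY = [[0.8, 0.3], [0.3, 0.05]]  # dense core, sparse periphery
AFFINITY = [[0.8, 0.05], [0.05, 0.8]]       # dense within blocks, sparse across
```

A core-periphery matrix makes core-core links most likely and periphery-periphery links least likely, whereas an affinity matrix makes within-block links likely and cross-block links rare.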

Citations: 0

(Arbitrary) Partial Communication

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-23 DOI: 10.48550/arXiv.2302.12090

R. Galimullin, Fernando R. Velázquez-Quesada

Communication within groups of agents has lately been a focus of research in dynamic epistemic logic (DEL). This paper studies a recently introduced form of partial (more precisely, topic-based) communication. This type of communication allows for modelling scenarios of multi-agent collaboration and negotiation, and it is particularly well-suited for situations in which sharing all information is not feasible or advisable. After presenting results on invariance and complexity of model checking, the paper compares partial communication to public announcements, probably the best-known type of communication in DEL. It is shown that the settings are, update-wise, incomparable: there are scenarios in which the effect of a public announcement cannot be replicated by partial communication, and vice versa. Then, the paper shifts its attention to strategic topic-based communication. It does so by extending the language with a modality that quantifies over the topics the agents can 'talk about'. For this new framework, it provides a complete axiomatisation, showing also that the new language's model checking problem is PSPACE-complete. The paper closes by showing that, in terms of expressivity, this new language of arbitrary partial communication is incomparable to that of arbitrary public announcements.
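
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of a public announcement update on a finite Kripke model. It covers only the standard public-announcement semantics (delete the worlds where the announced fact is false), not the paper's topic-based partial communication. The names `announce` and `knows`, and the representation of propositions as Python predicates over worlds, are assumptions made for illustration.

```python
from typing import Callable, Dict, Set, Tuple

World = str
Relation = Set[Tuple[World, World]]
Proposition = Callable[[World], bool]


def announce(
    worlds: Set[World],
    relations: Dict[str, Relation],
    holds: Proposition,
) -> Tuple[Set[World], Dict[str, Relation]]:
    """Public announcement: keep only the worlds where the announced fact holds."""
    kept = {w for w in worlds if holds(w)}
    # Restrict every agent's accessibility relation to the surviving worlds.
    restricted = {
        agent: {(u, v) for (u, v) in rel if u in kept and v in kept}
        for agent, rel in relations.items()
    }
    return kept, restricted


def knows(
    agent: str,
    world: World,
    relations: Dict[str, Relation],
    holds: Proposition,
) -> bool:
    """The agent knows the fact at `world` if it holds in every accessible world."""
    return all(holds(v) for (u, v) in relations[agent] if u == world)


# Hypothetical two-world model: agent "a" cannot tell w1 (p true) from w2 (p false).
worlds = {"w1", "w2"}
relations = {"a": {(u, v) for u in worlds for v in worlds}}
p_holds: Proposition = lambda w: w == "w1"

knew_before = knows("a", "w1", relations, p_holds)
new_worlds, new_relations = announce(worlds, relations, p_holds)
knows_after = knows("a", "w1", new_relations, p_holds)
```

In this toy model the agent does not know p before the announcement and does know it afterwards, because the only world where p fails has been removed.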

Citations: 0

Fair Chore Division under Binary Supermodular Costs

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-22 DOI: 10.48550/arXiv.2302.11530

Siddharth Barman, V. V. Narayan, Paritosh Verma

We study the problem of dividing indivisible chores among agents whose costs (for the chores) are supermodular set functions with binary marginals. Such functions capture complementarity among chores, i.e., they constitute an expressive class wherein the marginal disutility of each chore is either one or zero, and the marginals increase with respect to supersets. In this setting, we study the broad landscape of finding fair and efficient chore allocations. In particular, we establish the existence of (i) EF1 and Pareto efficient chore allocations, (ii) MMS-fair and Pareto efficient allocations, and (iii) Lorenz dominating chore allocations. Furthermore, we develop polynomial-time algorithms, in the value oracle model, for computing the chore allocations for each of these fairness and efficiency criteria. Complementing these existential and algorithmic results, we show that in this chore division setting, the aforementioned fairness notions, namely EF1, MMS, and Lorenz domination, are incomparable: an allocation that satisfies any one of these notions does not necessarily satisfy the others. Additionally, we study EFX chore division. In contrast to the above-mentioned positive results, we show that, for binary supermodular costs, Pareto efficient allocations that are even approximately EFX do not exist, for any arbitrarily small approximation constant. Focusing on EFX fairness alone, when the cost functions are identical we present an algorithm (Add-and-Fix) that computes an EFX allocation. For binary marginals, we show that Add-and-Fix runs in polynomial time.
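
To make the EF1 notion concrete, here is a small sketch that checks whether a given chore allocation is envy-free up to one chore: for every pair of agents, the first agent's cost for its own bundle, after dropping some single chore, must not exceed its cost for the other agent's bundle. This is only a checker, not one of the paper's algorithms. The cost function `threshold_cost` is a hypothetical example of a supermodular function with binary marginals (the first few chores are free, each further chore costs one); it is not taken from the paper.

```python
from typing import Callable, Dict, FrozenSet, Hashable

Chore = Hashable
Cost = Callable[[FrozenSet[Chore]], int]


def threshold_cost(free: int) -> Cost:
    """Hypothetical cost: the first `free` chores cost 0, every further chore costs 1.

    Marginals are 0 or 1 and never decrease as the bundle grows, so the
    function is supermodular with binary marginals.
    """
    return lambda bundle: max(0, len(bundle) - free)


def is_ef1(allocation: Dict[str, FrozenSet[Chore]], costs: Dict[str, Cost]) -> bool:
    """Return True if no agent envies another after dropping one of its own chores."""
    for i, own in allocation.items():
        cost_i = costs[i]
        # Lowest cost agent i can reach by removing a single chore from its bundle;
        # an empty bundle is evaluated as it is.
        best_after_drop = min(
            (cost_i(own - {c}) for c in own), default=cost_i(own)
        )
        for j, other in allocation.items():
            if i != j and best_after_drop > cost_i(other):
                return False
    return True


costs = {"a": threshold_cost(1), "b": threshold_cost(1)}
balanced = {"a": frozenset({1, 2}), "b": frozenset({3, 4})}
lopsided = {"a": frozenset({1, 2, 3, 4}), "b": frozenset()}

balanced_ok = is_ef1(balanced, costs)
lopsided_ok = is_ef1(lopsided, costs)
```

With these inputs the balanced split passes the check, while giving all four chores to one agent fails it, since that agent still bears positive cost after dropping any single chore and the other bundle is empty.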

Citations: 5

Crowd simulation incorporating a route choice model and similarity evaluation using real large-scale data

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-21 DOI: 10.48550/arXiv.2302.10421

Ryo Nishida, Masaki Onishi, Koichi Hashimoto

Modeling and simulation approaches that express crowd movement with mathematical models are widely and actively studied to understand crowd movement and resolve crowd accidents. Existing literature on crowd modeling focuses only on the decision-making of walking behavior. However, the decision-making of route choice, which is a higher-level decision, should also be modeled to construct more practical simulations. Furthermore, evaluation of how well crowd simulations that incorporate a route choice model reproduce real data is insufficient. Therefore, we generalize and propose a crowd simulation framework that includes actual crowd movement measurements, route choice model estimation, and crowd simulator construction. We use the discrete choice model as the route choice model and the social force model as the walking model. In experiments, we measure crowd movements during an evacuation drill in a theater and a fireworks event where tens of thousands of people moved, and show that the crowd simulation incorporating the route choice model can reproduce the real large-scale crowd movement more accurately.
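
As an illustration of what a discrete choice model for route selection can look like, the sketch below computes route probabilities with a multinomial logit, one common form of discrete choice model. The abstract does not say which specification the authors estimate, so the logit form, the utility terms (route length and congestion), and the coefficient values are all assumptions made for this example.

```python
import math
from typing import Dict


def route_utility(
    length_m: float,
    congestion: float,
    beta_length: float = 0.01,      # hypothetical disutility per metre
    beta_congestion: float = 1.0,   # hypothetical disutility per congestion unit
) -> float:
    """Linear utility: longer and more congested routes are less attractive."""
    return -(beta_length * length_m + beta_congestion * congestion)


def route_choice_probabilities(utilities: Dict[str, float]) -> Dict[str, float]:
    """Multinomial logit: P(r) = exp(V_r) / sum_k exp(V_k)."""
    # Subtract the maximum utility before exponentiating for numerical stability.
    v_max = max(utilities.values())
    weights = {route: math.exp(v - v_max) for route, v in utilities.items()}
    total = sum(weights.values())
    return {route: w / total for route, w in weights.items()}


utilities = {
    "main_exit": route_utility(length_m=50.0, congestion=2.0),
    "side_exit": route_utility(length_m=120.0, congestion=0.5),
}
probabilities = route_choice_probabilities(utilities)
```

In a simulator of the kind the abstract describes, each pedestrian agent would sample a route from such probabilities, and a separate walking model would then move the agent along the chosen route.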

Citations: 0