Pub Date : 2023-03-03DOI: 10.48550/arXiv.2303.01768
Ji-Yun Oh, Joonkee Kim, Minchan Jeong, Se-Young Yun
The multi-agent setting is intricate and unpredictable since the behaviors of multiple agents influence one another. To address this environmental uncertainty, distributional reinforcement learning algorithms that incorporate uncertainty via distributional output have been integrated with multi-agent reinforcement learning (MARL) methods, achieving state-of-the-art performance. However, distributional MARL algorithms still rely on the traditional $epsilon$-greedy, which does not take cooperative strategy into account. In this paper, we present a risk-based exploration that leads to collaboratively optimistic behavior by shifting the sampling region of distribution. Initially, we take expectations from the upper quantiles of state-action values for exploration, which are optimistic actions, and gradually shift the sampling region of quantiles to the full distribution for exploitation. By ensuring that each agent is exposed to the same level of risk, we can force them to take cooperatively optimistic actions. Our method shows remarkable performance in multi-agent settings requiring cooperative exploration based on quantile regression appropriately controlling the level of risk.
{"title":"Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent Reinforcement Learning","authors":"Ji-Yun Oh, Joonkee Kim, Minchan Jeong, Se-Young Yun","doi":"10.48550/arXiv.2303.01768","DOIUrl":"https://doi.org/10.48550/arXiv.2303.01768","url":null,"abstract":"The multi-agent setting is intricate and unpredictable since the behaviors of multiple agents influence one another. To address this environmental uncertainty, distributional reinforcement learning algorithms that incorporate uncertainty via distributional output have been integrated with multi-agent reinforcement learning (MARL) methods, achieving state-of-the-art performance. However, distributional MARL algorithms still rely on the traditional $epsilon$-greedy, which does not take cooperative strategy into account. In this paper, we present a risk-based exploration that leads to collaboratively optimistic behavior by shifting the sampling region of distribution. Initially, we take expectations from the upper quantiles of state-action values for exploration, which are optimistic actions, and gradually shift the sampling region of quantiles to the full distribution for exploitation. By ensuring that each agent is exposed to the same level of risk, we can force them to take cooperatively optimistic actions. Our method shows remarkable performance in multi-agent settings requiring cooperative exploration based on quantile regression appropriately controlling the level of risk.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116436789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-03-01DOI: 10.48550/arXiv.2303.00451
Woojun Kim, Whiyoung Jung, Myungsik Cho, Young-Jin Sung
In this paper, we propose a new mutual information framework for multi-agent reinforcement learning to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the simultaneous mutual information between multi-agent actions. By introducing a latent variable to induce nonzero mutual information between multi-agent actions and applying a variational bound, we derive a tractable lower bound on the considered MMI-regularized objective function. The derived tractable objective can be interpreted as maximum entropy reinforcement learning combined with uncertainty reduction of other agents actions. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic, which follows centralized learning with decentralized execution. We evaluated VM3-AC for several games requiring coordination, and numerical results show that VM3-AC outperforms other MARL algorithms in multi-agent tasks requiring high-quality coordination.
{"title":"A Variational Approach to Mutual Information-Based Coordination for Multi-Agent Reinforcement Learning","authors":"Woojun Kim, Whiyoung Jung, Myungsik Cho, Young-Jin Sung","doi":"10.48550/arXiv.2303.00451","DOIUrl":"https://doi.org/10.48550/arXiv.2303.00451","url":null,"abstract":"In this paper, we propose a new mutual information framework for multi-agent reinforcement learning to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the simultaneous mutual information between multi-agent actions. By introducing a latent variable to induce nonzero mutual information between multi-agent actions and applying a variational bound, we derive a tractable lower bound on the considered MMI-regularized objective function. The derived tractable objective can be interpreted as maximum entropy reinforcement learning combined with uncertainty reduction of other agents actions. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic, which follows centralized learning with decentralized execution. We evaluated VM3-AC for several games requiring coordination, and numerical results show that VM3-AC outperforms other MARL algorithms in multi-agent tasks requiring high-quality coordination.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122865703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-03-01DOI: 10.48550/arXiv.2303.00435
Inbal Rozencweig, R. Meir, Nick Mattei, Ofra Amir
The explosion of conference paper submissions in AI and related fields, has underscored the need to improve many aspects of the peer review process, especially the matching of papers and reviewers. Recent work argues that the key to improve this matching is to modify aspects of the emph{bidding phase} itself, to ensure that the set of bids over papers is balanced, and in particular to avoid emph{orphan papers}, i.e., those papers that receive no bids. In an attempt to understand and mitigate this problem, we have developed a flexible bidding platform to test adaptations to the bidding process. Using this platform, we performed a field experiment during the bidding phase of a medium-size international workshop that compared two bidding methods. We further examined via controlled experiments on Amazon Mechanical Turk various factors that affect bidding, in particular the order in which papers are presented cite{cabanac2013capitalizing,fiez2020super}; and information on paper demand cite{meir2021market}. Our results suggest that several simple adaptations, that can be added to any existing platform, may significantly reduce the skew in bids, thereby improving the allocation for both reviewers and conference organizers.
{"title":"Mitigating Skewed Bidding for Conference Paper Assignment","authors":"Inbal Rozencweig, R. Meir, Nick Mattei, Ofra Amir","doi":"10.48550/arXiv.2303.00435","DOIUrl":"https://doi.org/10.48550/arXiv.2303.00435","url":null,"abstract":"The explosion of conference paper submissions in AI and related fields, has underscored the need to improve many aspects of the peer review process, especially the matching of papers and reviewers. Recent work argues that the key to improve this matching is to modify aspects of the emph{bidding phase} itself, to ensure that the set of bids over papers is balanced, and in particular to avoid emph{orphan papers}, i.e., those papers that receive no bids. In an attempt to understand and mitigate this problem, we have developed a flexible bidding platform to test adaptations to the bidding process. Using this platform, we performed a field experiment during the bidding phase of a medium-size international workshop that compared two bidding methods. We further examined via controlled experiments on Amazon Mechanical Turk various factors that affect bidding, in particular the order in which papers are presented cite{cabanac2013capitalizing,fiez2020super}; and information on paper demand cite{meir2021market}. Our results suggest that several simple adaptations, that can be added to any existing platform, may significantly reduce the skew in bids, thereby improving the allocation for both reviewers and conference organizers.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116309607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-03-01DOI: 10.48550/arXiv.2303.00745
Mikkel Abrahamsen, Tzvika Geft, D. Halperin, Barak Ugav
We study a fundamental NP-hard motion coordination problem for multi-robot/multi-agent systems: We are given a graph $G$ and set of agents, where each agent has a given directed path in $G$. Each agent is initially located on the first vertex of its path. At each time step an agent can move to the next vertex on its path, provided that the vertex is not occupied by another agent. The goal is to find a sequence of such moves along the given paths so that each reaches its target, or to report that no such sequence exists. The problem models guidepath-based transport systems, which is a pertinent abstraction for traffic in a variety of contemporary applications, ranging from train networks or Automated Guided Vehicles (AGVs) in factories, through computer game animations, to qubit transport in quantum computing. It also arises as a sub-problem in the more general multi-robot motion-planning problem. We provide a fine-grained tractability analysis of the problem by considering new assumptions and identifying minimal values of key parameters for which the problem remains NP-hard. Our analysis identifies a critical parameter called vertex multiplicity (VM), defined as the maximum number of paths passing through the same vertex. We show that a prevalent variant of the problem, which is equivalent to Sequential Resource Allocation (concerning deadlock prevention for concurrent processes), is NP-hard even when VM is 3. On the positive side, for VM $le$ 2 we give an efficient algorithm that iteratively resolves cycles of blocking relations among agents. We also present a variant that is NP-hard when the VM is 2 even when $G$ is a 2D grid and each path lies in a single grid row or column. By studying highly distilled yet NP-hard variants, we deepen the understanding of what makes the problem intractable and thereby guide the search for efficient solutions under practical assumptions.
{"title":"Coordination of Multiple Robots along Given Paths with Bounded Junction Complexity","authors":"Mikkel Abrahamsen, Tzvika Geft, D. Halperin, Barak Ugav","doi":"10.48550/arXiv.2303.00745","DOIUrl":"https://doi.org/10.48550/arXiv.2303.00745","url":null,"abstract":"We study a fundamental NP-hard motion coordination problem for multi-robot/multi-agent systems: We are given a graph $G$ and set of agents, where each agent has a given directed path in $G$. Each agent is initially located on the first vertex of its path. At each time step an agent can move to the next vertex on its path, provided that the vertex is not occupied by another agent. The goal is to find a sequence of such moves along the given paths so that each reaches its target, or to report that no such sequence exists. The problem models guidepath-based transport systems, which is a pertinent abstraction for traffic in a variety of contemporary applications, ranging from train networks or Automated Guided Vehicles (AGVs) in factories, through computer game animations, to qubit transport in quantum computing. It also arises as a sub-problem in the more general multi-robot motion-planning problem. We provide a fine-grained tractability analysis of the problem by considering new assumptions and identifying minimal values of key parameters for which the problem remains NP-hard. Our analysis identifies a critical parameter called vertex multiplicity (VM), defined as the maximum number of paths passing through the same vertex. We show that a prevalent variant of the problem, which is equivalent to Sequential Resource Allocation (concerning deadlock prevention for concurrent processes), is NP-hard even when VM is 3. On the positive side, for VM $le$ 2 we give an efficient algorithm that iteratively resolves cycles of blocking relations among agents. We also present a variant that is NP-hard when the VM is 2 even when $G$ is a 2D grid and each path lies in a single grid row or column. By studying highly distilled yet NP-hard variants, we deepen the understanding of what makes the problem intractable and thereby guide the search for efficient solutions under practical assumptions.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125677242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-02-28DOI: 10.48550/arXiv.2302.14604
Bengisu Guresti, Abdullah Vanlioglu, N. K. Ure
Achieving and maintaining cooperation between agents to accomplish a common objective is one of the central goals of Multi-Agent Reinforcement Learning (MARL). Nevertheless in many real-world scenarios, separately trained and specialized agents are deployed into a shared environment, or the environment requires multiple objectives to be achieved by different coexisting parties. These variations among specialties and objectives are likely to cause mixed motives that eventually result in a social dilemma where all the parties are at a loss. In order to resolve this issue, we propose the Incentive Q-Flow (IQ-Flow) algorithm, which modifies the system's reward setup with an incentive regulator agent such that the cooperative policy also corresponds to the self-interested policy for the agents. Unlike the existing methods that learn to incentivize self-interested agents, IQ-Flow does not make any assumptions about agents' policies or learning algorithms, which enables the generalization of the developed framework to a wider array of applications. IQ-Flow performs an offline evaluation of the optimality of the learned policies using the data provided by other agents to determine cooperative and self-interested policies. Next, IQ-Flow uses meta-gradient learning to estimate how policy evaluation changes according to given incentives and modifies the incentive such that the greedy policy for cooperative objective and self-interested objective yield the same actions. We present the operational characteristics of IQ-Flow in Iterated Matrix Games. We demonstrate that IQ-Flow outperforms the state-of-the-art incentive design algorithm in Escape Room and 2-Player Cleanup environments. We further demonstrate that the pretrained IQ-Flow mechanism significantly outperforms the performance of the shared reward setup in the 2-Player Cleanup environment.
{"title":"IQ-Flow: Mechanism Design for Inducing Cooperative Behavior to Self-Interested Agents in Sequential Social Dilemmas","authors":"Bengisu Guresti, Abdullah Vanlioglu, N. K. Ure","doi":"10.48550/arXiv.2302.14604","DOIUrl":"https://doi.org/10.48550/arXiv.2302.14604","url":null,"abstract":"Achieving and maintaining cooperation between agents to accomplish a common objective is one of the central goals of Multi-Agent Reinforcement Learning (MARL). Nevertheless in many real-world scenarios, separately trained and specialized agents are deployed into a shared environment, or the environment requires multiple objectives to be achieved by different coexisting parties. These variations among specialties and objectives are likely to cause mixed motives that eventually result in a social dilemma where all the parties are at a loss. In order to resolve this issue, we propose the Incentive Q-Flow (IQ-Flow) algorithm, which modifies the system's reward setup with an incentive regulator agent such that the cooperative policy also corresponds to the self-interested policy for the agents. Unlike the existing methods that learn to incentivize self-interested agents, IQ-Flow does not make any assumptions about agents' policies or learning algorithms, which enables the generalization of the developed framework to a wider array of applications. IQ-Flow performs an offline evaluation of the optimality of the learned policies using the data provided by other agents to determine cooperative and self-interested policies. Next, IQ-Flow uses meta-gradient learning to estimate how policy evaluation changes according to given incentives and modifies the incentive such that the greedy policy for cooperative objective and self-interested objective yield the same actions. We present the operational characteristics of IQ-Flow in Iterated Matrix Games. We demonstrate that IQ-Flow outperforms the state-of-the-art incentive design algorithm in Escape Room and 2-Player Cleanup environments. We further demonstrate that the pretrained IQ-Flow mechanism significantly outperforms the performance of the shared reward setup in the 2-Player Cleanup environment.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131477169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-02-27DOI: 10.48550/arXiv.2302.14176
Taylor Dohmen, Ashutosh Trivedi
A basic assumption of traditional reinforcement learning is that the value of a reward does not change once it is received by an agent. The present work forgoes this assumption and considers the situation where the value of a reward decays proportionally to the time elapsed since it was obtained. Emphasizing the inflection point occurring at the time of payment, we use the term asset to refer to a reward that is currently in the possession of an agent. Adopting this language, we initiate the study of depreciating assets within the framework of infinite-horizon quantitative optimization. In particular, we propose a notion of asset depreciation, inspired by classical exponential discounting, where the value of an asset is scaled by a fixed discount factor at each time step after it is obtained by the agent. We formulate a Bellman-style equational characterization of optimality in this context and develop a model-free reinforcement learning approach to obtain optimal policies.
{"title":"Reinforcement Learning with Depreciating Assets","authors":"Taylor Dohmen, Ashutosh Trivedi","doi":"10.48550/arXiv.2302.14176","DOIUrl":"https://doi.org/10.48550/arXiv.2302.14176","url":null,"abstract":"A basic assumption of traditional reinforcement learning is that the value of a reward does not change once it is received by an agent. The present work forgoes this assumption and considers the situation where the value of a reward decays proportionally to the time elapsed since it was obtained. Emphasizing the inflection point occurring at the time of payment, we use the term asset to refer to a reward that is currently in the possession of an agent. Adopting this language, we initiate the study of depreciating assets within the framework of infinite-horizon quantitative optimization. In particular, we propose a notion of asset depreciation, inspired by classical exponential discounting, where the value of an asset is scaled by a fixed discount factor at each time step after it is obtained by the agent. We formulate a Bellman-style equational characterization of optimality in this context and develop a model-free reinforcement learning approach to obtain optimal policies.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134448818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-02-27DOI: 10.48550/arXiv.2302.13653
Siddharth Chandak, Ilai Bistritz, N. Bambos
Consider a decision-maker that can pick one out of $K$ actions to control an unknown system, for $T$ turns. The actions are interpreted as different configurations or policies. Holding the same action fixed, the system asymptotically converges to a unique equilibrium, as a function of this action. The dynamics of the system are unknown to the decision-maker, which can only observe a noisy reward at the end of every turn. The decision-maker wants to maximize its accumulated reward over the $T$ turns. Learning what equilibria are better results in higher rewards, but waiting for the system to converge to equilibrium costs valuable time. Existing bandit algorithms, either stochastic or adversarial, achieve linear (trivial) regret for this problem. We present a novel algorithm, termed Upper Equilibrium Concentration Bound (UECB), that knows to switch an action quickly if it is not worth it to wait until the equilibrium is reached. This is enabled by employing convergence bounds to determine how far the system is from equilibrium. We prove that UECB achieves a regret of $mathcal{O}(log(T)+tau_clog(tau_c)+tau_cloglog(T))$ for this equilibrium bandit problem where $tau_c$ is the worst case approximate convergence time to equilibrium. We then show that both epidemic control and game control are special cases of equilibrium bandits, where $tau_clog tau_c$ typically dominates the regret. We then test UECB numerically for both of these applications.
{"title":"Equilibrium Bandits: Learning Optimal Equilibria of Unknown Dynamics","authors":"Siddharth Chandak, Ilai Bistritz, N. Bambos","doi":"10.48550/arXiv.2302.13653","DOIUrl":"https://doi.org/10.48550/arXiv.2302.13653","url":null,"abstract":"Consider a decision-maker that can pick one out of $K$ actions to control an unknown system, for $T$ turns. The actions are interpreted as different configurations or policies. Holding the same action fixed, the system asymptotically converges to a unique equilibrium, as a function of this action. The dynamics of the system are unknown to the decision-maker, which can only observe a noisy reward at the end of every turn. The decision-maker wants to maximize its accumulated reward over the $T$ turns. Learning what equilibria are better results in higher rewards, but waiting for the system to converge to equilibrium costs valuable time. Existing bandit algorithms, either stochastic or adversarial, achieve linear (trivial) regret for this problem. We present a novel algorithm, termed Upper Equilibrium Concentration Bound (UECB), that knows to switch an action quickly if it is not worth it to wait until the equilibrium is reached. This is enabled by employing convergence bounds to determine how far the system is from equilibrium. We prove that UECB achieves a regret of $mathcal{O}(log(T)+tau_clog(tau_c)+tau_cloglog(T))$ for this equilibrium bandit problem where $tau_c$ is the worst case approximate convergence time to equilibrium. We then show that both epidemic control and game control are special cases of equilibrium bandits, where $tau_clog tau_c$ typically dominates the regret. We then test UECB numerically for both of these applications.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129931787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-02-26DOI: 10.48550/arXiv.2302.13405
Jaime Arias, W. Jamroga, W. Penczek, L. Petrucci, Teofil Sidoruk
We define extensions of CTL and TCTL with strategic operators, called Strategic CTL (SCTL) and Strategic TCTL (STCTL), respectively. For each of the above logics we give a synchronous and asynchronous semantics, i.e., STCTL is interpreted over networks of extended Timed Automata (TA) that either make synchronous moves or synchronise via joint actions. We consider several semantics regarding information: imperfect (i) and perfect (I), and recall: imperfect (r) and perfect (R). We prove that SCTL is more expressive than ATL for all semantics, and this holds for the timed versions as well. Moreover, the model checking problem for SCTL[ir] is of the same complexity as for ATL[ir], the model checking problem for STCTL[ir] is of the same complexity as for TCTL, while for STCTL[iR] it is undecidable as for ATL[iR]. The above results suggest to use SCTL[ir] and STCTL[ir] in practical applications. Therefore, we use the tool IMITATOR to support model checking of STCTL[ir].
{"title":"Strategic (Timed) Computation Tree Logic","authors":"Jaime Arias, W. Jamroga, W. Penczek, L. Petrucci, Teofil Sidoruk","doi":"10.48550/arXiv.2302.13405","DOIUrl":"https://doi.org/10.48550/arXiv.2302.13405","url":null,"abstract":"We define extensions of CTL and TCTL with strategic operators, called Strategic CTL (SCTL) and Strategic TCTL (STCTL), respectively. For each of the above logics we give a synchronous and asynchronous semantics, i.e., STCTL is interpreted over networks of extended Timed Automata (TA) that either make synchronous moves or synchronise via joint actions. We consider several semantics regarding information: imperfect (i) and perfect (I), and recall: imperfect (r) and perfect (R). We prove that SCTL is more expressive than ATL for all semantics, and this holds for the timed versions as well. Moreover, the model checking problem for SCTL[ir] is of the same complexity as for ATL[ir], the model checking problem for STCTL[ir] is of the same complexity as for TCTL, while for STCTL[iR] it is undecidable as for ATL[iR]. The above results suggest to use SCTL[ir] and STCTL[ir] in practical applications. Therefore, we use the tool IMITATOR to support model checking of STCTL[ir].","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125975278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-02-26DOI: 10.48550/arXiv.2302.13224
Mingyu Xiao, Guoliang Qiu, Sen Huang
We study the problem of allocating indivisible chores to agents under the Maximin share (MMS) fairness notion. The chores are embedded on a graph and each bundle of chores assigned to an agent should be connected. Although there is a simple algorithm for MMS allocations of goods on trees, it remains open whether MMS allocations of chores on trees always exist or not, which is a simple but annoying problem in chores allocation. In this paper, we introduce a new method for chores allocation with connectivity constraints, called the group-satisfied method, that can show the existence of MMS allocations of chores on several subclasses of trees. Even these subcases are non-trivial and our results can be considered as a significant step to the open problem. We also consider MMS allocations of chores on cycles where we get the tight approximation ratio for three agents. Our result was obtained via the linear programming (LP) method, which enables us not only to compute approximate MMS allocations but also to construct tight examples of the nonexistence of MMS allocations without complicated combinatorial analysis. These two proposed methods, the group-satisfied method and the LP method, have the potential to solve more related problems.
{"title":"MMS Allocations of Chores with Connectivity Constraints: New Methods and New Results","authors":"Mingyu Xiao, Guoliang Qiu, Sen Huang","doi":"10.48550/arXiv.2302.13224","DOIUrl":"https://doi.org/10.48550/arXiv.2302.13224","url":null,"abstract":"We study the problem of allocating indivisible chores to agents under the Maximin share (MMS) fairness notion. The chores are embedded on a graph and each bundle of chores assigned to an agent should be connected. Although there is a simple algorithm for MMS allocations of goods on trees, it remains open whether MMS allocations of chores on trees always exist or not, which is a simple but annoying problem in chores allocation. In this paper, we introduce a new method for chores allocation with connectivity constraints, called the group-satisfied method, that can show the existence of MMS allocations of chores on several subclasses of trees. Even these subcases are non-trivial and our results can be considered as a significant step to the open problem. We also consider MMS allocations of chores on cycles where we get the tight approximation ratio for three agents. Our result was obtained via the linear programming (LP) method, which enables us not only to compute approximate MMS allocations but also to construct tight examples of the nonexistence of MMS allocations without complicated combinatorial analysis. These two proposed methods, the group-satisfied method and the LP method, have the potential to solve more related problems.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122305444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-02-26DOI: 10.48550/arXiv.2302.13232
Bryce Wiedenbeck, Erik Brinkman
We present new data structures for representing symmetric normal-form games. These data structures are optimized for efficiently computing the expected utility of each unilateral pure-strategy deviation from a symmetric mixed-strategy profile. The cumulative effect of numerous incremental innovations is a dramatic speedup in the computation of symmetric mixed-strategy Nash equilibria, making it practical to represent and solve games with dozens to hundreds of players. These data structures naturally extend to role-symmetric and action-graph games with similar benefits.
{"title":"Data Structures for Deviation Payoffs","authors":"Bryce Wiedenbeck, Erik Brinkman","doi":"10.48550/arXiv.2302.13232","DOIUrl":"https://doi.org/10.48550/arXiv.2302.13232","url":null,"abstract":"We present new data structures for representing symmetric normal-form games. These data structures are optimized for efficiently computing the expected utility of each unilateral pure-strategy deviation from a symmetric mixed-strategy profile. The cumulative effect of numerous incremental innovations is a dramatic speedup in the computation of symmetric mixed-strategy Nash equilibria, making it practical to represent and solve games with dozens to hundreds of players. These data structures naturally extend to role-symmetric and action-graph games with similar benefits.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131718768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}