Asynchronous Credit Assignment Framework for Multi-Agent Reinforcement Learning
Yongheng Liang, Hejun Wu, Haitao Wang, Hao Cai

Credit assignment is a core problem in multi-agent reinforcement learning (MARL): distinguishing agents' marginal contributions in order to optimize cooperative strategies. Current credit assignment methods usually assume synchronous decision-making among agents. However, many realistic cooperative tasks require agents to decide asynchronously, without waiting for one another, to avoid disastrous consequences. To address this issue, we propose an asynchronous credit assignment framework comprising a problem model called ADEX-POMDP and a multiplicative value decomposition (MVD) algorithm. ADEX-POMDP is an asynchronous problem model that extends the decentralized partially observable Markov decision process with extra virtual agents. We prove that ADEX-POMDP preserves both the task equilibrium and algorithm convergence. MVD uses multiplicative interactions to efficiently capture the dependencies among asynchronous decisions, and we theoretically demonstrate its advantages in handling asynchronous tasks. Experimental results on two asynchronous decision-making benchmarks, Overcooked and POAC, show that MVD not only consistently outperforms state-of-the-art MARL methods but also provides interpretability for asynchronous cooperation.
{"title":"Asynchronous Credit Assignment Framework for Multi-Agent Reinforcement Learning","authors":"Yongheng Liang, Hejun Wu, Haitao Wang, Hao Cai","doi":"arxiv-2408.03692","DOIUrl":"https://doi.org/arxiv-2408.03692","url":null,"abstract":"Credit assignment is a core problem that distinguishes agents' marginal\u0000contributions for optimizing cooperative strategies in multi-agent\u0000reinforcement learning (MARL). Current credit assignment methods usually assume\u0000synchronous decision-making among agents. However, a prerequisite for many\u0000realistic cooperative tasks is asynchronous decision-making by agents, without\u0000waiting for others to avoid disastrous consequences. To address this issue, we\u0000propose an asynchronous credit assignment framework with a problem model called\u0000ADEX-POMDP and a multiplicative value decomposition (MVD) algorithm. ADEX-POMDP\u0000is an asynchronous problem model with extra virtual agents for a decentralized\u0000partially observable markov decision process. We prove that ADEX-POMDP\u0000preserves both the task equilibrium and the algorithm convergence. MVD utilizes\u0000multiplicative interaction to efficiently capture the interactions of\u0000asynchronous decisions, and we theoretically demonstrate its advantages in\u0000handling asynchronous tasks. Experimental results show that on two asynchronous\u0000decision-making benchmarks, Overcooked and POAC, MVD not only consistently\u0000outperforms state-of-the-art MARL methods but also provides the\u0000interpretability for asynchronous cooperation.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents
Lucia Gordon, Esther Rolf, Milind Tambe

Stochastic multi-agent multi-armed bandits typically assume that the rewards from each arm follow a fixed distribution, regardless of which agent pulls the arm. However, in many real-world settings, rewards can depend on the sensitivity of each agent to its environment. In medical screening, disease detection rates can vary by test type; in preference matching, rewards can depend on user preferences; and in environmental sensing, observation quality can vary across sensors. Since past work does not specify how to allocate agents with heterogeneous but known sensitivities of these kinds in a stochastic bandit setting, we introduce a UCB-style algorithm, Min-Width, which aggregates information from diverse agents. In doing so, we address the joint challenges of (i) aggregating rewards that follow different distributions for each agent-arm pair and (ii) coordinating the assignment of agents to arms. Min-Width facilitates efficient collaboration among heterogeneous agents, exploiting the known structure in the agents' reward functions to weight their rewards accordingly. We analyze the regret of Min-Width and conduct pseudo-synthetic and fully synthetic experiments to study the performance of different levels of information sharing. Our results confirm that the gains from modeling agent heterogeneity tend to be greater when the sensitivities are more varied across agents, while combining more information does not always improve performance.
{"title":"Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents","authors":"Lucia Gordon, Esther Rolf, Milind Tambe","doi":"arxiv-2408.03405","DOIUrl":"https://doi.org/arxiv-2408.03405","url":null,"abstract":"Stochastic multi-agent multi-armed bandits typically assume that the rewards\u0000from each arm follow a fixed distribution, regardless of which agent pulls the\u0000arm. However, in many real-world settings, rewards can depend on the\u0000sensitivity of each agent to their environment. In medical screening, disease\u0000detection rates can vary by test type; in preference matching, rewards can\u0000depend on user preferences; and in environmental sensing, observation quality\u0000can vary across sensors. Since past work does not specify how to allocate\u0000agents of heterogeneous but known sensitivity of these types in a stochastic\u0000bandit setting, we introduce a UCB-style algorithm, Min-Width, which aggregates\u0000information from diverse agents. In doing so, we address the joint challenges\u0000of (i) aggregating the rewards, which follow different distributions for each\u0000agent-arm pair, and (ii) coordinating the assignments of agents to arms.\u0000Min-Width facilitates efficient collaboration among heterogeneous agents,\u0000exploiting the known structure in the agents' reward functions to weight their\u0000rewards accordingly. We analyze the regret of Min-Width and conduct\u0000pseudo-synthetic and fully synthetic experiments to study the performance of\u0000different levels of information sharing. Our results confirm that the gains to\u0000modeling agent heterogeneity tend to be greater when the sensitivities are more\u0000varied across agents, while combining more information does not always improve\u0000performance.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing the Effects of Container Handling Strategies on Enhancing Freight Throughput
Sarita Rattanakunuprakarn, Mingzhou Jin, Mustafa Can Camur, Xueping Li

As global supply chains and freight volumes grow, the U.S. faces escalating transportation demands. The heavy reliance on road transport, coupled with the underutilization of the railway system, results in congested highways, prolonged transportation times, higher costs, and increased carbon emissions. California's San Pedro Port Complex (SPPC), the nation's busiest, incurs a significant share of these challenges. We utilize an agent-based simulation to replicate real-world scenarios, focusing on the intricacies of interactions in a modified intermodal inbound freight system for the SPPC. This involves relocating container classification to potential warehouses in California, Utah, Arizona, and Nevada, rather than exclusively at port areas. Our primary aim is to evaluate the proposed system's efficiency, considering cost and freight throughput, while also examining the effects of workforce shortages. Computational analysis suggests that strategically installing intermodal capabilities in select warehouses can reduce transportation costs, boost throughput, and foster resour…
{"title":"Assessing the Effects of Container Handling Strategies on Enhancing Freight Throughput","authors":"Sarita Rattanakunuprakarn, Mingzhou Jin, Mustafa Can Camur, Xueping Li","doi":"arxiv-2408.02768","DOIUrl":"https://doi.org/arxiv-2408.02768","url":null,"abstract":"As global supply chains and freight volumes grow, the U.S. faces escalating\u0000transportation demands. The heavy reliance on road transport, coupled with the\u0000underutilization of the railway system, results in congested highways,\u0000prolonged transportation times, higher costs, and increased carbon emissions.\u0000California's San Pedro Port Complex (SPPC), the nation's busiest, incurs a\u0000significant share of these challenges. We utilize an agent-based simulation to\u0000replicate real-world scenarios, focusing on the intricacies of interactions in\u0000a modified intermodal inbound freight system for the SPPC. This involves\u0000relocating container classification to potential warehouses in California,\u0000Utah, Arizona, and Nevada, rather than exclusively at port areas. Our primary\u0000aim is to evaluate the proposed system's efficiency, considering cost and\u0000freight throughput, while also examining the effects of workforce shortages.\u0000Computational analysis suggests that strategically installing intermodal\u0000capabilities in select warehouses can reduce transportation costs, boost\u0000throughput, and foster resour","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Value-Based Rationales Improve Social Experience: A Multiagent Simulation Study
Sz-Ting Tzeng, Nirav Ajmeri, Munindar P. Singh

We propose Exanna, a framework for realizing agents that incorporate values in decision making. An Exanna agent considers the values of itself and of others when providing rationales for its actions and when evaluating the rationales provided by others. Via multiagent simulation, we demonstrate that considering values in decision making and producing rationales, especially for norm-deviating actions, leads to (1) higher conflict resolution, (2) better social experience, (3) higher privacy, and (4) higher flexibility.
{"title":"Value-Based Rationales Improve Social Experience: A Multiagent Simulation Study","authors":"Sz-Ting Tzeng, Nirav Ajmeri, Munindar P. Singh","doi":"arxiv-2408.02117","DOIUrl":"https://doi.org/arxiv-2408.02117","url":null,"abstract":"We propose Exanna, a framework to realize agents that incorporate values in\u0000decision making. An Exannaagent considers the values of itself and others when\u0000providing rationales for its actions and evaluating the rationales provided by\u0000others. Via multiagent simulation, we demonstrate that considering values in\u0000decision making and producing rationales, especially for norm-deviating\u0000actions, leads to (1) higher conflict resolution, (2) better social experience,\u0000(3) higher privacy, and (4) higher flexibility.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Environment Complexity and Nash Equilibria in a Sequential Social Dilemma
Mustafa Yasir, Andrew Howes, Vasilios Mavroudis, Chris Hicks

Multi-agent reinforcement learning (MARL) methods, while effective in zero-sum or positive-sum games, often yield suboptimal outcomes in general-sum games where cooperation is essential for achieving globally optimal outcomes. Matrix game social dilemmas, which abstract key aspects of general-sum interactions, such as cooperation, risk, and trust, fail to model the temporal and spatial dynamics characteristic of real-world scenarios. In response, our study extends matrix game social dilemmas into more complex, higher-dimensional MARL environments. We adapt a gridworld implementation of the Stag Hunt dilemma to more closely match the decision-space of a one-shot matrix game while also introducing variable environment complexity. Our findings indicate that as complexity increases, MARL agents trained in these environments converge to suboptimal strategies, consistent with the risk-dominant Nash equilibria strategies found in matrix games. Our work highlights the impact of environment complexity on achieving optimal outcomes in higher-dimensional game-theoretic MARL environments.
{"title":"Environment Complexity and Nash Equilibria in a Sequential Social Dilemma","authors":"Mustafa Yasir, Andrew Howes, Vasilios Mavroudis, Chris Hicks","doi":"arxiv-2408.02148","DOIUrl":"https://doi.org/arxiv-2408.02148","url":null,"abstract":"Multi-agent reinforcement learning (MARL) methods, while effective in\u0000zero-sum or positive-sum games, often yield suboptimal outcomes in general-sum\u0000games where cooperation is essential for achieving globally optimal outcomes.\u0000Matrix game social dilemmas, which abstract key aspects of general-sum\u0000interactions, such as cooperation, risk, and trust, fail to model the temporal\u0000and spatial dynamics characteristic of real-world scenarios. In response, our\u0000study extends matrix game social dilemmas into more complex, higher-dimensional\u0000MARL environments. We adapt a gridworld implementation of the Stag Hunt dilemma\u0000to more closely match the decision-space of a one-shot matrix game while also\u0000introducing variable environment complexity. Our findings indicate that as\u0000complexity increases, MARL agents trained in these environments converge to\u0000suboptimal strategies, consistent with the risk-dominant Nash equilibria\u0000strategies found in matrix games. Our work highlights the impact of environment\u0000complexity on achieving optimal outcomes in higher-dimensional game-theoretic\u0000MARL environments.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-Emotion Blended Dialogue Generation in Social Simulation Agents
Qiang Zhang, Jason Naradowsky, Yusuke Miyao

When engaging in conversations, dialogue agents in a virtual simulation environment may exhibit their own emotional states that are unrelated to the immediate conversational context, a phenomenon known as self-emotion. This study explores how such self-emotion affects the agents' behaviors in dialogue strategies and decision-making within a large language model (LLM)-driven simulation framework. In a dialogue strategy prediction experiment, we analyze the dialogue strategy choices employed by agents both with and without self-emotion, comparing them to those of humans. The results show that incorporating self-emotion helps agents exhibit more human-like dialogue strategies. In an independent experiment comparing the performance of models fine-tuned on GPT-4 generated dialogue datasets, we demonstrate that self-emotion can lead to better overall naturalness and humanness. Finally, in a virtual simulation environment where agents have discussions on multiple topics, we show that self-emotion can significantly influence the agents' decision-making, leading to approximately a 50% change in decisions.
{"title":"Self-Emotion Blended Dialogue Generation in Social Simulation Agents","authors":"Qiang Zhang, Jason Naradowsky, Yusuke Miyao","doi":"arxiv-2408.01633","DOIUrl":"https://doi.org/arxiv-2408.01633","url":null,"abstract":"When engaging in conversations, dialogue agents in a virtual simulation\u0000environment may exhibit their own emotional states that are unrelated to the\u0000immediate conversational context, a phenomenon known as self-emotion. This\u0000study explores how such self-emotion affects the agents' behaviors in dialogue\u0000strategies and decision-making within a large language model (LLM)-driven\u0000simulation framework. In a dialogue strategy prediction experiment, we analyze\u0000the dialogue strategy choices employed by agents both with and without\u0000self-emotion, comparing them to those of humans. The results show that\u0000incorporating self-emotion helps agents exhibit more human-like dialogue\u0000strategies. In an independent experiment comparing the performance of models\u0000fine-tuned on GPT-4 generated dialogue datasets, we demonstrate that\u0000self-emotion can lead to better overall naturalness and humanness. Finally, in\u0000a virtual simulation environment where agents have discussions on multiple\u0000topics, we show that self-emotion of agents can significantly influence the\u0000decision-making process of the agents, leading to approximately a 50% change in\u0000decisions.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"119 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CommonUppRoad: A Framework of Formal Modelling, Verifying, Learning, and Visualisation of Autonomous Vehicles
Rong Gu, Kaige Tan, Andreas Holck Høeg-Petersen, Lei Feng, Kim Guldstrand Larsen

Combining machine learning and formal methods (FMs) offers a possible solution to the safety issues of autonomous driving (AD) vehicles. However, gaps must be bridged before this combination becomes practically applicable and useful. To assist researchers in both the FM and AD areas, this paper proposes a framework that combines two well-known tools: CommonRoad and UPPAAL. On the one hand, CommonRoad can be enhanced by the rigorous semantics of models in UPPAAL, which enables a systematic and comprehensive understanding of the AD system's behaviour and thus strengthens its safety. On the other hand, controllers synthesised by UPPAAL can be visualised by CommonRoad on real-world road networks, which greatly eases the adoption of formal models by AD vehicle designers. Within this framework, we provide automatic model conversions between CommonRoad and UPPAAL, so users only need to program in Python while the framework handles the formal models, learning, and verification in the backend. We perform experiments to demonstrate the applicability of our framework in various AD scenarios, discuss the advantages of solving motion planning within it, and show its scalability limits and possible solutions.
{"title":"CommonUppRoad: A Framework of Formal Modelling, Verifying, Learning, and Visualisation of Autonomous Vehicles","authors":"Rong Gu, Kaige Tan, Andreas Holck Høeg-Petersen, Lei Feng, Kim Guldstrand Larsen","doi":"arxiv-2408.01093","DOIUrl":"https://doi.org/arxiv-2408.01093","url":null,"abstract":"Combining machine learning and formal methods (FMs) provides a possible\u0000solution to overcome the safety issue of autonomous driving (AD) vehicles.\u0000However, there are gaps to be bridged before this combination becomes\u0000practically applicable and useful. In an attempt to facilitate researchers in\u0000both FMs and AD areas, this paper proposes a framework that combines two\u0000well-known tools, namely CommonRoad and UPPAAL. On the one hand, CommonRoad can\u0000be enhanced by the rigorous semantics of models in UPPAAL, which enables a\u0000systematic and comprehensive understanding of the AD system's behaviour and\u0000thus strengthens the safety of the system. On the other hand, controllers\u0000synthesised by UPPAAL can be visualised by CommonRoad in real-world road\u0000networks, which facilitates AD vehicle designers greatly adopting formal models\u0000in system design. In this framework, we provide automatic model conversions\u0000between CommonRoad and UPPAAL. Therefore, users only need to program in Python\u0000and the framework takes care of the formal models, learning, and verification\u0000in the backend. We perform experiments to demonstrate the applicability of our\u0000framework in various AD scenarios, discuss the advantages of solving motion\u0000planning in our framework, and show the scalability limit and possible\u0000solutions.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Agentic LLM Workflows for Generating Patient-Friendly Medical Reports
Malavikha Sudarshan, Sophie Shih, Estella Yee, Alina Yang, John Zou, Cathy Chen, Quan Zhou, Leon Chen, Chinmay Singhal, George Shih

The application of Large Language Models (LLMs) in healthcare is expanding rapidly, with one potential use case being the translation of formal medical reports into patient-legible equivalents. Currently, LLM outputs often need to be edited and evaluated by a human to ensure both factual accuracy and comprehensibility, and this holds for the above use case. We aim to minimize this step by proposing an agentic workflow built on the Reflexion framework, which uses iterative self-reflection to correct outputs from an LLM. This pipeline was tested against zero-shot prompting on 16 randomized radiology reports. In our multi-agent approach, reports achieved an ICD-10 code verification accuracy of 94.94%, compared to 68.23% for zero-shot prompted reports. Additionally, 81.25% of the final reflected reports required no corrections for accuracy or readability, while only 25% of zero-shot prompted reports met these criteria without needing modifications. These results indicate that our approach offers a feasible method for communicating clinical findings to patients quickly, efficiently, and coherently while retaining medical accuracy. The codebase is available for viewing at http://github.com/malavikhasudarshan/Multi-Agent-Patient-Letter-Generation.
{"title":"Agentic LLM Workflows for Generating Patient-Friendly Medical Reports","authors":"Malavikha Sudarshan, Sophie Shih, Estella Yee, Alina Yang, John Zou, Cathy Chen, Quan Zhou, Leon Chen, Chinmay Singhal, George Shih","doi":"arxiv-2408.01112","DOIUrl":"https://doi.org/arxiv-2408.01112","url":null,"abstract":"The application of Large Language Models (LLMs) in healthcare is expanding\u0000rapidly, with one potential use case being the translation of formal medical\u0000reports into patient-legible equivalents. Currently, LLM outputs often need to\u0000be edited and evaluated by a human to ensure both factual accuracy and\u0000comprehensibility, and this is true for the above use case. We aim to minimize\u0000this step by proposing an agentic workflow with the Reflexion framework, which\u0000uses iterative self-reflection to correct outputs from an LLM. This pipeline\u0000was tested and compared to zero-shot prompting on 16 randomized radiology\u0000reports. In our multi-agent approach, reports had an accuracy rate of 94.94%\u0000when looking at verification of ICD-10 codes, compared to zero-shot prompted\u0000reports, which had an accuracy rate of 68.23%. Additionally, 81.25% of the\u0000final reflected reports required no corrections for accuracy or readability,\u0000while only 25% of zero-shot prompted reports met these criteria without needing\u0000modifications. These results indicate that our approach presents a feasible\u0000method for communicating clinical findings to patients in a quick, efficient\u0000and coherent manner whilst also retaining medical accuracy. The codebase is\u0000available for viewing at\u0000http://github.com/malavikhasudarshan/Multi-Agent-Patient-Letter-Generation.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning in Multi-Objective Public Goods Games with Non-Linear Utilities
Nicole Orzan, Erman Acar, Davide Grossi, Patrick Mannion, Roxana Rădulescu

Addressing the question of how to achieve optimal decision-making under risk and uncertainty is crucial for enhancing the capabilities of artificial agents that collaborate with or support humans. In this work, we address this question in the context of Public Goods Games. We study learning in a novel multi-objective version of the Public Goods Game, where agents have different risk preferences, by means of multi-objective reinforcement learning. We introduce a parametric non-linear utility function to model risk preferences at the level of individual agents, defined over the collective and individual reward components of the game. We then study how such preference modelling interacts with environmental uncertainty to shape incentive alignment in the game. We demonstrate how certain combinations of individual preferences and environmental uncertainties sustain the emergence of cooperative patterns in non-cooperative environments (i.e., where competitive strategies are dominant), while others sustain competitive patterns in cooperative environments (i.e., where cooperative strategies are dominant).
{"title":"Learning in Multi-Objective Public Goods Games with Non-Linear Utilities","authors":"Nicole Orzan, Erman Acar, Davide Grossi, Patrick Mannion, Roxana Rădulescu","doi":"arxiv-2408.00682","DOIUrl":"https://doi.org/arxiv-2408.00682","url":null,"abstract":"Addressing the question of how to achieve optimal decision-making under risk\u0000and uncertainty is crucial for enhancing the capabilities of artificial agents\u0000that collaborate with or support humans. In this work, we address this question\u0000in the context of Public Goods Games. We study learning in a novel\u0000multi-objective version of the Public Goods Game where agents have different\u0000risk preferences, by means of multi-objective reinforcement learning. We\u0000introduce a parametric non-linear utility function to model risk preferences at\u0000the level of individual agents, over the collective and individual reward\u0000components of the game. We study the interplay between such preference\u0000modelling and environmental uncertainty on the incentive alignment level in the\u0000game. We demonstrate how different combinations of individual preferences and\u0000environmental uncertainties sustain the emergence of cooperative patterns in\u0000non-cooperative environments (i.e., where competitive strategies are dominant),\u0000while others sustain competitive patterns in cooperative environments (i.e.,\u0000where cooperative strategies are dominant).","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Condorcet's Jury Theorem with Abstention
Ganesh Ghalme, Reshef Meir

The well-known Condorcet Jury Theorem posits that majority rule selects the best of two available alternatives with probability approaching one as the population size grows to infinity. We study this result under an asymmetric two-candidate setup, where supporters of the two candidates may have different participation costs. When the decision to abstain is fully rational, i.e., when a vote's pivotality is the probability of a tie, the only equilibrium outcome is a trivial one in which all voters except those with zero voting cost abstain. We propose and analyze a more practical, boundedly rational model in which voters overestimate their pivotality, and show that under this model non-trivial equilibria emerge where the winning probability of both candidates is bounded away from one. We show that when the pivotality estimate depends strongly on the margin of victory, victory is not assured to either candidate in any non-trivial equilibrium, regardless of population size and in contrast to Condorcet's assertion; under a weak dependence on the margin, the Jury Theorem is restored.
{"title":"Condorcet's Jury Theorem with Abstention","authors":"Ganesh Ghalme, Reshef Meir","doi":"arxiv-2408.00317","DOIUrl":"https://doi.org/arxiv-2408.00317","url":null,"abstract":"The well-known Condorcet's Jury theorem posits that the majority rule selects\u0000the best alternative among two available options with probability one, as the\u0000population size increases to infinity. We study this result under an asymmetric\u0000two-candidate setup, where supporters of both candidates may have different\u0000participation costs. When the decision to abstain is fully rational i.e., when the vote pivotality\u0000is the probability of a tie, the only equilibrium outcome is a trivial\u0000equilibrium where all voters except those with zero voting cost, abstain. We\u0000propose and analyze a more practical, boundedly rational model where voters\u0000overestimate their pivotality, and show that under this model, non-trivial\u0000equilibria emerge where the winning probability of both candidates is bounded\u0000away from one. We show that when the pivotality estimate strongly depends on the margin of\u0000victory, victory is not assured to any candidate in any non-trivial\u0000equilibrium, regardless of population size and in contrast to Condorcet's\u0000assertion. Whereas, under a weak dependence on margin, Condorcet's Jury theorem\u0000is restored.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"118 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}