
Latest publications in arXiv - CS - Multiagent Systems

Asynchronous Credit Assignment Framework for Multi-Agent Reinforcement Learning
Pub Date : 2024-08-07 DOI: arxiv-2408.03692
Yongheng Liang, Hejun Wu, Haitao Wang, Hao Cai
Credit assignment is a core problem that distinguishes agents' marginal contributions for optimizing cooperative strategies in multi-agent reinforcement learning (MARL). Current credit assignment methods usually assume synchronous decision-making among agents. However, a prerequisite for many realistic cooperative tasks is asynchronous decision-making by agents, without waiting for others to avoid disastrous consequences. To address this issue, we propose an asynchronous credit assignment framework with a problem model called ADEX-POMDP and a multiplicative value decomposition (MVD) algorithm. ADEX-POMDP is an asynchronous problem model with extra virtual agents for a decentralized partially observable Markov decision process. We prove that ADEX-POMDP preserves both the task equilibrium and the algorithm convergence. MVD utilizes multiplicative interaction to efficiently capture the interactions of asynchronous decisions, and we theoretically demonstrate its advantages in handling asynchronous tasks. Experimental results show that on two asynchronous decision-making benchmarks, Overcooked and POAC, MVD not only consistently outperforms state-of-the-art MARL methods but also provides interpretability for asynchronous cooperation.
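The abstract does not spell out how MVD combines per-agent utilities; as a rough sketch of the general idea of multiplicative (rather than additive) value decomposition, the Python snippet below contrasts a VDN-style sum with a product-style mixer in which an idle agent (utility near 1) leaves the joint value unchanged, one plausible way to accommodate asynchronous decisions. The function names and the exact mixing rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def additive_mixing(agent_utilities):
    """VDN-style baseline: the joint value is the sum of per-agent utilities."""
    return float(np.sum(agent_utilities))

def multiplicative_mixing(agent_utilities):
    """Illustrative multiplicative mixing (not the paper's exact MVD):
    each agent's utility scales the joint value, so an agent that has not
    acted at this step can contribute a neutral factor of 1.0."""
    joint = 1.0
    for u in agent_utilities:
        joint *= u
    return joint

# Toy example: agent 0 acts now; agents 1 and 2 are mid-way through earlier
# actions and are assigned a neutral utility of 1.0 at this decision point.
utilities = [0.8, 1.0, 1.0]
print(additive_mixing(utilities))        # 2.8 -- idle agents still shift the sum
print(multiplicative_mixing(utilities))  # 0.8 -- idle agents leave the product unchanged
```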
{"title":"Asynchronous Credit Assignment Framework for Multi-Agent Reinforcement Learning","authors":"Yongheng Liang, Hejun Wu, Haitao Wang, Hao Cai","doi":"arxiv-2408.03692","DOIUrl":"https://doi.org/arxiv-2408.03692","url":null,"abstract":"Credit assignment is a core problem that distinguishes agents' marginal\u0000contributions for optimizing cooperative strategies in multi-agent\u0000reinforcement learning (MARL). Current credit assignment methods usually assume\u0000synchronous decision-making among agents. However, a prerequisite for many\u0000realistic cooperative tasks is asynchronous decision-making by agents, without\u0000waiting for others to avoid disastrous consequences. To address this issue, we\u0000propose an asynchronous credit assignment framework with a problem model called\u0000ADEX-POMDP and a multiplicative value decomposition (MVD) algorithm. ADEX-POMDP\u0000is an asynchronous problem model with extra virtual agents for a decentralized\u0000partially observable markov decision process. We prove that ADEX-POMDP\u0000preserves both the task equilibrium and the algorithm convergence. MVD utilizes\u0000multiplicative interaction to efficiently capture the interactions of\u0000asynchronous decisions, and we theoretically demonstrate its advantages in\u0000handling asynchronous tasks. Experimental results show that on two asynchronous\u0000decision-making benchmarks, Overcooked and POAC, MVD not only consistently\u0000outperforms state-of-the-art MARL methods but also provides the\u0000interpretability for asynchronous cooperation.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents
Pub Date : 2024-08-06 DOI: arxiv-2408.03405
Lucia Gordon, Esther Rolf, Milind Tambe
Stochastic multi-agent multi-armed bandits typically assume that the rewards from each arm follow a fixed distribution, regardless of which agent pulls the arm. However, in many real-world settings, rewards can depend on the sensitivity of each agent to their environment. In medical screening, disease detection rates can vary by test type; in preference matching, rewards can depend on user preferences; and in environmental sensing, observation quality can vary across sensors. Since past work does not specify how to allocate agents of heterogeneous but known sensitivity of these types in a stochastic bandit setting, we introduce a UCB-style algorithm, Min-Width, which aggregates information from diverse agents. In doing so, we address the joint challenges of (i) aggregating the rewards, which follow different distributions for each agent-arm pair, and (ii) coordinating the assignments of agents to arms. Min-Width facilitates efficient collaboration among heterogeneous agents, exploiting the known structure in the agents' reward functions to weight their rewards accordingly. We analyze the regret of Min-Width and conduct pseudo-synthetic and fully synthetic experiments to study the performance of different levels of information sharing. Our results confirm that the gains to modeling agent heterogeneity tend to be greater when the sensitivities are more varied across agents, while combining more information does not always improve performance.
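The abstract describes Min-Width only at a high level; the sketch below is a rough, assumption-laden illustration of a UCB-style rule in which each agent's observations are weighted by its known sensitivity (a precision-weighted running average) before computing a confidence bonus. The reward model, the rescaling step, and the coordination rule are all guesses for illustration; the actual Min-Width algorithm may differ on each point.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, horizon = 3, 2000
sensitivities = np.array([0.5, 1.0, 2.0])   # known per-agent sensitivities (assumed model)
true_means = np.array([0.3, 0.5, 0.7])       # latent arm qualities

means = np.zeros(n_arms)      # sensitivity-weighted estimates of arm quality
mass = np.zeros(n_arms)       # accumulated sensitivity "mass" per arm

for t in range(1, horizon + 1):
    width = np.sqrt(2.0 * np.log(t) / np.maximum(mass, 1e-9))
    arm = int(np.argmax(means + width))              # UCB-style choice
    for s in sensitivities:                          # every agent pulls this step
        reward = rng.normal(s * true_means[arm], 0.1)
        obs = reward / s                             # undo the known sensitivity scaling
        mass[arm] += s
        means[arm] += s * (obs - means[arm]) / mass[arm]   # precision-weighted update

print(means.round(2))   # approaches true_means; the confidence width shrinks fastest
                        # for arms observed by high-sensitivity agents
```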
{"title":"Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents","authors":"Lucia Gordon, Esther Rolf, Milind Tambe","doi":"arxiv-2408.03405","DOIUrl":"https://doi.org/arxiv-2408.03405","url":null,"abstract":"Stochastic multi-agent multi-armed bandits typically assume that the rewards\u0000from each arm follow a fixed distribution, regardless of which agent pulls the\u0000arm. However, in many real-world settings, rewards can depend on the\u0000sensitivity of each agent to their environment. In medical screening, disease\u0000detection rates can vary by test type; in preference matching, rewards can\u0000depend on user preferences; and in environmental sensing, observation quality\u0000can vary across sensors. Since past work does not specify how to allocate\u0000agents of heterogeneous but known sensitivity of these types in a stochastic\u0000bandit setting, we introduce a UCB-style algorithm, Min-Width, which aggregates\u0000information from diverse agents. In doing so, we address the joint challenges\u0000of (i) aggregating the rewards, which follow different distributions for each\u0000agent-arm pair, and (ii) coordinating the assignments of agents to arms.\u0000Min-Width facilitates efficient collaboration among heterogeneous agents,\u0000exploiting the known structure in the agents' reward functions to weight their\u0000rewards accordingly. We analyze the regret of Min-Width and conduct\u0000pseudo-synthetic and fully synthetic experiments to study the performance of\u0000different levels of information sharing. Our results confirm that the gains to\u0000modeling agent heterogeneity tend to be greater when the sensitivities are more\u0000varied across agents, while combining more information does not always improve\u0000performance.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Assessing the Effects of Container Handling Strategies on Enhancing Freight Throughput
Pub Date : 2024-08-05 DOI: arxiv-2408.02768
Sarita Rattanakunuprakarn, Mingzhou Jin, Mustafa Can Camur, Xueping Li
As global supply chains and freight volumes grow, the U.S. faces escalating transportation demands. The heavy reliance on road transport, coupled with the underutilization of the railway system, results in congested highways, prolonged transportation times, higher costs, and increased carbon emissions. California's San Pedro Port Complex (SPPC), the nation's busiest, incurs a significant share of these challenges. We utilize an agent-based simulation to replicate real-world scenarios, focusing on the intricacies of interactions in a modified intermodal inbound freight system for the SPPC. This involves relocating container classification to potential warehouses in California, Utah, Arizona, and Nevada, rather than exclusively at port areas. Our primary aim is to evaluate the proposed system's efficiency, considering cost and freight throughput, while also examining the effects of workforce shortages. Computational analysis suggests that strategically installing intermodal capabilities in select warehouses can reduce transportation costs, boost throughput, and foster resource allocation.
{"title":"Assessing the Effects of Container Handling Strategies on Enhancing Freight Throughput","authors":"Sarita Rattanakunuprakarn, Mingzhou Jin, Mustafa Can Camur, Xueping Li","doi":"arxiv-2408.02768","DOIUrl":"https://doi.org/arxiv-2408.02768","url":null,"abstract":"As global supply chains and freight volumes grow, the U.S. faces escalating\u0000transportation demands. The heavy reliance on road transport, coupled with the\u0000underutilization of the railway system, results in congested highways,\u0000prolonged transportation times, higher costs, and increased carbon emissions.\u0000California's San Pedro Port Complex (SPPC), the nation's busiest, incurs a\u0000significant share of these challenges. We utilize an agent-based simulation to\u0000replicate real-world scenarios, focusing on the intricacies of interactions in\u0000a modified intermodal inbound freight system for the SPPC. This involves\u0000relocating container classification to potential warehouses in California,\u0000Utah, Arizona, and Nevada, rather than exclusively at port areas. Our primary\u0000aim is to evaluate the proposed system's efficiency, considering cost and\u0000freight throughput, while also examining the effects of workforce shortages.\u0000Computational analysis suggests that strategically installing intermodal\u0000capabilities in select warehouses can reduce transportation costs, boost\u0000throughput, and foster resour","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Value-Based Rationales Improve Social Experience: A Multiagent Simulation Study
Pub Date : 2024-08-04 DOI: arxiv-2408.02117
Sz-Ting Tzeng, Nirav Ajmeri, Munindar P. Singh
We propose Exanna, a framework to realize agents that incorporate values in decision making. An Exanna agent considers the values of itself and others when providing rationales for its actions and evaluating the rationales provided by others. Via multiagent simulation, we demonstrate that considering values in decision making and producing rationales, especially for norm-deviating actions, leads to (1) higher conflict resolution, (2) better social experience, (3) higher privacy, and (4) higher flexibility.
{"title":"Value-Based Rationales Improve Social Experience: A Multiagent Simulation Study","authors":"Sz-Ting Tzeng, Nirav Ajmeri, Munindar P. Singh","doi":"arxiv-2408.02117","DOIUrl":"https://doi.org/arxiv-2408.02117","url":null,"abstract":"We propose Exanna, a framework to realize agents that incorporate values in\u0000decision making. An Exannaagent considers the values of itself and others when\u0000providing rationales for its actions and evaluating the rationales provided by\u0000others. Via multiagent simulation, we demonstrate that considering values in\u0000decision making and producing rationales, especially for norm-deviating\u0000actions, leads to (1) higher conflict resolution, (2) better social experience,\u0000(3) higher privacy, and (4) higher flexibility.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Environment Complexity and Nash Equilibria in a Sequential Social Dilemma
Pub Date : 2024-08-04 DOI: arxiv-2408.02148
Mustafa Yasir, Andrew Howes, Vasilios Mavroudis, Chris Hicks
Multi-agent reinforcement learning (MARL) methods, while effective in zero-sum or positive-sum games, often yield suboptimal outcomes in general-sum games where cooperation is essential for achieving globally optimal outcomes. Matrix game social dilemmas, which abstract key aspects of general-sum interactions, such as cooperation, risk, and trust, fail to model the temporal and spatial dynamics characteristic of real-world scenarios. In response, our study extends matrix game social dilemmas into more complex, higher-dimensional MARL environments. We adapt a gridworld implementation of the Stag Hunt dilemma to more closely match the decision-space of a one-shot matrix game while also introducing variable environment complexity. Our findings indicate that as complexity increases, MARL agents trained in these environments converge to suboptimal strategies, consistent with the risk-dominant Nash equilibrium strategies found in matrix games. Our work highlights the impact of environment complexity on achieving optimal outcomes in higher-dimensional game-theoretic MARL environments.
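For readers unfamiliar with the risk-dominance notion invoked here, the snippet below works through a standard one-shot Stag Hunt payoff matrix (the payoff values are a textbook choice for illustration, not taken from the paper): (Stag, Stag) is the payoff-dominant equilibrium, but (Hare, Hare) is risk-dominant because the deviation losses it punishes are larger, which is the kind of equilibrium the trained MARL agents are reported to converge to.

```python
# Row player's payoffs; the game is symmetric, so the column player's are transposed.
#            opponent: Stag         Hare
payoff = {("S", "S"): 4, ("S", "H"): 0,
          ("H", "S"): 3, ("H", "H"): 3}

def best_response(opponent_action):
    return max("SH", key=lambda a: payoff[(a, opponent_action)])

# Pure-strategy Nash equilibria: both players best-respond to each other.
equilibria = [(a, b) for a in "SH" for b in "SH"
              if best_response(b) == a and best_response(a) == b]
print(equilibria)   # [('S', 'S'), ('H', 'H')]

# Risk dominance (Harsanyi & Selten): compare products of deviation losses.
loss_stag = (payoff[("S", "S")] - payoff[("H", "S")]) ** 2   # (4 - 3)^2 = 1
loss_hare = (payoff[("H", "H")] - payoff[("S", "H")]) ** 2   # (3 - 0)^2 = 9
print("risk-dominant:", "Stag" if loss_stag > loss_hare else "Hare")   # Hare
```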
{"title":"Environment Complexity and Nash Equilibria in a Sequential Social Dilemma","authors":"Mustafa Yasir, Andrew Howes, Vasilios Mavroudis, Chris Hicks","doi":"arxiv-2408.02148","DOIUrl":"https://doi.org/arxiv-2408.02148","url":null,"abstract":"Multi-agent reinforcement learning (MARL) methods, while effective in\u0000zero-sum or positive-sum games, often yield suboptimal outcomes in general-sum\u0000games where cooperation is essential for achieving globally optimal outcomes.\u0000Matrix game social dilemmas, which abstract key aspects of general-sum\u0000interactions, such as cooperation, risk, and trust, fail to model the temporal\u0000and spatial dynamics characteristic of real-world scenarios. In response, our\u0000study extends matrix game social dilemmas into more complex, higher-dimensional\u0000MARL environments. We adapt a gridworld implementation of the Stag Hunt dilemma\u0000to more closely match the decision-space of a one-shot matrix game while also\u0000introducing variable environment complexity. Our findings indicate that as\u0000complexity increases, MARL agents trained in these environments converge to\u0000suboptimal strategies, consistent with the risk-dominant Nash equilibria\u0000strategies found in matrix games. Our work highlights the impact of environment\u0000complexity on achieving optimal outcomes in higher-dimensional game-theoretic\u0000MARL environments.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Self-Emotion Blended Dialogue Generation in Social Simulation Agents
Pub Date : 2024-08-03 DOI: arxiv-2408.01633
Qiang Zhang, Jason Naradowsky, Yusuke Miyao
When engaging in conversations, dialogue agents in a virtual simulation environment may exhibit their own emotional states that are unrelated to the immediate conversational context, a phenomenon known as self-emotion. This study explores how such self-emotion affects the agents' behaviors in dialogue strategies and decision-making within a large language model (LLM)-driven simulation framework. In a dialogue strategy prediction experiment, we analyze the dialogue strategy choices employed by agents both with and without self-emotion, comparing them to those of humans. The results show that incorporating self-emotion helps agents exhibit more human-like dialogue strategies. In an independent experiment comparing the performance of models fine-tuned on GPT-4 generated dialogue datasets, we demonstrate that self-emotion can lead to better overall naturalness and humanness. Finally, in a virtual simulation environment where agents have discussions on multiple topics, we show that self-emotion of agents can significantly influence the decision-making process of the agents, leading to approximately a 50% change in decisions.
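As a loose illustration (not the paper's framework) of what blending a self-emotion into an LLM-driven agent could look like, the sketch below samples an emotion state that is independent of the conversation and folds it into the agent's prompt; the `chat` argument is a stand-in for whatever LLM call the simulation uses, and every name here is an assumption for illustration.

```python
import random

SELF_EMOTIONS = ["content", "anxious", "irritated", "excited", "tired"]

def build_agent_prompt(persona, self_emotion, dialogue_history):
    # The self-emotion is sampled from the agent's own "life", so it is
    # deliberately unrelated to the immediate conversational context.
    return (
        f"You are {persona}. Independently of this conversation, you are "
        f"currently feeling {self_emotion}. Let this mood subtly colour your "
        f"dialogue strategy (how patient, direct, or agreeable you are).\n\n"
        f"Conversation so far:\n{dialogue_history}\nYour reply:"
    )

def agent_turn(persona, dialogue_history, chat):
    """`chat(prompt) -> str` is a placeholder for an LLM completion call."""
    self_emotion = random.choice(SELF_EMOTIONS)
    prompt = build_agent_prompt(persona, self_emotion, dialogue_history)
    return self_emotion, chat(prompt)
```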
{"title":"Self-Emotion Blended Dialogue Generation in Social Simulation Agents","authors":"Qiang Zhang, Jason Naradowsky, Yusuke Miyao","doi":"arxiv-2408.01633","DOIUrl":"https://doi.org/arxiv-2408.01633","url":null,"abstract":"When engaging in conversations, dialogue agents in a virtual simulation\u0000environment may exhibit their own emotional states that are unrelated to the\u0000immediate conversational context, a phenomenon known as self-emotion. This\u0000study explores how such self-emotion affects the agents' behaviors in dialogue\u0000strategies and decision-making within a large language model (LLM)-driven\u0000simulation framework. In a dialogue strategy prediction experiment, we analyze\u0000the dialogue strategy choices employed by agents both with and without\u0000self-emotion, comparing them to those of humans. The results show that\u0000incorporating self-emotion helps agents exhibit more human-like dialogue\u0000strategies. In an independent experiment comparing the performance of models\u0000fine-tuned on GPT-4 generated dialogue datasets, we demonstrate that\u0000self-emotion can lead to better overall naturalness and humanness. Finally, in\u0000a virtual simulation environment where agents have discussions on multiple\u0000topics, we show that self-emotion of agents can significantly influence the\u0000decision-making process of the agents, leading to approximately a 50% change in\u0000decisions.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"119 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CommonUppRoad: A Framework of Formal Modelling, Verifying, Learning, and Visualisation of Autonomous Vehicles
Pub Date : 2024-08-02 DOI: arxiv-2408.01093
Rong Gu, Kaige Tan, Andreas Holck Høeg-Petersen, Lei Feng, Kim Guldstrand Larsen
Combining machine learning and formal methods (FMs) provides a possible solution to overcome the safety issue of autonomous driving (AD) vehicles. However, there are gaps to be bridged before this combination becomes practically applicable and useful. In an attempt to facilitate researchers in both FMs and AD areas, this paper proposes a framework that combines two well-known tools, namely CommonRoad and UPPAAL. On the one hand, CommonRoad can be enhanced by the rigorous semantics of models in UPPAAL, which enables a systematic and comprehensive understanding of the AD system's behaviour and thus strengthens the safety of the system. On the other hand, controllers synthesised by UPPAAL can be visualised by CommonRoad in real-world road networks, which greatly facilitates AD vehicle designers in adopting formal models in system design. In this framework, we provide automatic model conversions between CommonRoad and UPPAAL. Therefore, users only need to program in Python and the framework takes care of the formal models, learning, and verification in the backend. We perform experiments to demonstrate the applicability of our framework in various AD scenarios, discuss the advantages of solving motion planning in our framework, and show the scalability limit and possible solutions.
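The abstract does not expose the conversion API, so the following is only a hypothetical sketch of the Python-level workflow it describes (CommonRoad scenario in, UPPAAL model out, synthesised strategy checked and visualised back in CommonRoad). The `commonupproad` module and every function in it are invented for illustration; only the CommonRoad file-reader import reflects the real CommonRoad package, and the scenario path and query string are placeholders.

```python
# Hypothetical glue code; `commonupproad` and its functions are illustrative
# stand-ins for the framework's conversion layer, not a published API.
from commonroad.common.file_reader import CommonRoadFileReader  # real CommonRoad I/O
import commonupproad as cur                                     # assumed module name

# 1. Load a CommonRoad scenario (road network, obstacles, planning problem).
scenario, planning_problem_set = CommonRoadFileReader("path/to/scenario.xml").open()

# 2. Convert it into a network of UPPAAL timed automata (assumed function).
uppaal_model = cur.to_uppaal(scenario, planning_problem_set)

# 3. Synthesise a controller and verify a safety property over it
#    (assumed wrappers around the UPPAAL back end).
strategy = cur.synthesise(uppaal_model, query="control: A[] not collision")
assert cur.verify(uppaal_model, strategy, query="A[] not collision")

# 4. Map the strategy back onto the CommonRoad scenario for visualisation.
cur.visualise(scenario, strategy)
```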
{"title":"CommonUppRoad: A Framework of Formal Modelling, Verifying, Learning, and Visualisation of Autonomous Vehicles","authors":"Rong Gu, Kaige Tan, Andreas Holck Høeg-Petersen, Lei Feng, Kim Guldstrand Larsen","doi":"arxiv-2408.01093","DOIUrl":"https://doi.org/arxiv-2408.01093","url":null,"abstract":"Combining machine learning and formal methods (FMs) provides a possible\u0000solution to overcome the safety issue of autonomous driving (AD) vehicles.\u0000However, there are gaps to be bridged before this combination becomes\u0000practically applicable and useful. In an attempt to facilitate researchers in\u0000both FMs and AD areas, this paper proposes a framework that combines two\u0000well-known tools, namely CommonRoad and UPPAAL. On the one hand, CommonRoad can\u0000be enhanced by the rigorous semantics of models in UPPAAL, which enables a\u0000systematic and comprehensive understanding of the AD system's behaviour and\u0000thus strengthens the safety of the system. On the other hand, controllers\u0000synthesised by UPPAAL can be visualised by CommonRoad in real-world road\u0000networks, which facilitates AD vehicle designers greatly adopting formal models\u0000in system design. In this framework, we provide automatic model conversions\u0000between CommonRoad and UPPAAL. Therefore, users only need to program in Python\u0000and the framework takes care of the formal models, learning, and verification\u0000in the backend. We perform experiments to demonstrate the applicability of our\u0000framework in various AD scenarios, discuss the advantages of solving motion\u0000planning in our framework, and show the scalability limit and possible\u0000solutions.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Agentic LLM Workflows for Generating Patient-Friendly Medical Reports
Pub Date : 2024-08-02 DOI: arxiv-2408.01112
Malavikha Sudarshan, Sophie Shih, Estella Yee, Alina Yang, John Zou, Cathy Chen, Quan Zhou, Leon Chen, Chinmay Singhal, George Shih
The application of Large Language Models (LLMs) in healthcare is expanding rapidly, with one potential use case being the translation of formal medical reports into patient-legible equivalents. Currently, LLM outputs often need to be edited and evaluated by a human to ensure both factual accuracy and comprehensibility, and this is true for the above use case. We aim to minimize this step by proposing an agentic workflow with the Reflexion framework, which uses iterative self-reflection to correct outputs from an LLM. This pipeline was tested and compared to zero-shot prompting on 16 randomized radiology reports. In our multi-agent approach, reports had an accuracy rate of 94.94% when looking at verification of ICD-10 codes, compared to zero-shot prompted reports, which had an accuracy rate of 68.23%. Additionally, 81.25% of the final reflected reports required no corrections for accuracy or readability, while only 25% of zero-shot prompted reports met these criteria without needing modifications. These results indicate that our approach presents a feasible method for communicating clinical findings to patients in a quick, efficient and coherent manner whilst also retaining medical accuracy. The codebase is available for viewing at http://github.com/malavikhasudarshan/Multi-Agent-Patient-Letter-Generation.
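The abstract outlines the workflow rather than the code; below is a minimal, hedged sketch of a Reflexion-style generate / critique / revise loop for turning a radiology report into a patient-friendly letter. The `llm` callable and the prompts are invented for illustration; the authors' actual implementation is in the linked repository.

```python
def reflexion_patient_letter(report_text, llm, max_iters=3):
    """Generate a patient-friendly letter, then iteratively self-critique it.

    `llm(prompt) -> str` is a placeholder for the underlying model call.
    """
    letter = llm("Rewrite this radiology report for the patient, in plain "
                 "language, keeping all findings and ICD-10 codes:\n" + report_text)
    for _ in range(max_iters):
        critique = llm(
            "Act as a reviewer. List factual errors, missing or incorrect "
            "ICD-10 codes, and readability problems in this letter compared "
            "with the source report. Reply NONE if there are none.\n\n"
            f"Report:\n{report_text}\n\nLetter:\n{letter}")
        if critique.strip().upper().startswith("NONE"):
            break  # the reflection step found nothing left to fix
        letter = llm(
            "Revise the letter to address this critique, without adding "
            "information that is not in the report.\n\n"
            f"Critique:\n{critique}\n\nLetter:\n{letter}")
    return letter
```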
{"title":"Agentic LLM Workflows for Generating Patient-Friendly Medical Reports","authors":"Malavikha Sudarshan, Sophie Shih, Estella Yee, Alina Yang, John Zou, Cathy Chen, Quan Zhou, Leon Chen, Chinmay Singhal, George Shih","doi":"arxiv-2408.01112","DOIUrl":"https://doi.org/arxiv-2408.01112","url":null,"abstract":"The application of Large Language Models (LLMs) in healthcare is expanding\u0000rapidly, with one potential use case being the translation of formal medical\u0000reports into patient-legible equivalents. Currently, LLM outputs often need to\u0000be edited and evaluated by a human to ensure both factual accuracy and\u0000comprehensibility, and this is true for the above use case. We aim to minimize\u0000this step by proposing an agentic workflow with the Reflexion framework, which\u0000uses iterative self-reflection to correct outputs from an LLM. This pipeline\u0000was tested and compared to zero-shot prompting on 16 randomized radiology\u0000reports. In our multi-agent approach, reports had an accuracy rate of 94.94%\u0000when looking at verification of ICD-10 codes, compared to zero-shot prompted\u0000reports, which had an accuracy rate of 68.23%. Additionally, 81.25% of the\u0000final reflected reports required no corrections for accuracy or readability,\u0000while only 25% of zero-shot prompted reports met these criteria without needing\u0000modifications. These results indicate that our approach presents a feasible\u0000method for communicating clinical findings to patients in a quick, efficient\u0000and coherent manner whilst also retaining medical accuracy. The codebase is\u0000available for viewing at\u0000http://github.com/malavikhasudarshan/Multi-Agent-Patient-Letter-Generation.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning in Multi-Objective Public Goods Games with Non-Linear Utilities
Pub Date : 2024-08-01 DOI: arxiv-2408.00682
Nicole Orzan, Erman Acar, Davide Grossi, Patrick Mannion, Roxana Rădulescu
Addressing the question of how to achieve optimal decision-making under riskand uncertainty is crucial for enhancing the capabilities of artificial agentsthat collaborate with or support humans. In this work, we address this questionin the context of Public Goods Games. We study learning in a novelmulti-objective version of the Public Goods Game where agents have differentrisk preferences, by means of multi-objective reinforcement learning. Weintroduce a parametric non-linear utility function to model risk preferences atthe level of individual agents, over the collective and individual rewardcomponents of the game. We study the interplay between such preferencemodelling and environmental uncertainty on the incentive alignment level in thegame. We demonstrate how different combinations of individual preferences andenvironmental uncertainties sustain the emergence of cooperative patterns innon-cooperative environments (i.e., where competitive strategies are dominant),while others sustain competitive patterns in cooperative environments (i.e.,where cooperative strategies are dominant).
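As a concrete (and assumed) instance of the kind of payoff structure described, the snippet below computes the individual and collective reward components of a standard Public Goods Game and passes them through a simple parametric non-linear (exponential, CARA-style) utility, so a risk parameter controls how an agent trades the two components off. The utility family and parameter names are illustrative; the paper's actual formulation may differ.

```python
import numpy as np

def pgg_rewards(contributions, endowment=1.0, multiplier=1.8):
    """Standard PGG: the common pot is multiplied and shared equally."""
    contributions = np.asarray(contributions, dtype=float)
    n = len(contributions)
    share = multiplier * contributions.sum() / n
    individual = endowment - contributions + share   # each agent's own payoff
    collective = multiplier * contributions.sum()    # group-level component
    return individual, collective

def nonlinear_utility(individual, collective, risk=0.5, weight=0.5):
    """Assumed parametric form: exponential (CARA-style) in each objective,
    blended by `weight`; risk > 0 is risk-averse, risk < 0 risk-seeking."""
    def cara(x):
        return (1 - np.exp(-risk * x)) / risk if risk != 0 else x
    return weight * cara(individual) + (1 - weight) * cara(collective)

ind, col = pgg_rewards([1.0, 1.0, 0.0, 0.0])   # two cooperators, two defectors
print(ind.round(2))                             # defectors free-ride on the shared pot
print(nonlinear_utility(ind, col, risk=1.0).round(2))
```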
{"title":"Learning in Multi-Objective Public Goods Games with Non-Linear Utilities","authors":"Nicole Orzan, Erman Acar, Davide Grossi, Patrick Mannion, Roxana Rădulescu","doi":"arxiv-2408.00682","DOIUrl":"https://doi.org/arxiv-2408.00682","url":null,"abstract":"Addressing the question of how to achieve optimal decision-making under risk\u0000and uncertainty is crucial for enhancing the capabilities of artificial agents\u0000that collaborate with or support humans. In this work, we address this question\u0000in the context of Public Goods Games. We study learning in a novel\u0000multi-objective version of the Public Goods Game where agents have different\u0000risk preferences, by means of multi-objective reinforcement learning. We\u0000introduce a parametric non-linear utility function to model risk preferences at\u0000the level of individual agents, over the collective and individual reward\u0000components of the game. We study the interplay between such preference\u0000modelling and environmental uncertainty on the incentive alignment level in the\u0000game. We demonstrate how different combinations of individual preferences and\u0000environmental uncertainties sustain the emergence of cooperative patterns in\u0000non-cooperative environments (i.e., where competitive strategies are dominant),\u0000while others sustain competitive patterns in cooperative environments (i.e.,\u0000where cooperative strategies are dominant).","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Condorcet's Jury Theorem with Abstention
Pub Date : 2024-08-01 DOI: arxiv-2408.00317
Ganesh Ghalme, Reshef Meir
The well-known Condorcet's Jury theorem posits that the majority rule selects the best alternative among two available options with probability one, as the population size increases to infinity. We study this result under an asymmetric two-candidate setup, where supporters of both candidates may have different participation costs. When the decision to abstain is fully rational, i.e., when the vote pivotality is the probability of a tie, the only equilibrium outcome is a trivial equilibrium where all voters except those with zero voting cost abstain. We propose and analyze a more practical, boundedly rational model where voters overestimate their pivotality, and show that under this model, non-trivial equilibria emerge where the winning probability of both candidates is bounded away from one. We show that when the pivotality estimate strongly depends on the margin of victory, victory is not assured to any candidate in any non-trivial equilibrium, regardless of population size and in contrast to Condorcet's assertion. Whereas, under a weak dependence on the margin, Condorcet's Jury theorem is restored.
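To make the classical statement concrete, the short Monte Carlo sketch below (with made-up parameters) checks that majority rule picks the better of two options with probability approaching one as the electorate grows, and then lets voters abstain whenever a private participation cost exceeds a crude, overestimated pivotality proxy, loosely mimicking the boundedly rational behaviour the paper analyses. It illustrates the setup only, not the paper's equilibrium analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

def majority_correct(n, competence=0.6, trials=4000, overestimate=None):
    """Fraction of trials in which the majority picks the better option.

    If `overestimate` is given, each voter abstains when a private cost drawn
    from U[0, 1] exceeds overestimate / sqrt(n), a rough pivotality proxy
    (an assumption for illustration, not the paper's model).
    """
    wins = 0
    for _ in range(trials):
        votes = rng.random(n) < competence            # True = vote for the better option
        if overestimate is not None:
            pivotality = overestimate / np.sqrt(n)
            participates = rng.random(n) < pivotality
            votes = votes[participates]
        if len(votes) and votes.mean() > 0.5:
            wins += 1
    return wins / trials

for n in (11, 101, 1001):
    print(n, majority_correct(n), majority_correct(n, overestimate=5.0))
# The first column climbs towards 1 (Condorcet's assertion); the second grows
# more slowly because abstention shrinks the effective electorate.
```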
{"title":"Condorcet's Jury Theorem with Abstention","authors":"Ganesh Ghalme, Reshef Meir","doi":"arxiv-2408.00317","DOIUrl":"https://doi.org/arxiv-2408.00317","url":null,"abstract":"The well-known Condorcet's Jury theorem posits that the majority rule selects\u0000the best alternative among two available options with probability one, as the\u0000population size increases to infinity. We study this result under an asymmetric\u0000two-candidate setup, where supporters of both candidates may have different\u0000participation costs. When the decision to abstain is fully rational i.e., when the vote pivotality\u0000is the probability of a tie, the only equilibrium outcome is a trivial\u0000equilibrium where all voters except those with zero voting cost, abstain. We\u0000propose and analyze a more practical, boundedly rational model where voters\u0000overestimate their pivotality, and show that under this model, non-trivial\u0000equilibria emerge where the winning probability of both candidates is bounded\u0000away from one. We show that when the pivotality estimate strongly depends on the margin of\u0000victory, victory is not assured to any candidate in any non-trivial\u0000equilibrium, regardless of population size and in contrast to Condorcet's\u0000assertion. Whereas, under a weak dependence on margin, Condorcet's Jury theorem\u0000is restored.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"118 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0