For public health programs with limited resources, the ability to predict how behaviors change over time and in response to interventions is crucial for deciding when and to whom interventions should be allocated. Using data from a real-world maternal health program, we demonstrate how a cognitive model based on Instance-Based Learning (IBL) Theory can augment existing purely computational approaches. Our findings show that, compared to general time-series forecasters (e.g., LSTMs), IBL models, which reflect human decision-making processes, better predict the dynamics of individuals' states. Additionally, IBL provides estimates of the volatility in individuals' states and of their sensitivity to interventions, which can improve the training efficiency of other time-series models.
{"title":"Improving the Prediction of Individual Engagement in Recommendations Using Cognitive Models","authors":"Roderick Seow, Yunfan Zhao, Duncan Wood, Milind Tambe, Cleotilde Gonzalez","doi":"arxiv-2408.16147","DOIUrl":"https://doi.org/arxiv-2408.16147","url":null,"abstract":"For public health programs with limited resources, the ability to predict how\u0000behaviors change over time and in response to interventions is crucial for\u0000deciding when and to whom interventions should be allocated. Using data from a\u0000real-world maternal health program, we demonstrate how a cognitive model based\u0000on Instance-Based Learning (IBL) Theory can augment existing purely\u0000computational approaches. Our findings show that, compared to general\u0000time-series forecasters (e.g., LSTMs), IBL models, which reflect human\u0000decision-making processes, better predict the dynamics of individuals' states.\u0000Additionally, IBL provides estimates of the volatility in individuals' states\u0000and their sensitivity to interventions, which can improve the efficiency of\u0000training of other time series models.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
While modern Autonomous Vehicle (AV) systems can develop reliable driving policies under regular traffic conditions, they frequently struggle with safety-critical traffic scenarios. This difficulty arises primarily from the rarity of such scenarios in driving datasets and the complexity of predictive modeling among multiple vehicles. Simulating safety-critical traffic events is therefore an essential challenge for testing and refining AV policies. In this work, we introduce TrafficGamer, which facilitates game-theoretic traffic simulation by viewing common road driving as a multi-agent game. In empirical evaluations across various real-world datasets, TrafficGamer ensures both the fidelity and the exploitability of the simulated scenarios: they not only statistically align with real-world traffic distributions but also efficiently capture equilibria that represent safety-critical scenarios involving multiple agents. The results further demonstrate that TrafficGamer supports highly flexible simulation across various contexts; in particular, the generated scenarios can adapt to equilibria of varying tightness by configuring risk-sensitive constraints during optimization. To the best of our knowledge, TrafficGamer is the first simulator capable of generating diverse traffic scenarios involving multiple agents. A demo webpage for the project is available at https://qiaoguanren.github.io/trafficgamer-demo/.
{"title":"TrafficGamer: Reliable and Flexible Traffic Simulation for Safety-Critical Scenarios with Game-Theoretic Oracles","authors":"Guanren Qiao, Guorui Quan, Jiawei Yu, Shujun Jia, Guiliang Liu","doi":"arxiv-2408.15538","DOIUrl":"https://doi.org/arxiv-2408.15538","url":null,"abstract":"While modern Autonomous Vehicle (AV) systems can develop reliable driving\u0000policies under regular traffic conditions, they frequently struggle with\u0000safety-critical traffic scenarios. This difficulty primarily arises from the\u0000rarity of such scenarios in driving datasets and the complexities associated\u0000with predictive modeling among multiple vehicles. To support the testing and\u0000refinement of AV policies, simulating safety-critical traffic events is an\u0000essential challenge to be addressed. In this work, we introduce TrafficGamer,\u0000which facilitates game-theoretic traffic simulation by viewing common road\u0000driving as a multi-agent game. In evaluating the empirical performance across\u0000various real-world datasets, TrafficGamer ensures both fidelity and\u0000exploitability of the simulated scenarios, guaranteeing that they not only\u0000statically align with real-world traffic distribution but also efficiently\u0000capture equilibriums for representing safety-critical scenarios involving\u0000multiple agents. Additionally, the results demonstrate that TrafficGamer\u0000exhibits highly flexible simulation across various contexts. Specifically, we\u0000demonstrate that the generated scenarios can dynamically adapt to equilibriums\u0000of varying tightness by configuring risk-sensitive constraints during\u0000optimization. To the best of our knowledge, TrafficGamer is the first simulator\u0000capable of generating diverse traffic scenarios involving multiple agents. We\u0000have provided a demo webpage for the project at\u0000https://qiaoguanren.github.io/trafficgamer-demo/.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurately identifying the underlying graph structure of a multi-agent system remains a difficult challenge. Our work introduces a novel machine learning-based solution that leverages the attention mechanism to predict future states of a multi-agent system by learning node representations. The graph structure is then inferred from the strength of the attention values. This approach is applied to both linear consensus dynamics and the non-linear dynamics of Kuramoto oscillators, implicitly learning the graph by learning good agent representations. Our results demonstrate that the presented data-driven graph attention machine learning model can identify the network topology in multi-agent systems, even when the underlying dynamical model is not known, as evidenced by the F1 scores achieved in link prediction.
{"title":"Graph Attention Inference of Network Topology in Multi-Agent Systems","authors":"Akshay Kolli, Reza Azadeh, Kshitj Jerath","doi":"arxiv-2408.15449","DOIUrl":"https://doi.org/arxiv-2408.15449","url":null,"abstract":"Accurately identifying the underlying graph structures of multi-agent systems\u0000remains a difficult challenge. Our work introduces a novel machine\u0000learning-based solution that leverages the attention mechanism to predict\u0000future states of multi-agent systems by learning node representations. The\u0000graph structure is then inferred from the strength of the attention values.\u0000This approach is applied to both linear consensus dynamics and the non-linear\u0000dynamics of Kuramoto oscillators, resulting in implicit learning the graph by\u0000learning good agent representations. Our results demonstrate that the presented\u0000data-driven graph attention machine learning model can identify the network\u0000topology in multi-agent systems, even when the underlying dynamic model is not\u0000known, as evidenced by the F1 scores achieved in the link prediction.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we study a challenging variant of the multi-agent pathfinding problem (MAPF) in which a set of agents must reach a set of goal locations, but it does not matter which agent reaches a specific goal: Anonymous MAPF (AMAPF). Current optimal and suboptimal AMAPF solvers rely on the existence of a centralized controller that is in charge of both target assignment and pathfinding. We extend the state of the art and present the first AMAPF solver capable of solving the problem in a fully decentralized fashion, where each agent makes decisions individually and relies only on local communication with the others. The core of our method is a priority and target swapping procedure tailored to produce consistent goal assignments (i.e., making sure that no two agents are heading towards the same goal). Coupled with established rule-based path planning, this yields TP-SWAP, an efficient and flexible approach to decentralized AMAPF. On the theoretical side, we prove that TP-SWAP is complete (i.e., TP-SWAP guarantees that each target will be reached by some agent). Empirically, we evaluate TP-SWAP across a wide range of setups and compare it to both centralized and decentralized baselines. TP-SWAP outperforms the fully decentralized competitor and can even outperform the semi-decentralized one (i.e., the one relying on an initial consistent goal assignment) in terms of flowtime (a widespread cost objective in MAPF).
{"title":"Decentralized Unlabeled Multi-agent Pathfinding Via Target And Priority Swapping (With Supplementary)","authors":"Stepan Dergachev, Konstantin Yakovlev","doi":"arxiv-2408.14948","DOIUrl":"https://doi.org/arxiv-2408.14948","url":null,"abstract":"In this paper we study a challenging variant of the multi-agent pathfinding\u0000problem (MAPF), when a set of agents must reach a set of goal locations, but it\u0000does not matter which agent reaches a specific goal - Anonymous MAPF (AMAPF).\u0000Current optimal and suboptimal AMAPF solvers rely on the existence of a\u0000centralized controller which is in charge of both target assignment and\u0000pathfinding. We extend the state of the art and present the first AMAPF solver\u0000capable of solving the problem at hand in a fully decentralized fashion, when\u0000each agent makes decisions individually and relies only on the local\u0000communication with the others. The core of our method is a priority and target\u0000swapping procedure tailored to produce consistent goal assignments (i.e. making\u0000sure that no two agents are heading towards the same goal). Coupled with an\u0000established rule-based path planning, we end up with a TP-SWAP, an efficient\u0000and flexible approach to solve decentralized AMAPF. On the theoretical side, we\u0000prove that TP-SWAP is complete (i.e. TP-SWAP guarantees that each target will\u0000be reached by some agent). Empirically, we evaluate TP-SWAP across a wide range\u0000of setups and compare it to both centralized and decentralized baselines.\u0000Indeed, TP-SWAP outperforms the fully-decentralized competitor and can even\u0000outperform the semi-decentralized one (i.e. the one relying on the initial\u0000consistent goal assignment) in terms of flowtime (a widespread cost objective\u0000in MAPF","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-agent target assignment and path planning (TAPF) are two key problems in intelligent warehouses. However, most literature addresses only one of these two problems. In this study, we propose a method to simultaneously solve target assignment and path planning from the perspective of cooperative multi-agent deep reinforcement learning (RL). To the best of our knowledge, this is the first work to model the TAPF problem for intelligent warehouses as cooperative multi-agent deep RL, and the first to simultaneously address TAPF with multi-agent deep RL. Furthermore, previous literature rarely considers the physical dynamics of agents; in this study, the physical dynamics of the agents are taken into account. Experimental results show that our method performs well in various task settings: targets are assigned reasonably and the planned paths are nearly the shortest. Moreover, our method is more time-efficient than the baselines.
{"title":"Multi-Agent Target Assignment and Path Finding for Intelligent Warehouse: A Cooperative Multi-Agent Deep Reinforcement Learning Perspective","authors":"Qi Liu, Jianqi Gao, Dongjie Zhu, Xizheng Pang, Pengbin Chen, Jingxiang Guo, Yanjie Li","doi":"arxiv-2408.13750","DOIUrl":"https://doi.org/arxiv-2408.13750","url":null,"abstract":"Multi-agent target assignment and path planning (TAPF) are two key problems\u0000in intelligent warehouse. However, most literature only addresses one of these\u0000two problems separately. In this study, we propose a method to simultaneously\u0000solve target assignment and path planning from a perspective of cooperative\u0000multi-agent deep reinforcement learning (RL). To the best of our knowledge,\u0000this is the first work to model the TAPF problem for intelligent warehouse to\u0000cooperative multi-agent deep RL, and the first to simultaneously address TAPF\u0000based on multi-agent deep RL. Furthermore, previous literature rarely considers\u0000the physical dynamics of agents. In this study, the physical dynamics of the\u0000agents is considered. Experimental results show that our method performs well\u0000in various task settings, which means that the target assignment is solved\u0000reasonably well and the planned path is almost shortest. Moreover, our method\u0000is more time-efficient than baselines.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a new approach for multi-agent collective construction, based on the idea of reversible ramps. Our ReRamp algorithm utilizes reversible side-ramps to generate construction plans for ramped block structures higher and larger than was previously possible using state-of-the-art planning algorithms, given the same building area. We compare the ReRamp algorithm to similar state-of-the-art algorithms on a set of benchmark instances, where we demonstrate its superior computational speed. We also establish in our experiments that the ReRamp algorithm is capable of generating plans for a single-story house, an important milestone on the road to real-world multi-agent construction applications.
{"title":"Reaching New Heights in Multi-Agent Collective Construction","authors":"Martin Rameš, Pavel Surynek","doi":"arxiv-2408.13615","DOIUrl":"https://doi.org/arxiv-2408.13615","url":null,"abstract":"We propose a new approach for multi-agent collective construction, based on\u0000the idea of reversible ramps. Our ReRamp algorithm utilizes reversible\u0000side-ramps to generate construction plans for ramped block structures higher\u0000and larger than was previously possible using state-of-the-art planning\u0000algorithms, given the same building area. We compare the ReRamp algorithm to\u0000similar state-of-the-art algorithms on a set of benchmark instances, where we\u0000demonstrate its superior computational speed. We also establish in our\u0000experiments that the ReRamp algorithm is capable of generating plans for a\u0000single-story house, an important milestone on the road to real-world\u0000multi-agent construction applications.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In multi-agent reinforcement learning (MARL), achieving multi-task generalization across diverse agents and objectives presents significant challenges. Existing online MARL algorithms primarily focus on single-task performance, and their lack of multi-task generalization typically results in substantial computational waste and limited real-life applicability. Meanwhile, existing offline multi-task MARL approaches depend heavily on data quality, often performing poorly on unseen tasks. In this paper, we introduce HyGen (Hybrid Training for Enhanced Multi-Task Generalization), a novel hybrid MARL framework that integrates online and offline learning to ensure both multi-task generalization and training efficiency. Specifically, our framework extracts potential general skills from offline multi-task datasets. We then train policies to select the optimal skills under the centralized training and decentralized execution (CTDE) paradigm. During this stage, we utilize a replay buffer that integrates both offline data and online interactions. We empirically demonstrate that our framework effectively extracts and refines general skills, yielding impressive generalization to unseen tasks. Comparative analyses on the StarCraft multi-agent challenge show that HyGen outperforms a wide range of existing solely online and offline methods.
{"title":"Hybrid Training for Enhanced Multi-task Generalization in Multi-agent Reinforcement Learning","authors":"Mingliang Zhang, Sichang Su, Chengyang He, Guillaume Sartoretti","doi":"arxiv-2408.13567","DOIUrl":"https://doi.org/arxiv-2408.13567","url":null,"abstract":"In multi-agent reinforcement learning (MARL), achieving multi-task\u0000generalization to diverse agents and objectives presents significant\u0000challenges. Existing online MARL algorithms primarily focus on single-task\u0000performance, but their lack of multi-task generalization capabilities typically\u0000results in substantial computational waste and limited real-life applicability.\u0000Meanwhile, existing offline multi-task MARL approaches are heavily dependent on\u0000data quality, often resulting in poor performance on unseen tasks. In this\u0000paper, we introduce HyGen, a novel hybrid MARL framework, Hybrid Training for\u0000Enhanced Multi-Task Generalization, which integrates online and offline\u0000learning to ensure both multi-task generalization and training efficiency.\u0000Specifically, our framework extracts potential general skills from offline\u0000multi-task datasets. We then train policies to select the optimal skills under\u0000the centralized training and decentralized execution paradigm (CTDE). During\u0000this stage, we utilize a replay buffer that integrates both offline data and\u0000online interactions. We empirically demonstrate that our framework effectively\u0000extracts and refines general skills, yielding impressive generalization to\u0000unseen tasks. Comparative analyses on the StarCraft multi-agent challenge show\u0000that HyGen outperforms a wide range of existing solely online and offline\u0000methods.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper investigates the interactions among multiple Large Language Model (LLM) based agents in the context of programming and coding tasks. We utilize the AutoGen framework to facilitate communication among agents, evaluating different configurations based on the success rates from 40 random runs for each setup. The study focuses on developing a flexible automation framework for applying the Finite Element Method (FEM) to solve linear elastic problems. Our findings emphasize the importance of optimizing agent roles and clearly defining their responsibilities, rather than merely increasing the number of agents. Effective collaboration among agents is shown to be crucial for addressing general FEM challenges. This research demonstrates the potential of LLM multi-agent systems to enhance computational automation in simulation methodologies, paving the way for future advancements in engineering and artificial intelligence.
{"title":"Optimizing Collaboration of LLM based Agents for Finite Element Analysis","authors":"Chuan Tian, Yilei Zhang","doi":"arxiv-2408.13406","DOIUrl":"https://doi.org/arxiv-2408.13406","url":null,"abstract":"This paper investigates the interactions between multiple agents within Large\u0000Language Models (LLMs) in the context of programming and coding tasks. We\u0000utilize the AutoGen framework to facilitate communication among agents,\u0000evaluating different configurations based on the success rates from 40 random\u0000runs for each setup. The study focuses on developing a flexible automation\u0000framework for applying the Finite Element Method (FEM) to solve linear elastic\u0000problems. Our findings emphasize the importance of optimizing agent roles and\u0000clearly defining their responsibilities, rather than merely increasing the\u0000number of agents. Effective collaboration among agents is shown to be crucial\u0000for addressing general FEM challenges. This research demonstrates the potential\u0000of LLM multi-agent systems to enhance computational automation in simulation\u0000methodologies, paving the way for future advancements in engineering and\u0000artificial intelligence.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large language models (LLMs) have had a significant impact on diverse research domains, including medicine and healthcare. However, the potential of LLMs as copilots in medical education remains underexplored. Current AI-assisted educational tools are limited by their solitary learning approach and their inability to simulate the multi-disciplinary and interactive nature of actual medical training. To address these limitations, we propose MEDCO (Medical EDucation COpilots), a novel multi-agent copilot system developed specifically to emulate real-world medical training environments. MEDCO incorporates three primary agents: an agentic patient, an expert doctor, and a radiologist, facilitating a multi-modal and interactive learning environment. Our framework emphasizes the learning of proficient question-asking skills, multi-disciplinary collaboration, and peer discussion between students. Our experiments show that simulated virtual students trained with MEDCO not only achieved substantial performance enhancements comparable to those of advanced models, but also demonstrated human-like learning behaviors whose improvements scaled with the number of learning samples. This work contributes to medical education by introducing a copilot that implements an interactive and collaborative learning approach, and it provides valuable insights into the effectiveness of AI-integrated training paradigms.
{"title":"MEDCO: Medical Education Copilots Based on A Multi-Agent Framework","authors":"Hao Wei, Jianing Qiu, Haibao Yu, Wu Yuan","doi":"arxiv-2408.12496","DOIUrl":"https://doi.org/arxiv-2408.12496","url":null,"abstract":"Large language models (LLMs) have had a significant impact on diverse\u0000research domains, including medicine and healthcare. However, the potential of\u0000LLMs as copilots in medical education remains underexplored. Current\u0000AI-assisted educational tools are limited by their solitary learning approach\u0000and inability to simulate the multi-disciplinary and interactive nature of\u0000actual medical training. To address these limitations, we propose MEDCO\u0000(Medical EDucation COpilots), a novel multi-agent-based copilot system\u0000specially developed to emulate real-world medical training environments. MEDCO\u0000incorporates three primary agents: an agentic patient, an expert doctor, and a\u0000radiologist, facilitating a multi-modal and interactive learning environment.\u0000Our framework emphasizes the learning of proficient question-asking skills,\u0000multi-disciplinary collaboration, and peer discussions between students. Our\u0000experiments show that simulated virtual students who underwent training with\u0000MEDCO not only achieved substantial performance enhancements comparable to\u0000those of advanced models, but also demonstrated human-like learning behaviors\u0000and improvements, coupled with an increase in the number of learning samples.\u0000This work contributes to medical education by introducing a copilot that\u0000implements an interactive and collaborative learning approach. It also provides\u0000valuable insights into the effectiveness of AI-integrated training paradigms.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"79 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We present the first principled method, termed the Social Choice Language Model, for handling these tradeoffs in LLM-designed rewards for multi-agent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component, called an adjudicator, external to the LLM, that controls complex tradeoffs via a user-selected social welfare function. Our experiments demonstrate that our model reliably selects more effective, aligned, and balanced reward functions compared to purely LLM-based approaches.
{"title":"Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards","authors":"Shresth Verma, Niclas Boehmer, Lingkai Kong, Milind Tambe","doi":"arxiv-2408.12112","DOIUrl":"https://doi.org/arxiv-2408.12112","url":null,"abstract":"LLMs are increasingly used to design reward functions based on human\u0000preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards\u0000for Restless Multi-Armed Bandits, a framework for allocating limited resources\u0000among agents. In applications such as public health, this approach empowers\u0000grassroots health workers to tailor automated allocation decisions to community\u0000needs. In the presence of multiple agents, altering the reward function based\u0000on human preferences can impact subpopulations very differently, leading to\u0000complex tradeoffs and a multi-objective resource allocation problem. We are the\u0000first to present a principled method termed Social Choice Language Model for\u0000dealing with these tradeoffs for LLM-designed rewards for multiagent planners\u0000in general and restless bandits in particular. The novel part of our model is a\u0000transparent and configurable selection component, called an adjudicator,\u0000external to the LLM that controls complex tradeoffs via a user-selected social\u0000welfare function. Our experiments demonstrate that our model reliably selects\u0000more effective, aligned, and balanced reward functions compared to purely\u0000LLM-based approaches.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}