
Latest Publications in Adaptive Agents and Multi-Agent Systems

Revenue Maximization Mechanisms for an Uninformed Mediator with Communication Abilities
Pub Date : 2023-08-01 DOI: 10.5555/3545946.3599124
Zhikang Fan, Weiran Shen
Consider a market where a seller owns an item for sale and a buyer wants to purchase it. Each player has private information, known as their type. It can be costly and difficult for the players to reach an agreement through direct communication. However, with a mediator as a trusted third party, both players can communicate privately with the mediator without worrying about leaking too much or too little information. The mediator can design and commit to a multi-round communication protocol for both players, in which they update their beliefs about the other player's type. The mediator cannot force the players to trade but can influence their behaviors by sending messages to them. We study the problem of designing revenue-maximizing mechanisms for the mediator. We show that the mediator can, without loss of generality, focus on a set of direct and incentive-compatible mechanisms. We then formulate this problem as a mathematical program and provide an optimal solution in closed form under a regularity condition. Our mechanism is simple and has a threshold structure. We also discuss some interesting properties of the optimal mechanism, such as situations where the mediator may lose money.
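The abstract's threshold structure can be illustrated with a small sketch. The Python snippet below is a hypothetical direct mechanism in which the mediator recommends trade only when the buyer's reported type clears a seller-dependent threshold; the threshold function, payment split, and mediator fee are invented for illustration and are not the closed-form optimum derived in the paper.

```python
# Hypothetical sketch of a direct, threshold-structured mediation mechanism.
# The threshold function and payment rule below are illustrative assumptions,
# not the closed-form optimum derived in the paper.

def mediate(seller_type: float, buyer_type: float, threshold=lambda s: 1.5 * s):
    """Recommend trade iff the buyer's reported type clears a seller-dependent threshold."""
    if buyer_type >= threshold(seller_type):
        price = 0.5 * (threshold(seller_type) + buyer_type)  # illustrative price split
        fee = 0.05 * price                                    # mediator's cut (assumed)
        return {"trade": True, "price": price, "mediator_revenue": fee}
    return {"trade": False, "price": None, "mediator_revenue": 0.0}

if __name__ == "__main__":
    print(mediate(seller_type=1.0, buyer_type=2.0))  # trade recommended
    print(mediate(seller_type=1.0, buyer_type=1.2))  # below threshold, no trade
```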
Citations: 0
Deliberation as Evidence Disclosure: A Tale of Two Protocol Types
Pub Date : 2023-08-01 DOI: 10.5555/3545946.3599105
Julian Chingoma, Adrian Haret
We study a model inspired by deliberative practice, in which agents selectively disclose evidence about a set of alternatives prior to taking a final decision on them. We are interested in whether such a process, when iterated to termination, results in the objectively best alternatives being selected—thereby lending support to the idea that groups can be wise even when their members communicate with each other. We find that, under certain restrictions on the relative amounts of evidence, together with the actions available to the agents, there exist deliberation protocols in each of the two families we look at (i.e., simultaneous and sequential) that offer desirable guarantees. Simulation results further complement this picture, by showing how the distribution of evidence among the agents influences parameters of interest, such as the outcome of the protocols and the number of rounds until termination.
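As a rough illustration of the two protocol families, the hypothetical Python sketch below runs a simultaneous and a sequential disclosure loop to termination and picks the alternative with the most disclosed support; the evidence model, the rule that agents disclose everything they hold, and the aggregation step are all assumptions for illustration, not the paper's protocols.

```python
# Hypothetical toy of simultaneous vs. sequential evidence disclosure.
# Evidence model, disclosure rule, and aggregation are illustrative assumptions.
from collections import Counter

def deliberate(evidence, sequential=False):
    """evidence: one list per agent of alternatives that agent holds support for."""
    pools = [list(e) for e in evidence]
    disclosed = Counter()
    while any(pools):
        if sequential:
            # one agent speaks per round: the first agent with evidence left
            speaker = next(i for i, p in enumerate(pools) if p)
            disclosed[pools[speaker].pop()] += 1
        else:
            # simultaneous: every agent with evidence left discloses one piece
            for p in pools:
                if p:
                    disclosed[p.pop()] += 1
    return disclosed.most_common(1)[0][0] if disclosed else None

if __name__ == "__main__":
    evidence = [["A", "A"], ["B"], ["A", "B"]]
    print(deliberate(evidence, sequential=False))  # 'A' wins on disclosed support
    print(deliberate(evidence, sequential=True))
```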
Citations: 0
Asynchronous Communication Aware Multi-Agent Task Allocation
Pub Date : 2023-08-01 DOI: 10.5555/3545946.3598927
Ben Rachmut, Sofia Amador Nelke, R. Zivan
Multi-agent task allocation in physical environments with spatial and temporal constraints is a hard problem that is relevant in many realistic applications. A task allocation algorithm based on Fisher market clearing (FMC_TA), which can be performed either centrally or distributively, has been shown to produce high-quality allocations in comparison to both centralized and distributed state-of-the-art incomplete optimization algorithms. However, the algorithm is synchronous and therefore depends on perfect communication between agents. We propose FMC_ATA, an asynchronous version of FMC_TA, which is robust to message latency and message loss. In contrast to the former version of the algorithm, FMC_ATA allows agents to identify dynamic events and initiate the generation of an updated allocation. Thus, it is better suited to dynamic environments. We further investigate the conditions under which the distributed version of the algorithm is preferred over the centralized version. Our results indicate that the proposed asynchronous distributed algorithm produces consistent results even when the communication level is extremely poor.
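A minimal sketch of the asynchrony idea, under assumed details rather than the actual FMC_ATA protocol: each agent keeps only the freshest proposal it has heard from every peer, so late or lost messages cannot corrupt its state, and a detected dynamic event triggers a new allocation round.

```python
# Hypothetical sketch of asynchronous, loss-tolerant message handling for task allocation.
# The message format and reallocation trigger are assumptions, not the FMC_ATA protocol.

class AsyncAgent:
    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.latest = {}  # sender -> (timestamp, allocation proposal)

    def receive(self, sender, timestamp, proposal):
        """Keep only the freshest proposal per sender; stale or duplicate messages are ignored."""
        last = self.latest.get(sender, (-1, None))
        if timestamp > last[0]:
            self.latest[sender] = (timestamp, proposal)
            return True
        return False

    def on_dynamic_event(self, event):
        """A detected event (e.g., a new task) triggers a fresh allocation round."""
        print(f"agent {self.agent_id}: event {event!r}, recomputing allocation "
              f"from {len(self.latest)} known proposals")

if __name__ == "__main__":
    a = AsyncAgent("r1")
    a.receive("r2", timestamp=3, proposal={"task_7": "r2"})
    a.receive("r2", timestamp=1, proposal={"task_7": "r1"})  # stale, ignored
    a.on_dynamic_event("new_task_9")
```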
Citations: 0
Strategic Play By Resource-Bounded Agents in Security Games
Pub Date : 2023-07-25 DOI: 10.5555/3545946.3598973
Xinming Liu, J. Halpern
Many studies have shown that humans are "predictably irrational": they do not act in a fully rational way, but their deviations from rational behavior are quite systematic. Our goal is to see the extent to which we can explain and justify these deviations as the outcome of rational but resource-bounded agents doing as well as they can, given their limitations. We focus on the well-studied ranger-poacher game, where rangers are trying to protect a number of sites from poaching. We capture the computational limitations by modeling the poacher and the ranger as probabilistic finite automata (PFAs). We show that, with sufficiently large memory, PFAs learn to play the Nash equilibrium (NE) strategies of the game and achieve the NE utility. However, if we restrict the memory, we get more "human-like" behaviors, such as probability matching (i.e., visiting sites in proportion to the probability of a rhino being there), and avoiding sites where there was a bad outcome (e.g., the poacher was caught by the ranger), that we also observed in experiments conducted on Amazon Mechanical Turk. Interestingly, we find that adding human-like behaviors such as probability matching and overweighting significant events (like getting caught) actually improves performance, showing that this seemingly irrational behavior can be quite rational.
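The probability-matching behavior described above is easy to state in code. The hypothetical Python sketch below contrasts a best-response policy (always visit the most likely site) with probability matching (visit sites in proportion to their probabilities); the site names and probabilities are invented for illustration.

```python
# Hypothetical contrast between best-response and probability-matching site choice.
# Site names and probabilities are invented for illustration.
import random

SITE_PROBS = {"north": 0.6, "river": 0.3, "ridge": 0.1}  # assumed chance a rhino is at each site

def best_response():
    """Always visit the single most likely site."""
    return max(SITE_PROBS, key=SITE_PROBS.get)

def probability_matching():
    """Visit sites in proportion to the probability of a rhino being there."""
    sites, weights = zip(*SITE_PROBS.items())
    return random.choices(sites, weights=weights, k=1)[0]

if __name__ == "__main__":
    random.seed(0)
    visits = [probability_matching() for _ in range(1000)]
    print(best_response())                                  # 'north' every time
    print({s: visits.count(s) / 1000 for s in SITE_PROBS})  # roughly matches SITE_PROBS
```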
Citations: 0
Enhancing Smart, Sustainable Mobility with Game Theory and Multi-Agent Reinforcement Learning
Pub Date : 2023-06-26 DOI: 10.5555/3545946.3599163
Lucia Cipolina-Kun
We propose the use of game-theoretic solutions and multi-agent Reinforcement Learning in the mechanism design of smart, sustainable mobility services. In particular, we present applications to ridesharing as an example of a cost game.
Citations: 0
Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization
Pub Date : 2023-06-15 DOI: 10.5555/3545946.3599076
Xiangsen Wang, Xianyuan Zhan
Offline reinforcement learning (RL) that learns policies from offline datasets without environment interaction has received considerable attention in recent years. Compared with the rich literature in the single-agent case, offline multi-agent RL is still a relatively underexplored area. Most existing methods directly apply offline RL ingredients in the multi-agent setting without fully leveraging the decomposable problem structure, leading to less satisfactory performance in complex tasks. We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization. OMAC adopts a coupled value factorization scheme that decomposes the global value function into local and shared components, and also maintains the credit assignment consistency between the state-value and Q-value functions. Moreover, OMAC performs in-sample learning on the decomposed local state-value functions, which implicitly conducts max-Q operation at the local level while avoiding distributional shift caused by evaluating out-of-distribution actions. Based on the comprehensive evaluations of the offline multi-agent StarCraft II micro-management tasks, we demonstrate the superior performance of OMAC over the state-of-the-art offline multi-agent RL methods.
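As a structural sketch of the value decomposition (with assumed shapes and linear stand-ins for the networks, not the OMAC architecture), the global value can be written as a shared component over the global state plus a sum of per-agent local components:

```python
# Hypothetical sketch of a global value decomposed into local and shared components.
# The linear "networks" and dimensions are stand-ins; OMAC's actual parameterization differs.
import numpy as np

rng = np.random.default_rng(0)
n_agents, local_dim, global_dim = 3, 4, 6

W_local = [rng.normal(size=local_dim) for _ in range(n_agents)]  # one local value head per agent
w_shared = rng.normal(size=global_dim)                           # shared component over global state

def local_values(local_states):
    return [float(W_local[i] @ local_states[i]) for i in range(n_agents)]

def global_value(global_state, local_states):
    """V_global = shared(global_state) + sum_i local_i(local_state_i)."""
    return float(w_shared @ global_state) + sum(local_values(local_states))

if __name__ == "__main__":
    s_global = rng.normal(size=global_dim)
    s_locals = [rng.normal(size=local_dim) for _ in range(n_agents)]
    print(global_value(s_global, s_locals))
```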
Citations: 0
Learnability with PAC Semantics for Multi-agent Beliefs
Pub Date : 2023-06-08 DOI: 10.5555/3545946.3599016
I. Mocanu, Vaishak Belle, Brendan Juba
The tension between deduction and induction is perhaps the most fundamental issue in areas such as philosophy, cognition, and artificial intelligence. In an influential paper, Valiant recognized that the challenge of learning should be integrated with deduction. In particular, he proposed a semantics to capture the quality possessed by the output of probably approximately correct (PAC) learning algorithms when formulated in a logic. Although weaker than classical entailment, it allows for a powerful model-theoretic framework for answering queries. In this paper, we provide a new technical foundation to demonstrate PAC learning with multi-agent epistemic logics. To circumvent the negative results in the literature on the difficulty of robust learning with the PAC semantics, we consider so-called implicit learning, where we are able to incorporate observations into the background theory in service of deciding the entailment of an epistemic query. We prove correctness of the learning procedure and discuss results on the sample complexity, that is, how many observations we will need to provably assert that the query is entailed given a user-specified error bound. Finally, we investigate under what circumstances this algorithm can be made efficient. On the last point, given that reasoning in epistemic logics, especially in multi-agent epistemic logics, is PSPACE-complete, it might seem like there is no hope for this problem. We leverage some recent results on the so-called Representation Theorem, explored for single-agent and multi-agent epistemic logics with the only-knowing operator, to reduce modal reasoning to propositional reasoning.
Citations: 0
Modeling Dynamic Environments with Scene Graph Memory
Pub Date : 2023-05-27 DOI: 10.5555/3545946.3599100
Andrey Kurenkov, Michael Lingelbach, Tanmay Agarwal, Chengshu Li, Emily Jin, Ruohan Zhang, Li Fei-Fei, Jiajun Wu, S. Savarese, Roberto Martín-Martín
Embodied AI agents that search for objects in large environments such as households often need to make efficient decisions by predicting object locations based on partial information. We pose this as a new type of link prediction problem: link prediction on partially observable dynamic graphs. Our graph is a representation of a scene in which rooms and objects are nodes, and their relationships are encoded in the edges; only parts of the changing graph are known to the agent at each timestep. This partial observability poses a challenge to existing link prediction approaches, which we address. We propose a novel state representation -- Scene Graph Memory (SGM) -- which captures the agent's accumulated set of observations, as well as a neural net architecture called a Node Edge Predictor (NEP) that extracts information from the SGM to search efficiently. We evaluate our method in the Dynamic House Simulator, a new benchmark that creates diverse dynamic graphs following the semantic patterns typically seen at homes, and show that NEP can be trained to predict the locations of objects in a variety of environments with diverse object movement dynamics, outperforming baselines both in terms of new scene adaptability and overall accuracy. The codebase and more can be found at https://www.scenegraphmemory.com.
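A minimal data-structure sketch of accumulating partial observations into a scene graph is shown below; the class name, fields, and observation format are assumptions for illustration, not the paper's SGM implementation.

```python
# Hypothetical sketch of a scene-graph memory accumulating partial observations.
# Node/edge fields and the observation format are assumptions, not the paper's SGM.

class SceneGraphMemory:
    def __init__(self):
        self.nodes = {}  # name -> {"kind": "room" or "object", "last_seen": step}
        self.edges = {}  # (object, room) -> last step at which the relation was observed

    def observe(self, step, room, objects):
        """Record a partial observation: the objects seen in one room at one timestep."""
        self.nodes[room] = {"kind": "room", "last_seen": step}
        for obj in objects:
            self.nodes[obj] = {"kind": "object", "last_seen": step}
            self.edges[(obj, room)] = step

    def likely_location(self, obj):
        """Return the most recently observed room for an object, if any."""
        seen = [(step, room) for (o, room), step in self.edges.items() if o == obj]
        return max(seen)[1] if seen else None

if __name__ == "__main__":
    sgm = SceneGraphMemory()
    sgm.observe(step=1, room="kitchen", objects=["mug", "apple"])
    sgm.observe(step=5, room="living_room", objects=["mug"])
    print(sgm.likely_location("mug"))  # 'living_room'
```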
Citations: 2
Reward-Machine-Guided, Self-Paced Reinforcement Learning
Pub Date : 2023-05-25 DOI: 10.5555/3545946.3598964
Cevahir Köprülü, U. Topcu
Self-paced reinforcement learning (RL) aims to improve the data efficiency of learning by automatically creating sequences, namely curricula, of probability distributions over contexts. However, existing techniques for self-paced RL fail in long-horizon planning tasks that involve temporally extended behaviors. We hypothesize that taking advantage of prior knowledge about the underlying task structure can improve the effectiveness of self-paced RL. We develop a self-paced RL algorithm guided by reward machines, i.e., a type of finite-state machine that encodes the underlying task structure. The algorithm integrates reward machines in 1) the update of the policy and value functions obtained by any RL algorithm of choice, and 2) the update of the automated curriculum that generates context distributions. Our empirical results evidence that the proposed algorithm achieves optimal behavior reliably even in cases in which existing baselines cannot make any meaningful progress. It also decreases the curriculum length and reduces the variance in the curriculum generation process by up to one-fourth and four orders of magnitude, respectively.
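Since a reward machine is a finite-state machine that encodes task structure, a tiny example makes the idea concrete. The states, events, and rewards in the hypothetical Python sketch below (fetch a key, then reach a door) are invented for illustration.

```python
# Hypothetical reward machine: a finite-state machine mapping (state, event) to
# (next state, reward). The task (fetch a key, then open a door) is invented.

TRANSITIONS = {
    ("start", "got_key"):   ("has_key", 0.0),
    ("has_key", "at_door"): ("done",    1.0),
}

def step(rm_state, event):
    """Advance the reward machine on an observed event; unmatched events give no reward."""
    return TRANSITIONS.get((rm_state, event), (rm_state, 0.0))

if __name__ == "__main__":
    state, total = "start", 0.0
    for event in ["at_door", "got_key", "at_door"]:
        state, reward = step(state, event)
        total += reward
    print(state, total)  # 'done' 1.0
```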
Citations: 2
Online Influence Maximization under Decreasing Cascade Model
Pub Date : 2023-05-19 DOI: 10.5555/3545946.3598895
Fang-yuan Kong, Jize Xie, Baoxiang Wang, Tao Yao, Shuai Li
We study online influence maximization (OIM) under a new model of decreasing cascade (DC). This model is a generalization of the independent cascade (IC) model by considering the common phenomenon of market saturation. In DC, the chance of an influence attempt being successful reduces with previous failures. The effect is neglected by previous OIM works under IC and linear threshold models. We propose the DC-UCB algorithm to solve this problem, which achieves a regret bound of the same order as the state-of-the-art works on the IC model. Extensive experiments on both synthetic and real datasets show the effectiveness of our algorithm.
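The defining feature of the decreasing cascade model, that the success probability of an influence attempt shrinks with the number of previous failed attempts, can be sketched as follows; the geometric decay schedule is an assumption for illustration.

```python
# Hypothetical sketch of the decreasing-cascade idea: each additional failed attempt
# on a node lowers the success probability of the next attempt. The decay rule is assumed.
import random

def attempt_activation(base_prob, prior_failures, decay=0.5):
    """Success probability decreases geometrically with previous failures (assumed schedule)."""
    p = base_prob * (decay ** prior_failures)
    return random.random() < p, p

if __name__ == "__main__":
    random.seed(1)
    failures = 0
    for attempt in range(4):
        success, p = attempt_activation(base_prob=0.4, prior_failures=failures)
        print(f"attempt {attempt}: prob={p:.3f}, success={success}")
        if not success:
            failures += 1
```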
Citations: 1