
Latest publications in Adaptive Agents and Multi-Agent Systems

Revisiting the Distortion of Distributed Voting
Pub Date : 2023-01-09 DOI: 10.48550/arXiv.2301.03279
Aris Filos-Ratsikas, Alexandros A. Voudouris
We consider a setting with agents that have preferences over alternatives and are partitioned into disjoint districts. The goal is to choose one alternative as the winner using a mechanism which first decides a representative alternative for each district based on a local election with the agents therein as participants, and then chooses one of the district representatives as the winner. Previous work showed bounds on the distortion of a specific class of deterministic plurality-based mechanisms depending on the available information about the preferences of the agents in the districts. In this paper, we first consider the whole class of deterministic mechanisms and show asymptotically tight bounds on their distortion. We then initiate the study of the distortion of randomized mechanisms in distributed voting and show bounds based on several informational assumptions, which in many cases turn out to be tight. Finally, we also experimentally compare the distortion of many different mechanisms of interest using synthetic and real-world data.
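The distortion notion above can be made concrete with a small sketch. The instance, utilities, and districts below are purely illustrative (not taken from the paper): each district elects a representative by plurality, the final winner is chosen by plurality over the representatives, and the distortion is the ratio of the optimal social welfare to the welfare of the elected winner.

```python
# Illustrative toy instance: unit-sum utilities over three alternatives,
# agents partitioned into three disjoint districts.
alternatives = ["a", "b", "c"]
districts = [
    [{"a": 0.6, "b": 0.4, "c": 0.0}, {"a": 0.6, "b": 0.4, "c": 0.0}],
    [{"a": 0.0, "b": 1.0, "c": 0.0}],
    [{"a": 0.0, "b": 0.3, "c": 0.7}],
]

def social_welfare(alt):
    # total utility of all agents for a given alternative
    return sum(u[alt] for district in districts for u in district)

def plurality(profiles):
    # each voter supports its favourite alternative; ties broken by list order
    votes = {x: 0 for x in alternatives}
    for u in profiles:
        votes[max(u, key=u.get)] += 1
    return max(alternatives, key=lambda x: votes[x])

# Stage 1: plurality winner per district; Stage 2: plurality over representatives.
representatives = [plurality(district) for district in districts]
winner = max(alternatives, key=representatives.count)

distortion = max(social_welfare(x) for x in alternatives) / social_welfare(winner)
```

On this instance the district structure hides alternative b's high total welfare (2.1 versus 1.2 for the elected a), so the mechanism incurs a distortion of 1.75; the bounds in the paper characterize the worst case over all instances.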
Citations: 0
Structural Attention-Based Recurrent Variational Autoencoder for Highway Vehicle Anomaly Detection
Pub Date : 2023-01-09 DOI: 10.48550/arXiv.2301.03634
Neeloy Chakraborty, Aamir Hasan, Shuijing Liu, Tianchen Ji, Weihang Liang, D. McPherson, K. Driggs-Campbell
In autonomous driving, detection of abnormal driving behaviors is essential to ensure the safety of vehicle controllers. Prior works in vehicle anomaly detection have shown that modeling interactions between agents improves detection accuracy, but certain abnormal behaviors where structured road information is paramount are poorly identified, such as wrong-way and off-road driving. We propose a novel unsupervised framework for highway anomaly detection named Structural Attention-Based Recurrent VAE (SABeR-VAE), which explicitly uses the structure of the environment to aid anomaly identification. Specifically, we use a vehicle self-attention module to learn the relations among vehicles on a road, and a separate lane-vehicle attention module to model the importance of permissible lanes to aid in trajectory prediction. Conditioned on the attention modules' outputs, a recurrent encoder-decoder architecture with a stochastic Koopman operator-propagated latent space predicts the next states of vehicles. Our model is trained end-to-end to minimize prediction loss on normal vehicle behaviors, and is deployed to detect anomalies in (ab)normal scenarios. By combining the heterogeneous vehicle and lane information, SABeR-VAE and its deterministic variant, SABeR-AE, improve abnormal AUPR by 18% and 25% respectively on the simulated MAAD highway dataset over STGAE-KDE. Furthermore, we show that the learned Koopman operator in SABeR-VAE enforces interpretable structure in the variational latent space. The results of our method indeed show that modeling environmental factors is essential to detecting a diverse set of anomalies in deployment. For code implementation, please visit https://sites.google.com/illinois.edu/saber-vae.
Citations: 3
Assigning Agents to Increase Network-Based Neighborhood Diversity
Pub Date : 2023-01-07 DOI: 10.48550/arXiv.2301.02876
Zirou Qiu, A. Yuan, Chen Chen, M. Marathe, Sujith Ravi, D. Rosenkrantz, R. Stearns, A. Vullikanti
Motivated by real-world applications such as the allocation of public housing, we examine the problem of assigning a group of agents to vertices (e.g., spatial locations) of a network so that the diversity level is maximized. Specifically, agents are of two types (characterized by features), and we measure diversity by the number of agents who have at least one neighbor of a different type. This problem is known to be NP-hard, and we focus on developing approximation algorithms with provable performance guarantees. We first present a local-improvement algorithm for general graphs that provides an approximation factor of 1/2. For the special case where the sizes of agent subgroups are similar, we present a randomized approach based on semidefinite programming that yields an approximation factor better than 1/2. Further, we show that the problem can be solved efficiently when the underlying graph is treewidth-bounded and obtain a polynomial time approximation scheme (PTAS) for the problem on planar graphs. Lastly, we conduct experiments to evaluate the performance of the proposed algorithms on synthetic and real-world networks.
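As a concrete illustration of the local-improvement idea (a sketch under simplifying assumptions, not the authors' exact algorithm), the snippet below assigns two types of agents to the vertices of a small path graph and repeatedly swaps a pair of differently-typed agents whenever the swap increases the number of agents with at least one neighbor of the other type:

```python
import itertools

# Illustrative instance: a path graph on 4 vertices, two agents of each type.
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
adj = {v: set() for v in range(n)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def diversity(assign):
    # assign[v] is the type (0 or 1) of the agent placed on vertex v;
    # count agents with at least one neighbor of the other type
    return sum(
        any(assign[w] != assign[v] for w in adj[v]) for v in range(n)
    )

# start from a segregated assignment and apply improving swaps until none exist
assign = [0, 0, 1, 1]
improved = True
while improved:
    improved = False
    for u, v in itertools.combinations(range(n), 2):
        if assign[u] != assign[v]:
            swapped = assign[:]
            swapped[u], swapped[v] = swapped[v], swapped[u]
            if diversity(swapped) > diversity(assign):
                assign = swapped
                improved = True
```

Starting from diversity 2, a single swap of an interior pair reaches the optimum of 4 on this path; the paper proves that such local optima are within a factor 1/2 of optimal on general graphs.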
Citations: 2
Measuring a Priori Voting Power - Taking Delegations Seriously
Pub Date : 2023-01-06 DOI: 10.48550/arXiv.2301.02462
Rachael Colley, Théo Delemazure, Hugo Gilbert
We introduce new power indices to measure the a priori voting power of voters in liquid democracy elections where an underlying network restricts delegations. We argue that our power indices are natural extensions of the standard Penrose-Banzhaf index in simple voting games. We show that computing the criticality of a voter is #P-hard even when voting weights are polynomially-bounded in the size of the instance. However, for specific settings, such as when the underlying network is a bipartite or complete graph, recursive formulas can compute these indices for weighted voting games in pseudo-polynomial time. We highlight their theoretical properties and provide numerical results to illustrate how restricting the possible delegations can alter voters' voting power.
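For intuition, the standard Penrose-Banzhaf index in a simple weighted voting game can be computed by brute-force enumeration of swings; the weights and quota below are a hypothetical example, and the paper's delegation-aware indices extend this baseline to networks of delegations:

```python
from itertools import combinations

# Hypothetical weighted voting game: a coalition wins iff its weight >= quota.
weights = {"A": 3, "B": 2, "C": 2}
quota = 5

def is_winning(coalition):
    return sum(weights[p] for p in coalition) >= quota

def banzhaf(player):
    # A player is critical for a coalition S (with player not in S) when
    # S loses but S + {player} wins; the Penrose-Banzhaf index is the
    # fraction of such coalitions among all 2^(n-1) coalitions of the others.
    others = [p for p in weights if p != player]
    swings = 0
    for r in range(len(others) + 1):
        for S in combinations(others, r):
            if not is_winning(S) and is_winning(S + (player,)):
                swings += 1
    return swings / 2 ** len(others)

indices = {p: banzhaf(p) for p in weights}
```

Here A is critical in three of the four coalitions of {B, C} while B and C are each critical in only one, so the indices come out to 0.75 for A and 0.25 for B and C, despite the modest weight gap.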
Citations: 2
Centralized Cooperative Exploration Policy for Continuous Control Tasks
Pub Date : 2023-01-06 DOI: 10.48550/arXiv.2301.02375
C. Li, Chen Gong, Qiang He, Xinwen Hou, Yu Liu
Deep reinforcement learning (DRL) algorithms work brilliantly on solving various complex control tasks. This phenomenal success can be partly attributed to DRL encouraging intelligent agents to sufficiently explore the environment and collect diverse experiences during the agent training process. Therefore, exploration plays a significant role in obtaining an optimal policy for DRL. Despite recent works making great progress in continuous control tasks, exploration in these tasks has remained insufficiently investigated. To explicitly encourage exploration in continuous control tasks, we propose CCEP (Centralized Cooperative Exploration Policy), which utilizes the underestimation and overestimation of value functions to maintain the capacity for exploration. CCEP first keeps two value functions initialized with different parameters, and generates diverse policies with multiple exploration styles from this pair of value functions. In addition, a centralized policy framework ensures that CCEP achieves message delivery between the multiple policies, further contributing to cooperative exploration of the environment. Extensive experimental results demonstrate that CCEP achieves a higher exploration capacity. Empirical analysis shows that the policies learned by CCEP exhibit diverse exploration styles, reaping benefits in more exploration regions. This exploration capacity ensures that CCEP outperforms the current state-of-the-art methods across the multiple continuous control tasks shown in our experiments.
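The role of the paired value functions can be sketched in the abstract (an illustrative toy, not the authors' implementation): two estimates initialized with different parameters induce a pessimistic action-selection style from their elementwise minimum (underestimation) and an optimistic one from their maximum (overestimation), yielding diverse exploration behaviors from the same pair of estimates. All values below are made up for illustration:

```python
# Two value estimates for the actions of one state, initialized differently.
q1 = [0.9, 0.2, 0.5, 0.1]
q2 = [0.1, 0.8, 0.6, 0.2]

def greedy(values):
    # index of the highest-valued action
    return max(range(len(values)), key=values.__getitem__)

# Pessimistic style: act on the minimum of the two estimates (underestimation).
pessimistic_action = greedy([min(a, b) for a, b in zip(q1, q2)])
# Optimistic style: act on the maximum (overestimation encourages exploration).
optimistic_action = greedy([max(a, b) for a, b in zip(q1, q2)])
```

Because the two estimates disagree, the pessimistic and optimistic styles select different actions here (action 2 versus action 0), which is the kind of behavioral diversity CCEP exploits.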
Citations: 1
Multi-Agent Reinforcement Learning for Fast-Timescale Demand Response of Residential Loads
Pub Date : 2023-01-06 DOI: 10.48550/arXiv.2301.02593
Vincent Mai, Philippe Maisonneuve, Tianyu Zhang, Hadi Nekoei, L. Paull, Antoine Lesage-Landry
To integrate high amounts of renewable energy resources, electrical power grids must be able to cope with high amplitude, fast timescale variations in power generation. Frequency regulation through demand response has the potential to coordinate temporally flexible loads, such as air conditioners, to counteract these variations. Existing approaches for discrete control with dynamic constraints struggle to provide satisfactory performance for fast timescale action selection with hundreds of agents. We propose a decentralized agent trained with multi-agent proximal policy optimization with localized communication. We explore two communication frameworks: hand-engineered, or learned through targeted multi-agent communication. The resulting policies perform well and robustly for frequency regulation, and scale seamlessly to arbitrary numbers of houses for constant processing times.
Citations: 1
Self-Motivated Multi-Agent Exploration
Pub Date : 2023-01-05 DOI: 10.48550/arXiv.2301.02083
Shaowei Zhang, Jiahang Cao, Lei Yuan, Yang Yu, De-chuan Zhan
In cooperative multi-agent reinforcement learning (CMARL), it is critical for agents to achieve a balance between self-exploration and team collaboration. However, agents can hardly accomplish the team task without coordination, and without enough individual exploration they can be trapped in a local optimum where only easy cooperation is reached. Recent works mainly concentrate on agents' coordinated exploration, which brings about exponential growth in the state space to be explored. To address this issue, we propose Self-Motivated Multi-Agent Exploration (SMMAE), which aims to achieve success in team tasks by adaptively finding a trade-off between self-exploration and team cooperation. In SMMAE, we train an independent exploration policy for each agent to maximize its own visited state space. Each agent learns an adjustable exploration probability based on the stability of the joint team policy. The experiments on highly cooperative tasks in the StarCraft II micromanagement benchmark (SMAC) demonstrate that SMMAE can explore task-related states more efficiently, accomplish coordinated behaviours and boost the learning performance.
Citations: 0
Cost Inference for Feedback Dynamic Games from Noisy Partial State Observations and Incomplete Trajectories
Pub Date : 2023-01-04 DOI: 10.48550/arXiv.2301.01398
Jingqi Li, Chih-Yuan Chiu, Lasse Peters, S. Sojoudi, C. Tomlin, David Fridovich-Keil
In multi-agent dynamic games, the Nash equilibrium state trajectory of each agent is determined by its cost function and the information pattern of the game. However, the cost and trajectory of each agent may be unavailable to the other agents. Prior work on using partial observations to infer the costs in dynamic games assumes an open-loop information pattern. In this work, we demonstrate that the feedback Nash equilibrium concept is more expressive and encodes more complex behavior. It is desirable to develop specific tools for inferring players' objectives in feedback games. Therefore, we consider the dynamic game cost inference problem under the feedback information pattern, using only partial state observations and incomplete trajectory data. To this end, we first propose an inverse feedback game loss function, whose minimizer yields a feedback Nash equilibrium state trajectory closest to the observation data. We characterize the landscape and differentiability of the loss function. Given the difficulty of obtaining the exact gradient, our main contribution is an efficient gradient approximator, which enables a novel inverse feedback game solver that minimizes the loss using first-order optimization. In thorough empirical evaluations, we demonstrate that our algorithm converges reliably and has better robustness and generalization performance than the open-loop baseline method when the observation data reflects a group of players acting in a feedback Nash game.
Citations: 4
Infomaxformer: Maximum Entropy Transformer for Long Time-Series Forecasting Problem
Pub Date : 2023-01-04 DOI: 10.48550/arXiv.2301.01772
Peiwang Tang, Xianchao Zhang
The Transformer architecture yields state-of-the-art results in many tasks such as natural language processing (NLP) and computer vision (CV), owing to its ability to efficiently capture precise long-range dependency couplings between input sequences. Despite this advanced capability, however, quadratic time complexity and high memory usage prevent the Transformer from dealing with the long time-series forecasting problem (LTFP). To address these difficulties: (i) we revisit the learned attention patterns of the vanilla self-attention and redesign the calculation of self-attention based on the Maximum Entropy Principle; (ii) we propose a new method to sparsify the self-attention, which prevents the loss of more important self-attention scores due to random sampling; (iii) we propose a Keys/Values distilling method, motivated by the observation that a large amount of the features in the original self-attention map are redundant, which further reduces the time and space complexity and makes it possible to input longer time-series. Finally, we propose a method that combines the encoder-decoder architecture with seasonal-trend decomposition, i.e., using the encoder-decoder architecture to capture more specific seasonal parts. A large number of experiments on several large-scale datasets show that our Infomaxformer is clearly superior to the existing methods. We expect this to open up a new solution for the Transformer on LTFP, and to explore the ability of the Transformer architecture to capture much longer temporal dependencies.
Citations: 2
Optimal Decoy Resource Allocation for Proactive Defense in Probabilistic Attack Graphs
Pub Date : 2023-01-03 DOI: 10.48550/arXiv.2301.01336
Haoxiang Ma, Shuo Han, Nandi O. Leslie, C. Kamhoua, Jie Fu
This paper investigates the problem of synthesizing proactive defense systems in which the defender can allocate deceptive targets and modify the cost of actions for an attacker who aims to compromise security assets in the system. We model the interaction of the attacker and the system using a formal security model -- a probabilistic attack graph. By allocating fake targets/decoys, the defender aims to distract the attacker from compromising true targets. By increasing the cost of some attack actions, the defender aims to discourage the attacker from committing to certain policies and thereby improve the defense. To optimize the defense given limited decoy resources and operational constraints, we formulate the synthesis problem as a bi-level optimization problem in which the defender designs the system in anticipation of the attacker's best response, given that the attacker is misinformed about the system due to the use of deception. Although the general bi-level formulation is NP-hard, we show that under certain assumptions the problem can be transformed into a constrained optimization problem. We propose an algorithm that approximately solves this constrained optimization problem using a novel incentive-design method based on projected gradient ascent. We demonstrate the effectiveness of the proposed method using extensive numerical experiments.
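The projected-gradient-ascent step mentioned in the abstract can be sketched on a toy decoy-budget problem: the defender spreads a unit budget of decoy resources over sites to maximize a concave payoff, taking a gradient step and projecting back onto the budget simplex. The objective (diminishing log returns per site), the weights, the budget, and the step size below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def project_simplex(v, budget=1.0):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = budget}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - budget
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def projected_gradient_ascent(grad_f, x0, steps=200, lr=0.1, budget=1.0):
    """Maximize f over the budget simplex: ascend along grad_f, then project."""
    x = project_simplex(np.asarray(x0, dtype=float), budget)
    for _ in range(steps):
        x = project_simplex(x + lr * grad_f(x), budget)
    return x

# Toy defender payoff with diminishing returns per decoy site:
# f(x) = sum_i w_i * log(1 + x_i), so grad f = w / (1 + x).
w = np.array([3.0, 1.0, 2.0])
x_star = projected_gradient_ascent(lambda x: w / (1.0 + x), np.ones(3) / 3)
print(np.round(x_star, 3), round(float(x_star.sum()), 6))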
Citations: 1
Journal: Adaptive Agents and Multi-Agent Systems