Pub Date: 2023-02-14 | DOI: 10.48550/arXiv.2302.06760
Random Majority Opinion Diffusion: Stabilization Time, Absorbing States, and Influential Nodes
Ahad N. Zehmakan
Adaptive Agents and Multi-Agent Systems
Consider a graph $G$ with $n$ nodes and $m$ edges, which represents a social network, and assume that initially each node is blue or white. In each round, all nodes simultaneously update their color to the most frequent color in their neighborhood. This is called the Majority Model (MM) if a node keeps its color in case of a tie, and the Random Majority Model (RMM) if it chooses blue with probability 1/2 and white otherwise. We prove that there are graphs for which RMM needs exponentially many rounds to reach a stable configuration in expectation, and such a configuration can have exponentially many states (i.e., colorings). This is in contrast to MM, which is known to always reach a stable configuration with one or two states in $O(m)$ rounds. For the special case of a cycle graph $C_n$, we prove the stronger and tight bounds of $\lceil n/2 \rceil - 1$ and $O(n^2)$ rounds in MM and RMM, respectively. Furthermore, we show that the number of stable colorings in MM on $C_n$ is $\Theta(\Phi^n)$, where $\Phi = (1+\sqrt{5})/2$ is the golden ratio, while it is equal to 2 for RMM. We also study the minimum size of a winning set: a set of nodes whose agreement on a color in the initial coloring forces the process to end in a coloring where all nodes share that color. We present tight bounds on the minimum size of a winning set for both MM and RMM. Furthermore, we analyze our models for a random initial coloring, where each node is colored blue independently with some probability $p$ and white otherwise. Using martingale analysis and counting arguments, we prove that the expected final number of blue nodes on a cycle graph $C_n$ is $(2p^2-p^3)n/(1-p+p^2)$ in MM and $pn$ in RMM. Finally, we conduct experiments which complement our theoretical findings and lead to intriguing open problems and conjectures for future work.
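The update rule on a cycle is simple enough to simulate directly. The sketch below is an illustration of the process as described in the abstract, not the paper's code: each node looks at its two neighbours on $C_n$, a tie is resolved by keeping its own color (MM) or by a fair coin flip (RMM), and MM is iterated until a stable coloring is reached.

```python
import random

def step(colors, rule="MM", rng=None):
    """One synchronous round of the (Random) Majority Model on a cycle.

    colors: list of 0/1 (white/blue). Each node adopts the majority color
    among its two neighbours; ties keep the node's color (MM) or flip a
    fair coin (RMM).
    """
    rng = rng or random.Random(0)
    n = len(colors)
    nxt = []
    for i in range(n):
        left, right = colors[i - 1], colors[(i + 1) % n]
        if left == right:            # clear majority among the two neighbours
            nxt.append(left)
        elif rule == "MM":           # tie: keep own color
            nxt.append(colors[i])
        else:                        # tie: choose blue with probability 1/2
            nxt.append(rng.randint(0, 1))
    return nxt

def run_mm(colors):
    """Iterate MM until a stable coloring is reached; return (coloring, rounds)."""
    rounds = 0
    while True:
        nxt = step(colors, "MM")
        if nxt == colors:
            return colors, rounds
        colors, rounds = nxt, rounds + 1
```

Consistent with the $\Theta(\Phi^n)$ count of stable colorings, the stable MM colorings on a cycle are exactly those whose monochromatic blocks all have length at least two.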
Pub Date: 2023-02-13 | DOI: 10.48550/arXiv.2302.06033
Improving Quantal Cognitive Hierarchy Model Through Iterative Population Learning
Yuhong Xu, Shih-Fen Cheng, Xinyu Chen
In domains where agents interact strategically, game theory is widely applied to predict how agents will behave. However, game-theoretic predictions rest on the assumption that agents are fully rational and play equilibrium strategies, which mostly does not hold when human decision makers are involved. To address this limitation, a number of behavioral game-theoretic models have been defined to account for the limited rationality of human decision makers. The "quantal cognitive hierarchy" (QCH) model, one of the more recent of these, has been demonstrated to be the state-of-the-art model for predicting human behavior in normal-form games. The QCH model assumes that agents in games can be either non-strategic (level-0) or strategic (level-$k$). Level-0 agents choose their strategies irrespective of other agents. Level-$k$ agents assume that other agents behave at levels below $k$ and best respond against them. However, an important assumption of the QCH model is that the distribution of agents' levels follows a Poisson distribution. In this paper, we relax this assumption and design a population-level learning-based method to iteratively estimate the empirical distribution of agents' reasoning levels. Using a real-world dataset from the Swedish lowest unique positive integer game, we demonstrate how our refined QCH model and the iterative solution-seeking process provide a more accurate behavioral model for agents. This leads to better performance in fitting the real data and allows us to track an agent's progress in learning to play strategically over multiple rounds.
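The level-$k$ recursion with an arbitrary (non-Poisson) level distribution can be sketched in a few lines. This is a simplified illustration, not the paper's model: level-0 is taken to be uniform, and each level-$k$ agent quantal-responds (logit with precision `lam`, an assumed parameter) to the normalized mixture of all lower levels.

```python
import math

def qch_prediction(payoff, level_probs, lam=1.0):
    """Predict a row player's mixed strategy under a quantal cognitive
    hierarchy with an arbitrary level distribution.

    payoff[i][j]: payoff for playing action i against an opponent playing j.
    level_probs[k]: P(level = k), k = 0..K (need not be Poisson).
    """
    n = len(payoff)
    strategies = [[1.0 / n] * n]                 # level-0: uniform play
    for k in range(1, len(level_probs)):
        # belief about the opponent: normalized mixture of levels 0..k-1
        total = sum(level_probs[:k])
        belief = [sum(level_probs[l] * strategies[l][j] for l in range(k)) / total
                  for j in range(n)]
        utils = [sum(payoff[i][j] * belief[j] for j in range(n)) for i in range(n)]
        exps = [math.exp(lam * u) for u in utils]  # logit (quantal) response
        z = sum(exps)
        strategies.append([e / z for e in exps])
    # population-level prediction: mixture over all levels
    return [sum(level_probs[k] * strategies[k][i] for k in range(len(level_probs)))
            for i in range(n)]
```

Iterative population learning would then refit `level_probs` against observed play and recompute the prediction, rather than fixing a Poisson prior.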
Pub Date: 2023-02-13 | DOI: 10.48550/arXiv.2302.06548
Automatic Noise Filtering with Dynamic Sparse Training in Deep Reinforcement Learning
Bram Grooten, Ghada Sokar, Shibhansh Dohare, Elena Mocanu, Matthew E. Taylor, Mykola Pechenizkiy, D. Mocanu
Tomorrow's robots will need to distinguish useful information from noise when performing different tasks. A household robot, for instance, may continuously receive a plethora of information about the home but needs to focus on just a small subset to successfully execute its current chore. Filtering distracting inputs that contain irrelevant data has received little attention in the reinforcement learning literature. As a first step toward resolving this, we formulate a problem setting in reinforcement learning called the extremely noisy environment (ENE), where up to 99% of the input features are pure noise. Agents need to detect which features provide task-relevant information about the state of the environment. Consequently, we propose a new method termed Automatic Noise Filtering (ANF), which uses the principles of dynamic sparse training in synergy with various deep reinforcement learning algorithms. The sparse input layer learns to focus its connectivity on task-relevant features, such that ANF-SAC and ANF-TD3 outperform standard SAC and TD3 by a large margin, while using up to 95% fewer weights. Furthermore, we devise a transfer learning setting for ENEs by permuting all features of the environment after 1M timesteps, to simulate the fact that other information sources can become relevant as the world evolves. Again, ANF surpasses the baselines in final performance and sample complexity. Our code is available at https://github.com/bramgrooten/automatic-noise-filtering
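The drop-and-grow cycle of dynamic sparse training that ANF builds on can be sketched as follows. This is a minimal illustration of the general principle, not the ANF implementation (see the linked repository for that): periodically drop the weakest active connections of the sparse input layer and regrow the same number at random inactive positions, so connectivity can migrate toward task-relevant inputs.

```python
import numpy as np

def prune_and_regrow(weights, drop_frac=0.3, rng=None):
    """One drop-and-grow step on a sparse weight matrix.

    Drops the drop_frac smallest-magnitude active weights, then regrows the
    same number of connections at random inactive positions with a tiny
    initialization, keeping the overall sparsity level unchanged.
    """
    rng = rng or np.random.default_rng(0)
    w = weights.copy()
    flat = w.ravel()                     # view into w (contiguous copy)
    active = np.flatnonzero(flat)        # indices of current connections
    k = int(drop_frac * active.size)
    if k == 0:
        return w
    # drop: zero out the k active weights with smallest magnitude
    order = active[np.argsort(np.abs(flat[active]))]
    flat[order[:k]] = 0.0
    # grow: activate k random currently-inactive positions
    inactive = np.flatnonzero(flat == 0.0)
    new = rng.choice(inactive, size=k, replace=False)
    flat[new] = rng.normal(0.0, 1e-2, size=k)
    return w
```

In training, such a step would run every few thousand gradient updates; strong (large-magnitude) connections to relevant features survive while connections to noise features are gradually recycled.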
Pub Date: 2023-02-09 | DOI: 10.48550/arXiv.2302.04928
Regularization for Strategy Exploration in Empirical Game-Theoretic Analysis
Yongzhao Wang, Michael P. Wellman
In iterative approaches to empirical game-theoretic analysis (EGTA), the strategy space is expanded incrementally based on analysis of intermediate game models. A common approach to strategy exploration, represented by the double oracle algorithm, is to add strategies that best-respond to a current equilibrium. This approach may suffer from overfitting and other limitations, which led the developers of the policy-space response oracle (PSRO) framework for iterative EGTA to generalize the target of best response, employing what they term meta-strategy solvers (MSSs). Noting that many MSSs can be viewed as perturbed or approximated versions of Nash equilibrium, we adopt an explicit regularization perspective on the specification and analysis of MSSs. We propose a novel MSS called regularized replicator dynamics (RRD), which simply truncates the replicator process based on a regret criterion. We show that RRD is more adaptive than existing MSSs and outperforms them in various games. We extend our study to three-player games, for which the payoff matrix is cubic in the number of strategies, so exhaustively evaluating profiles may not be feasible. We propose a profile search method that can identify solutions from incomplete models, and combine it with iterative model construction using a regularized MSS. Finally, and most importantly, our experiments reveal that the regret of best-response targets has a tremendous influence on the performance of strategy exploration, which provides an explanation for the effectiveness of regularization in PSRO.
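The core idea of truncating replicator dynamics on a regret criterion can be sketched for a symmetric two-player game. This is an illustrative sketch, not the paper's RRD: the step size `lr`, the regret threshold `regret_eps`, and the symmetric-game setting are all assumptions made here for brevity.

```python
import numpy as np

def rrd(payoff, regret_eps=0.05, lr=0.1, max_iters=10_000):
    """Replicator dynamics on a symmetric game, truncated early once the
    regret of the current mixed strategy falls below regret_eps, instead
    of running all the way to an (approximate) equilibrium."""
    n = payoff.shape[0]
    x = np.full(n, 1.0 / n)                  # start from the uniform mix
    regret = np.inf
    for _ in range(max_iters):
        u = payoff @ x                        # payoff of each pure strategy
        avg = x @ u                           # population-average payoff
        regret = u.max() - avg                # gain from a best-response deviation
        if regret < regret_eps:
            break                             # truncation: stop while still regularized
        x = x * (1.0 + lr * (u - avg))        # discrete replicator update
        x = np.clip(x, 1e-12, None)
        x /= x.sum()
    return x, regret
```

The returned mixed strategy is then the best-response target for the next PSRO iteration; a larger `regret_eps` yields a more perturbed (more regularized) target.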
Pub Date: 2023-02-07 | DOI: 10.48550/arXiv.2302.03338
Learning Manner of Execution from Partial Corrections
Mattias Appelgren, A. Lascarides
Some actions must be executed in different ways depending on the context. For example, wiping away marker requires vigorous force while wiping away almonds requires more gentle force. In this paper we provide a model in which an agent learns which manner of action execution to use in which context, drawing on evidence from trial and error and from verbal corrections when it makes a mistake (e.g., "no, gently"). The learner starts out with a domain model that lacks the concepts denoted by the words in the teacher's feedback: both the words describing the context (e.g., marker) and adverbs like "gently". We show that, through the semantics of coherence, our agent can perform the symbol grounding that is necessary for exploiting the teacher's feedback to solve its domain-level planning problem: performing its actions in the current context in the right way.
Pub Date: 2023-02-06 | DOI: 10.48550/arXiv.2302.02513
Connectivity Enhanced Safe Neural Network Planner for Lane Changing in Mixed Traffic
Xiangguo Liu, Ruochen Jiao, Bowen Zheng, Davis Liang, Qi Zhu
Connectivity technology has shown great potential for improving the safety and efficiency of transportation systems by providing information beyond the perception and prediction capabilities of individual vehicles. However, during the transition period to fully connected and automated transportation systems, human-driven and autonomous vehicles, as well as connected and non-connected vehicles, are expected to share the transportation network. Such mixed traffic scenarios significantly increase the complexity of analyzing system behavior and quantifying uncertainty in highly interactive scenarios, e.g., lane changing. It is even harder to ensure system safety when neural-network-based planners are leveraged to further improve efficiency. In this work, we propose a connectivity-enhanced neural-network-based lane changing planner. By cooperating with surrounding connected vehicles in a dynamic environment, our planner adapts its planned trajectory according to the analysis of a safe evasion trajectory. We demonstrate the strength of our planner design in improving efficiency and ensuring safety in various mixed traffic scenarios with extensive simulations. We also analyze the system's robustness when communication or coordination is imperfect.
Pub Date: 2023-02-03 | DOI: 10.48550/arXiv.2302.01728
Decentralised and Cooperative Control of Multi-Robot Systems through Distributed Optimisation
Yi Dong, Zhongguo Li, Xingyu Zhao, Z. Ding, Xiaowei Huang
Multi-robot cooperative control has gained extensive research interest due to its wide applications in civil, security, and military domains. This paper proposes a cooperative control algorithm for multi-robot systems with general linear dynamics. The algorithm is based on distributed cooperative optimisation and output regulation, and it achieves the global optimum by utilising only information shared among neighbouring robots. Technically, a high-level distributed optimisation algorithm for multi-robot systems is presented, which serves as an optimal reference generator for each individual agent. Then, based on the distributed optimisation algorithm, an output regulation method is utilised to solve the optimal coordination problem for general linear dynamic systems. The convergence of the proposed algorithm is theoretically proved. Both numerical simulations and real-time physical robot experiments are conducted to validate the effectiveness of the proposed cooperative control algorithms.
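The flavor of a high-level distributed optimiser that uses only neighbour communication can be shown with decentralised gradient descent on local quadratic costs. This is a generic textbook sketch under assumed costs 0.5*(z - targets[i])^2, not the paper's algorithm: the network-wide optimum of the summed costs is the mean of the targets, and every robot's estimate approaches it while exchanging information only with neighbours.

```python
import numpy as np

def metropolis_weights(adj):
    """Symmetric, doubly stochastic mixing matrix from an undirected
    adjacency matrix (Metropolis weights)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

def distributed_optimise(targets, adj, steps=500):
    """Decentralised gradient descent: each robot i minimises a local cost
    0.5*(z - targets[i])^2, mixing its estimate with its neighbours'
    estimates and taking a diminishing local gradient step."""
    targets = np.asarray(targets, dtype=float)
    W = metropolis_weights(adj)
    z = targets.copy()                     # start from each robot's own target
    for t in range(steps):
        alpha = 1.0 / (t + 2)              # diminishing step size
        z = W @ z - alpha * (z - targets)  # neighbour mixing + local gradient
    return z
```

Because `W` is doubly stochastic, the network average of the estimates is preserved by the mixing step, and the diminishing step size drives the robots to consensus on the global optimum.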
Pub Date: 2023-02-03 | DOI: 10.48550/arXiv.2302.01489
Online Re-Planning and Adaptive Parameter Update for Multi-Agent Path Finding with Stochastic Travel Times
Atsuyoshi Kita, Nobuhiro Suenari, Masashi Okada, T. Taniguchi
This study explores the problem of Multi-Agent Path Finding with continuous and stochastic travel times whose probability distribution is unknown. Our purpose is to manage a group of automated robots that provide package delivery services in buildings where pedestrians and a wide variety of robots coexist, such as delivery services in office buildings, hospitals, and apartments. In these real-world applications, the time a robot needs to traverse a corridor often takes a continuous, randomly distributed value, and prior knowledge of the probability distribution of the travel time is limited. Multi-Agent Path Finding has been widely studied and applied to robot management systems; however, automating robot operation in such environments remains difficult. We propose 1) online re-planning to update the robots' action plan while it is being executed, and 2) a parameter update that estimates the probability distribution of travel times via Bayesian inference as delays are observed. We use a greedy heuristic to obtain solutions within a limited computation time. Through simulations, we empirically compare the performance of our method to that of existing methods in terms of conflict probability and the actual travel time of robots. The simulation results indicate that the proposed method finds travel paths with at least 50% fewer conflicts and a shorter actual total travel time than existing methods. The proposed method requires only a small number of trials to achieve this performance because the parameter update is prioritized on the edges important for path planning, thereby meeting the requirement of quickly deploying robust planning for automated delivery services.
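A Bayesian parameter update for an edge's travel time can be illustrated with a conjugate pair. The abstract does not specify the paper's travel-time model, so the following is only an assumed example: exponential travel times with an unknown rate and a Gamma(alpha, beta) prior on that rate, updated in closed form as traversals are observed.

```python
def update_rate_posterior(alpha, beta, observed_times):
    """Conjugate update for an assumed exponential travel-time model.

    Prior on the unknown rate: Gamma(alpha, beta). Each observed traversal
    time t contributes (alpha += 1, beta += t), sharpening the posterior so
    re-planning can use an increasingly accurate travel-time estimate.
    """
    for t in observed_times:
        alpha += 1.0        # one more observation
        beta += t           # accumulated observed travel time
    return alpha, beta      # posterior is Gamma(alpha, beta)

def expected_travel_time(alpha, beta):
    """Posterior-predictive mean travel time, beta / (alpha - 1) for alpha > 1."""
    return beta / (alpha - 1.0)
```

Prioritizing the update on important edges then simply means spending traversals (observations) on the edges that matter most for the current plans.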
Pub Date: 2023-02-03 | DOI: 10.48550/arXiv.2302.01815
Optimal Capacity Modification for Many-To-One Matching Problems
Jiehua Chen, Gergely Cs'aji
We consider many-to-one matching problems, where one side consists of students and the other side of schools with capacity constraints. We study how to optimally increase the capacities of the schools so as to obtain a matching that is stable and perfect (i.e., every student is matched) or a matching that is stable and Pareto-efficient for the students. We consider two common optimality criteria: minimizing the sum of capacity increases over all schools (MinSum) and minimizing the maximum capacity increase of any school (MinMax). We obtain a complete picture in terms of computational complexity: except for stable and perfect matchings under the MinMax criterion, which is polynomial-time solvable, all three remaining problems are NP-hard. We further investigate the parameterized complexity and approximability, and find that achieving stable and Pareto-efficient matchings via minimal capacity increases is much harder than achieving stable and perfect matchings.
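The basic subroutine underlying capacity modification is checking what a given capacity vector yields: student-proposing deferred acceptance produces a stable matching, and one can then test whether it is perfect. The sketch below is the standard deferred acceptance algorithm with capacities, shown for illustration; it is not the paper's capacity-modification procedure.

```python
def student_proposing_da(prefs_s, prefs_c, capacity):
    """Student-proposing deferred acceptance with school capacities.

    prefs_s[s]: student s's ranked list of schools, best first.
    prefs_c[c]: school c's priority ranking over students, highest first.
    capacity[c]: number of seats at school c.
    Returns the set of students each school holds in the stable matching.
    """
    rank = {c: {s: i for i, s in enumerate(lst)} for c, lst in prefs_c.items()}
    next_choice = {s: 0 for s in prefs_s}   # next school on s's list to try
    held = {c: [] for c in prefs_c}
    free = list(prefs_s)
    while free:
        s = free.pop()
        if next_choice[s] >= len(prefs_s[s]):
            continue                         # s exhausted their list: unmatched
        c = prefs_s[s][next_choice[s]]
        next_choice[s] += 1
        held[c].append(s)
        held[c].sort(key=lambda x: rank[c][x])
        if len(held[c]) > capacity[c]:
            free.append(held[c].pop())       # reject the lowest-priority student
    return held
```

Raising a school's capacity by one seat and re-running the check is the kind of elementary move whose optimal combination (MinSum/MinMax) the paper shows is, in most variants, NP-hard to find.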
In the assignment problem, a set of items must be allocated to unit-demand agents who express ordinal preferences (rankings) over the items. In the assignment problem with priorities, agents with higher priority are entitled to their preferred items over lower-priority agents. A priority can be naturally represented as a ranking, and an uncertain priority as a distribution over rankings. For example, this models assigning student applicants to university seats, or job applicants to job openings, when the admitting body is uncertain about the true priority over applicants; the uncertainty can express the possibility of bias in the generation of the priority ranking. We believe we are the first to explicitly formulate and study the assignment problem with uncertain priorities. We introduce two natural notions of fairness in this problem: stochastic envy-freeness (SEF) and likelihood envy-freeness (LEF). We show that SEF and LEF are incompatible and that LEF is incompatible with ordinal efficiency. We describe two algorithms, Cycle Elimination (CE) and Unit-Time Eating (UTE), that satisfy ordinal efficiency (a form of ex-ante Pareto optimality) and SEF; the well-known random serial dictatorship algorithm satisfies LEF and the weaker efficiency guarantee of ex-post Pareto optimality. We also show that CE satisfies a relaxation of LEF that we term 1-LEF, which applies only to certain comparisons of priority, while UTE satisfies a version of proportional allocation with ranks. We conclude by demonstrating how a mediator can model a problem of school admission in the face of bias as an assignment problem with uncertain priority.
{"title":"Fairness in the Assignment Problem with Uncertain Priorities","authors":"Zeyu Shen, Zhiyi Wang, Xingyu Zhu, Brandon Fain, Kamesh Munagala","doi":"10.48550/arXiv.2301.13804","journal":"Adaptive Agents and Multi-Agent Systems","publicationDate":"2023-01-31","platform":"Semanticscholar"}