Pub Date: 2023-02-14 | DOI: 10.48550/arXiv.2302.06760
Random Majority Opinion Diffusion: Stabilization Time, Absorbing States, and Influential Nodes
Ahad N. Zehmakan
Adaptive Agents and Multi-Agent Systems
Consider a graph $G$ with $n$ nodes and $m$ edges, which represents a social network, and assume that initially each node is blue or white. In each round, all nodes simultaneously update their color to the most frequent color in their neighborhood. This is called the Majority Model (MM) if a node keeps its color in case of a tie, and the Random Majority Model (RMM) if it chooses blue with probability 1/2 and white otherwise. We prove that there are graphs for which RMM needs exponentially many rounds to reach a stable configuration in expectation, and such a configuration can have exponentially many states (i.e., colorings). This is in contrast to MM, which is known to always reach a stable configuration with one or two states in $O(m)$ rounds. For the special case of a cycle graph $C_n$, we prove the stronger and tight bounds of $\lceil n/2 \rceil - 1$ and $O(n^2)$ rounds in MM and RMM, respectively. Furthermore, we show that the number of stable colorings in MM on $C_n$ is $\Theta(\Phi^n)$, where $\Phi = (1+\sqrt{5})/2$ is the golden ratio, while it is equal to 2 for RMM. We also study the minimum size of a winning set: a set of nodes whose agreement on a color in the initial coloring forces the process to end in a coloring where all nodes share that color. We present tight bounds on the minimum size of a winning set for both MM and RMM. Furthermore, we analyze our models for a random initial coloring, where each node is colored blue independently with some probability $p$ and white otherwise. Using martingale analysis and counting arguments, we prove that the expected final number of blue nodes on a cycle graph $C_n$ is $(2p^2-p^3)n/(1-p+p^2)$ in MM and $pn$ in RMM. Finally, we conduct experiments which complement our theoretical findings and lead to intriguing open problems and conjectures for future work.
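The update rule on a cycle is simple enough to simulate directly. The sketch below is an illustration of the process as described in the abstract, not the paper's code: each node looks at its two neighbours on $C_n$, a tie is resolved by keeping its own color (MM) or by a fair coin flip (RMM), and MM is iterated until a stable coloring is reached.

```python
import random

def step(colors, rule="MM", rng=None):
    """One synchronous round of the (Random) Majority Model on a cycle.

    colors: list of 0/1 (white/blue). Each node adopts the majority color
    among its two neighbours; ties keep the node's color (MM) or flip a
    fair coin (RMM).
    """
    rng = rng or random.Random(0)
    n = len(colors)
    nxt = []
    for i in range(n):
        left, right = colors[i - 1], colors[(i + 1) % n]
        if left == right:            # clear majority among the two neighbours
            nxt.append(left)
        elif rule == "MM":           # tie: keep own color
            nxt.append(colors[i])
        else:                        # tie: choose blue with probability 1/2
            nxt.append(rng.randint(0, 1))
    return nxt

def run_mm(colors):
    """Iterate MM until a stable coloring is reached; return (coloring, rounds)."""
    rounds = 0
    while True:
        nxt = step(colors, "MM")
        if nxt == colors:
            return colors, rounds
        colors, rounds = nxt, rounds + 1
```

Consistent with the $\Theta(\Phi^n)$ count of stable colorings, the stable MM colorings on a cycle are exactly those whose monochromatic blocks all have length at least two.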
Pub Date: 2023-02-13 | DOI: 10.48550/arXiv.2302.06033
Improving Quantal Cognitive Hierarchy Model Through Iterative Population Learning
Yuhong Xu, Shih-Fen Cheng, Xinyu Chen
In domains where agents interact strategically, game theory is widely applied to predict how agents will behave. However, game-theoretic predictions rest on the assumption that agents are fully rational and play equilibrium strategies, which mostly does not hold when human decision makers are involved. To address this limitation, a number of behavioral game-theoretic models have been defined to account for the limited rationality of human decision makers. The "quantal cognitive hierarchy" (QCH) model, one of the more recent of these, has been demonstrated to be the state-of-the-art model for predicting human behavior in normal-form games. The QCH model assumes that agents in games can be either non-strategic (level-0) or strategic (level-$k$). Level-0 agents choose their strategies irrespective of other agents. Level-$k$ agents assume that other agents behave at levels below $k$ and best respond against them. However, an important assumption of the QCH model is that the distribution of agents' levels follows a Poisson distribution. In this paper, we relax this assumption and design a population-level learning-based method to iteratively estimate the empirical distribution of agents' reasoning levels. Using a real-world dataset from the Swedish lowest unique positive integer game, we demonstrate how our refined QCH model and the iterative solution-seeking process provide a more accurate behavioral model for agents. This leads to better performance in fitting the real data and allows us to track an agent's progress in learning to play strategically over multiple rounds.
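The level-$k$ recursion with an arbitrary (non-Poisson) level distribution can be sketched in a few lines. This is a simplified illustration, not the paper's model: level-0 is taken to be uniform, and each level-$k$ agent quantal-responds (logit with precision `lam`, an assumed parameter) to the normalized mixture of all lower levels.

```python
import math

def qch_prediction(payoff, level_probs, lam=1.0):
    """Predict a row player's mixed strategy under a quantal cognitive
    hierarchy with an arbitrary level distribution.

    payoff[i][j]: payoff for playing action i against an opponent playing j.
    level_probs[k]: P(level = k), k = 0..K (need not be Poisson).
    """
    n = len(payoff)
    strategies = [[1.0 / n] * n]                 # level-0: uniform play
    for k in range(1, len(level_probs)):
        # belief about the opponent: normalized mixture of levels 0..k-1
        total = sum(level_probs[:k])
        belief = [sum(level_probs[l] * strategies[l][j] for l in range(k)) / total
                  for j in range(n)]
        utils = [sum(payoff[i][j] * belief[j] for j in range(n)) for i in range(n)]
        exps = [math.exp(lam * u) for u in utils]  # logit (quantal) response
        z = sum(exps)
        strategies.append([e / z for e in exps])
    # population-level prediction: mixture over all levels
    return [sum(level_probs[k] * strategies[k][i] for k in range(len(level_probs)))
            for i in range(n)]
```

Iterative population learning would then refit `level_probs` against observed play and recompute the prediction, rather than fixing a Poisson prior.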
Pub Date: 2023-02-13 | DOI: 10.48550/arXiv.2302.06548
Automatic Noise Filtering with Dynamic Sparse Training in Deep Reinforcement Learning
Bram Grooten, Ghada Sokar, Shibhansh Dohare, Elena Mocanu, Matthew E. Taylor, Mykola Pechenizkiy, D. Mocanu
Tomorrow's robots will need to distinguish useful information from noise when performing different tasks. A household robot, for instance, may continuously receive a plethora of information about the home but needs to focus on just a small subset to successfully execute its current chore. Filtering distracting inputs that contain irrelevant data has received little attention in the reinforcement learning literature. As a first step toward resolving this, we formulate a problem setting in reinforcement learning called the extremely noisy environment (ENE), where up to 99% of the input features are pure noise. Agents need to detect which features provide task-relevant information about the state of the environment. Consequently, we propose a new method termed Automatic Noise Filtering (ANF), which uses the principles of dynamic sparse training in synergy with various deep reinforcement learning algorithms. The sparse input layer learns to focus its connectivity on task-relevant features, such that ANF-SAC and ANF-TD3 outperform standard SAC and TD3 by a large margin, while using up to 95% fewer weights. Furthermore, we devise a transfer learning setting for ENEs by permuting all features of the environment after 1M timesteps, to simulate the fact that other information sources can become relevant as the world evolves. Again, ANF surpasses the baselines in final performance and sample complexity. Our code is available at https://github.com/bramgrooten/automatic-noise-filtering
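The drop-and-grow cycle of dynamic sparse training that ANF builds on can be sketched as follows. This is a minimal illustration of the general principle, not the ANF implementation (see the linked repository for that): periodically drop the weakest active connections of the sparse input layer and regrow the same number at random inactive positions, so connectivity can migrate toward task-relevant inputs.

```python
import numpy as np

def prune_and_regrow(weights, drop_frac=0.3, rng=None):
    """One drop-and-grow step on a sparse weight matrix.

    Drops the drop_frac smallest-magnitude active weights, then regrows the
    same number of connections at random inactive positions with a tiny
    initialization, keeping the overall sparsity level unchanged.
    """
    rng = rng or np.random.default_rng(0)
    w = weights.copy()
    flat = w.ravel()                     # view into w (contiguous copy)
    active = np.flatnonzero(flat)        # indices of current connections
    k = int(drop_frac * active.size)
    if k == 0:
        return w
    # drop: zero out the k active weights with smallest magnitude
    order = active[np.argsort(np.abs(flat[active]))]
    flat[order[:k]] = 0.0
    # grow: activate k random currently-inactive positions
    inactive = np.flatnonzero(flat == 0.0)
    new = rng.choice(inactive, size=k, replace=False)
    flat[new] = rng.normal(0.0, 1e-2, size=k)
    return w
```

In training, such a step would run every few thousand gradient updates; strong (large-magnitude) connections to relevant features survive while connections to noise features are gradually recycled.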
Pub Date: 2023-02-09 | DOI: 10.48550/arXiv.2302.04928
Regularization for Strategy Exploration in Empirical Game-Theoretic Analysis
Yongzhao Wang, Michael P. Wellman
In iterative approaches to empirical game-theoretic analysis (EGTA), the strategy space is expanded incrementally based on analysis of intermediate game models. A common approach to strategy exploration, represented by the double oracle algorithm, is to add strategies that best-respond to a current equilibrium. This approach may suffer from overfitting and other limitations, which led the developers of the policy-space response oracle (PSRO) framework for iterative EGTA to generalize the target of best response, employing what they term meta-strategy solvers (MSSs). Noting that many MSSs can be viewed as perturbed or approximated versions of Nash equilibrium, we adopt an explicit regularization perspective on the specification and analysis of MSSs. We propose a novel MSS called regularized replicator dynamics (RRD), which simply truncates the replicator process based on a regret criterion. We show that RRD is more adaptive than existing MSSs and outperforms them in various games. We extend our study to three-player games, for which the payoff matrix is cubic in the number of strategies, so exhaustively evaluating profiles may not be feasible. We propose a profile search method that can identify solutions from incomplete models, and combine it with iterative model construction using a regularized MSS. Finally, and most importantly, our experiments reveal that the regret of best-response targets has a tremendous influence on the performance of strategy exploration, which provides an explanation for the effectiveness of regularization in PSRO.
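The core idea of truncating replicator dynamics on a regret criterion can be sketched for a symmetric two-player game. This is an illustrative sketch, not the paper's RRD: the step size `lr`, the regret threshold `regret_eps`, and the symmetric-game setting are all assumptions made here for brevity.

```python
import numpy as np

def rrd(payoff, regret_eps=0.05, lr=0.1, max_iters=10_000):
    """Replicator dynamics on a symmetric game, truncated early once the
    regret of the current mixed strategy falls below regret_eps, instead
    of running all the way to an (approximate) equilibrium."""
    n = payoff.shape[0]
    x = np.full(n, 1.0 / n)                  # start from the uniform mix
    regret = np.inf
    for _ in range(max_iters):
        u = payoff @ x                        # payoff of each pure strategy
        avg = x @ u                           # population-average payoff
        regret = u.max() - avg                # gain from a best-response deviation
        if regret < regret_eps:
            break                             # truncation: stop while still regularized
        x = x * (1.0 + lr * (u - avg))        # discrete replicator update
        x = np.clip(x, 1e-12, None)
        x /= x.sum()
    return x, regret
```

The returned mixed strategy is then the best-response target for the next PSRO iteration; a larger `regret_eps` yields a more perturbed (more regularized) target.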
Pub Date: 2023-02-07 | DOI: 10.48550/arXiv.2302.03338
Learning Manner of Execution from Partial Corrections
Mattias Appelgren, A. Lascarides
Some actions must be executed in different ways depending on the context. For example, wiping away marker requires vigorous force while wiping away almonds requires more gentle force. In this paper we provide a model in which an agent learns which manner of action execution to use in which context, drawing on evidence from trial and error and from verbal corrections when it makes a mistake (e.g., "no, gently"). The learner starts out with a domain model that lacks the concepts denoted by the words in the teacher's feedback: both the words describing the context (e.g., marker) and adverbs like "gently". We show that, through the semantics of coherence, our agent can perform the symbol grounding that is necessary for exploiting the teacher's feedback to solve its domain-level planning problem: performing its actions in the current context in the right way.
Pub Date: 2023-02-06 | DOI: 10.48550/arXiv.2302.02513
Connectivity Enhanced Safe Neural Network Planner for Lane Changing in Mixed Traffic
Xiangguo Liu, Ruochen Jiao, Bowen Zheng, Davis Liang, Qi Zhu
Connectivity technology has shown great potential for improving the safety and efficiency of transportation systems by providing information beyond the perception and prediction capabilities of individual vehicles. However, during the transition period to fully connected and automated transportation systems, human-driven and autonomous vehicles, as well as connected and non-connected vehicles, are expected to share the transportation network. Such mixed traffic scenarios significantly increase the complexity of analyzing system behavior and quantifying uncertainty in highly interactive scenarios, e.g., lane changing. It is even harder to ensure system safety when neural-network-based planners are leveraged to further improve efficiency. In this work, we propose a connectivity-enhanced neural-network-based lane changing planner. By cooperating with surrounding connected vehicles in a dynamic environment, our planner adapts its planned trajectory according to the analysis of a safe evasion trajectory. We demonstrate the strength of our planner design in improving efficiency and ensuring safety in various mixed traffic scenarios with extensive simulations. We also analyze the system's robustness when communication or coordination is imperfect.
Pub Date: 2023-02-03 | DOI: 10.48550/arXiv.2302.01728
Decentralised and Cooperative Control of Multi-Robot Systems through Distributed Optimisation
Yi Dong, Zhongguo Li, Xingyu Zhao, Z. Ding, Xiaowei Huang
Multi-robot cooperative control has gained extensive research interest due to its wide applications in civil, security, and military domains. This paper proposes a cooperative control algorithm for multi-robot systems with general linear dynamics. The algorithm is based on distributed cooperative optimisation and output regulation, and it achieves the global optimum by utilising only information shared among neighbouring robots. Technically, a high-level distributed optimisation algorithm for multi-robot systems is presented, which serves as an optimal reference generator for each individual agent. Then, based on the distributed optimisation algorithm, an output regulation method is utilised to solve the optimal coordination problem for general linear dynamic systems. The convergence of the proposed algorithm is theoretically proved. Both numerical simulations and real-time physical robot experiments are conducted to validate the effectiveness of the proposed cooperative control algorithms.
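The flavor of a high-level distributed optimiser that uses only neighbour communication can be shown with decentralised gradient descent on local quadratic costs. This is a generic textbook sketch under assumed costs 0.5*(z - targets[i])^2, not the paper's algorithm: the network-wide optimum of the summed costs is the mean of the targets, and every robot's estimate approaches it while exchanging information only with neighbours.

```python
import numpy as np

def metropolis_weights(adj):
    """Symmetric, doubly stochastic mixing matrix from an undirected
    adjacency matrix (Metropolis weights)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

def distributed_optimise(targets, adj, steps=500):
    """Decentralised gradient descent: each robot i minimises a local cost
    0.5*(z - targets[i])^2, mixing its estimate with its neighbours'
    estimates and taking a diminishing local gradient step."""
    targets = np.asarray(targets, dtype=float)
    W = metropolis_weights(adj)
    z = targets.copy()                     # start from each robot's own target
    for t in range(steps):
        alpha = 1.0 / (t + 2)              # diminishing step size
        z = W @ z - alpha * (z - targets)  # neighbour mixing + local gradient
    return z
```

Because `W` is doubly stochastic, the network average of the estimates is preserved by the mixing step, and the diminishing step size drives the robots to consensus on the global optimum.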
Pub Date: 2023-02-03 | DOI: 10.48550/arXiv.2302.01489
Online Re-Planning and Adaptive Parameter Update for Multi-Agent Path Finding with Stochastic Travel Times
Atsuyoshi Kita, Nobuhiro Suenari, Masashi Okada, T. Taniguchi
This study explores the problem of Multi-Agent Path Finding with continuous and stochastic travel times whose probability distribution is unknown. Our purpose is to manage a group of automated robots that provide package delivery services in buildings where pedestrians and a wide variety of robots coexist, such as delivery services in office buildings, hospitals, and apartments. In these real-world applications, the time a robot needs to traverse a corridor often takes a continuous, randomly distributed value, and prior knowledge of the probability distribution of the travel time is limited. Multi-Agent Path Finding has been widely studied and applied to robot management systems; however, automating robot operation in such environments remains difficult. We propose 1) online re-planning to update the robots' action plan while it is being executed, and 2) a parameter update that estimates the probability distribution of travel times via Bayesian inference as delays are observed. We use a greedy heuristic to obtain solutions within a limited computation time. Through simulations, we empirically compare the performance of our method to that of existing methods in terms of conflict probability and the actual travel time of robots. The simulation results indicate that the proposed method finds travel paths with at least 50% fewer conflicts and a shorter actual total travel time than existing methods. The proposed method requires only a small number of trials to achieve this performance because the parameter update is prioritized on the edges important for path planning, thereby meeting the requirement of quickly deploying robust planning for automated delivery services.
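A Bayesian parameter update for an edge's travel time can be illustrated with a conjugate pair. The abstract does not specify the paper's travel-time model, so the following is only an assumed example: exponential travel times with an unknown rate and a Gamma(alpha, beta) prior on that rate, updated in closed form as traversals are observed.

```python
def update_rate_posterior(alpha, beta, observed_times):
    """Conjugate update for an assumed exponential travel-time model.

    Prior on the unknown rate: Gamma(alpha, beta). Each observed traversal
    time t contributes (alpha += 1, beta += t), sharpening the posterior so
    re-planning can use an increasingly accurate travel-time estimate.
    """
    for t in observed_times:
        alpha += 1.0        # one more observation
        beta += t           # accumulated observed travel time
    return alpha, beta      # posterior is Gamma(alpha, beta)

def expected_travel_time(alpha, beta):
    """Posterior-predictive mean travel time, beta / (alpha - 1) for alpha > 1."""
    return beta / (alpha - 1.0)
```

Prioritizing the update on important edges then simply means spending traversals (observations) on the edges that matter most for the current plans.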
Pub Date: 2023-02-03 | DOI: 10.48550/arXiv.2302.01815
Optimal Capacity Modification for Many-To-One Matching Problems
Jiehua Chen, Gergely Cs'aji
We consider many-to-one matching problems, where one side consists of students and the other side of schools with capacity constraints. We study how to optimally increase the capacities of the schools so as to obtain a matching that is stable and perfect (i.e., every student is matched) or a matching that is stable and Pareto-efficient for the students. We consider two common optimality criteria: minimizing the sum of capacity increases over all schools (MinSum) and minimizing the maximum capacity increase of any school (MinMax). We obtain a complete picture in terms of computational complexity: except for stable and perfect matchings under the MinMax criterion, which is polynomial-time solvable, all three remaining problems are NP-hard. We further investigate the parameterized complexity and approximability, and find that achieving stable and Pareto-efficient matchings via minimal capacity increases is much harder than achieving stable and perfect matchings.
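The basic subroutine underlying capacity modification is checking what a given capacity vector yields: student-proposing deferred acceptance produces a stable matching, and one can then test whether it is perfect. The sketch below is the standard deferred acceptance algorithm with capacities, shown for illustration; it is not the paper's capacity-modification procedure.

```python
def student_proposing_da(prefs_s, prefs_c, capacity):
    """Student-proposing deferred acceptance with school capacities.

    prefs_s[s]: student s's ranked list of schools, best first.
    prefs_c[c]: school c's priority ranking over students, highest first.
    capacity[c]: number of seats at school c.
    Returns the set of students each school holds in the stable matching.
    """
    rank = {c: {s: i for i, s in enumerate(lst)} for c, lst in prefs_c.items()}
    next_choice = {s: 0 for s in prefs_s}   # next school on s's list to try
    held = {c: [] for c in prefs_c}
    free = list(prefs_s)
    while free:
        s = free.pop()
        if next_choice[s] >= len(prefs_s[s]):
            continue                         # s exhausted their list: unmatched
        c = prefs_s[s][next_choice[s]]
        next_choice[s] += 1
        held[c].append(s)
        held[c].sort(key=lambda x: rank[c][x])
        if len(held[c]) > capacity[c]:
            free.append(held[c].pop())       # reject the lowest-priority student
    return held
```

Raising a school's capacity by one seat and re-running the check is the kind of elementary move whose optimal combination (MinSum/MinMax) the paper shows is, in most variants, NP-hard to find.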
In the assignment problem, a set of items must be allocated to unit-demand agents who express ordinal preferences (rankings) over the items. In the assignment problem with priorities, agents with higher priority are entitled to their preferred items over lower-priority agents. A priority can be naturally represented as a ranking, and an uncertain priority as a distribution over rankings. For example, this models assigning student applicants to university seats, or job applicants to job openings, when the admitting body is uncertain about the true priority over applicants; the uncertainty can express the possibility of bias in the generation of the priority ranking. We believe we are the first to explicitly formulate and study the assignment problem with uncertain priorities. We introduce two natural notions of fairness in this problem: stochastic envy-freeness (SEF) and likelihood envy-freeness (LEF). We show that SEF and LEF are incompatible and that LEF is incompatible with ordinal efficiency. We describe two algorithms, Cycle Elimination (CE) and Unit-Time Eating (UTE), that satisfy ordinal efficiency (a form of ex-ante Pareto optimality) and SEF; the well-known random serial dictatorship algorithm satisfies LEF and the weaker efficiency guarantee of ex-post Pareto optimality. We also show that CE satisfies a relaxation of LEF that we term 1-LEF, which applies only to certain comparisons of priority, while UTE satisfies a version of proportional allocation with ranks. We conclude by demonstrating how a mediator can model a problem of school admission in the face of bias as an assignment problem with uncertain priority.
{"title":"Fairness in the Assignment Problem with Uncertain Priorities","authors":"Zeyu Shen, Zhiyi Wang, Xingyu Zhu, Brandon Fain, Kamesh Munagala","doi":"10.48550/arXiv.2301.13804","journal":"Adaptive Agents and Multi-Agent Systems","publicationDate":"2023-01-31","platform":"Semanticscholar"}