Multi-Agent Tree Search with Dynamic Reward Shaping
Alvaro Velasquez, Brett Bissey, Lior Barak, Daniel Melcer, Andre Beckus, Ismail R. Alkhouri, George K. Atia
International Conference on Automated Planning and Scheduling (ICAPS), published 2022-06-13
DOI: 10.1609/icaps.v32i1.19854 (https://doi.org/10.1609/icaps.v32i1.19854)
Citations: 1
Abstract
Sparse rewards and their representation remain a challenge for the development of multi-agent planning systems. While techniques from formal methods can be adopted to represent the underlying planning objectives, their use in facilitating and accelerating learning has received limited attention in multi-agent settings. Reward shaping methods that leverage such formal representations in single-agent settings are typically static, in the sense that the artificial rewards remain the same throughout the entire learning process. In contrast, we investigate the use of such formal objective representations to define novel reward shaping functions that capture the learned experience of the agents. More specifically, we leverage the automaton representation of the underlying team objectives in mixed cooperative-competitive domains such that each automaton transition is assigned an expected value proportional to the frequency with which it was observed in successful trajectories of past behavior. This form of dynamic reward shaping is proposed within a multi-agent tree search architecture wherein agents can simultaneously reason about the future behavior of other agents as well as their own future behavior.
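To make the frequency-based shaping idea concrete, here is a minimal Python sketch of one plausible reading of the abstract: automaton transitions observed in successful episodes accumulate counts, and a transition's shaped reward is proportional to its empirical frequency among successes. The class name, the once-per-episode counting rule, and the normalization are illustrative assumptions, not the paper's exact formulation.

```python
from collections import defaultdict


class DynamicShaping:
    """Hypothetical sketch of dynamic reward shaping over automaton transitions.

    Each transition (automaton_state, symbol) is assigned a shaped reward
    proportional to how often it appeared in successful past trajectories.
    """

    def __init__(self):
        # (automaton_state, symbol) -> number of successful episodes containing it
        self.success_counts = defaultdict(int)
        self.total_successes = 0

    def record_trajectory(self, transitions, succeeded):
        """Update counts from one episode.

        transitions: iterable of (automaton_state, symbol) pairs taken
        during the episode; succeeded: whether the team objective was met.
        """
        if not succeeded:
            return
        self.total_successes += 1
        # Count each distinct transition once per successful episode.
        for t in set(transitions):
            self.success_counts[t] += 1

    def shaped_reward(self, transition, scale=1.0):
        """Estimated value of a transition: its empirical frequency among
        successful trajectories (zero before any success is observed)."""
        if self.total_successes == 0:
            return 0.0
        return scale * self.success_counts[transition] / self.total_successes
```

In the architecture described in the abstract, such a shaping signal would be added to the sparse environment reward during tree-search rollouts, so the artificial rewards evolve as agents accumulate successful experience rather than remaining fixed throughout learning.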