Knowledge transfer in multi-objective multi-agent reinforcement learning via generalized policy improvement

IF 1.2 · CAS Tier 4 (Computer Science) · JCR Q4 (COMPUTER SCIENCE, INFORMATION SYSTEMS) · Computer Science and Information Systems · Pub Date: 2023-01-01 · DOI: 10.2298/csis221210071a
de Almeida, Lucas Alegre, Ana Bazzan
{"title":"基于广义策略改进的多目标多智能体强化学习中的知识转移","authors":"Almeida de, Lucas Alegre, Ana Bazzan","doi":"10.2298/csis221210071a","DOIUrl":null,"url":null,"abstract":"Even though many real-world problems are inherently distributed and multi-objective, most of the reinforcement learning (RL) literature deals with single agents and single objectives. While some of these problems can be solved using a single-agent single-objective RL solution (e.g., by specifying preferences over objectives), there are robustness issues, as well the fact that preferences may change over time, or it might not even be possible to set such preferences. Therefore, a need arises for a way to train multiple agents for any given preference distribution over the objectives. This work thus proposes a multi-objective multi-agent reinforcement learning (MOMARL) method in which agents build a shared set of policies during training, in a decentralized way, and then combine these policies using a generalization of policy improvement and policy evaluation (fundamental operations of RL algorithms) to generate effective behaviors for any possible preference distribution, without requiring any additional training. This method is applied to two different application scenarios: a multi-agent extension of a domain commonly used in the related literature, and traffic signal control, which is more complex, inherently distributed and multi-objective (the flow of both vehicles and pedestrians are considered). Results show that the approach is able to effectively and efficiently generate behaviors for the agents, given any preference over the objectives.","PeriodicalId":50636,"journal":{"name":"Computer Science and Information Systems","volume":null,"pages":null},"PeriodicalIF":1.2000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Knowledge transfer in multi-objective multi-agent reinforcement learning via generalized policy improvement\",\"authors\":\"Almeida de, Lucas Alegre, Ana Bazzan\",\"doi\":\"10.2298/csis221210071a\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Even though many real-world problems are inherently distributed and multi-objective, most of the reinforcement learning (RL) literature deals with single agents and single objectives. While some of these problems can be solved using a single-agent single-objective RL solution (e.g., by specifying preferences over objectives), there are robustness issues, as well the fact that preferences may change over time, or it might not even be possible to set such preferences. Therefore, a need arises for a way to train multiple agents for any given preference distribution over the objectives. This work thus proposes a multi-objective multi-agent reinforcement learning (MOMARL) method in which agents build a shared set of policies during training, in a decentralized way, and then combine these policies using a generalization of policy improvement and policy evaluation (fundamental operations of RL algorithms) to generate effective behaviors for any possible preference distribution, without requiring any additional training. This method is applied to two different application scenarios: a multi-agent extension of a domain commonly used in the related literature, and traffic signal control, which is more complex, inherently distributed and multi-objective (the flow of both vehicles and pedestrians are considered). 
Results show that the approach is able to effectively and efficiently generate behaviors for the agents, given any preference over the objectives.\",\"PeriodicalId\":50636,\"journal\":{\"name\":\"Computer Science and Information Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Science and Information Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2298/csis221210071a\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Science and Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2298/csis221210071a","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Even though many real-world problems are inherently distributed and multi-objective, most of the reinforcement learning (RL) literature deals with single agents and single objectives. While some of these problems can be solved using a single-agent, single-objective RL solution (e.g., by specifying preferences over objectives), there are robustness issues, as well as the fact that preferences may change over time, or it might not even be possible to set such preferences. Therefore, a need arises for a way to train multiple agents for any given preference distribution over the objectives. This work thus proposes a multi-objective multi-agent reinforcement learning (MOMARL) method in which agents build a shared set of policies during training, in a decentralized way, and then combine these policies using a generalization of policy improvement and policy evaluation (fundamental operations of RL algorithms) to generate effective behaviors for any possible preference distribution, without requiring any additional training. This method is applied to two different application scenarios: a multi-agent extension of a domain commonly used in the related literature, and traffic signal control, which is more complex, inherently distributed, and multi-objective (the flows of both vehicles and pedestrians are considered). Results show that the approach is able to effectively and efficiently generate behaviors for the agents, given any preference over the objectives.
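To make the core idea concrete, below is a minimal sketch of how generalized policy improvement (GPI) can select actions for an arbitrary preference vector at test time, assuming each agent keeps vector-valued action-value estimates (one component per objective) for every policy in the shared library. This is an illustration of the general GPI mechanism the abstract refers to, not the paper's exact implementation; the names `gpi_action`, `q_library`, and `w` are hypothetical.

```python
import numpy as np

def gpi_action(q_library: np.ndarray, w: np.ndarray) -> int:
    """Select an action via generalized policy improvement (GPI).

    q_library: shape (n_policies, n_actions, n_objectives); the
        vector-valued action values of every policy in the shared
        library, evaluated at the current state.
    w: shape (n_objectives,); preference weights over the objectives.

    GPI scalarizes each policy's value vector with w and acts greedily
    with respect to the best stored policy, so a new preference vector
    needs no additional training.
    """
    scalarized = q_library @ w          # (n_policies, n_actions)
    return int(np.argmax(scalarized.max(axis=0)))

# Illustrative usage: 3 stored policies, 4 actions, 2 objectives
# (e.g., vehicle flow vs. pedestrian flow in traffic signal control).
rng = np.random.default_rng(0)
q_library = rng.random((3, 4, 2))
w = np.array([0.7, 0.3])                # any preference distribution
print(gpi_action(q_library, w))
```

Because the maximization runs over the whole shared policy set, any single agent can exploit policies learned by its peers, which is the knowledge-transfer aspect highlighted in the title.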
Source journal
Computer Science and Information Systems
Categories: COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING
CiteScore: 2.30
Self-citation rate: 21.40%
Articles per year: 76
Review time: 7.5 months
About the journal: Computer Science and Information Systems (ComSIS) is an international refereed journal, published in Serbia. The objective of ComSIS is to communicate important research and development results in the areas of computer science, software engineering, and information systems.
Latest articles in this journal
Reviewer Acknowledgements for Computer and Information Science, Vol. 16, No. 3
Drawbacks of Traditional Environmental Monitoring Systems
Improving the Classification Ability of Delegating Classifiers Using Different Supervised Machine Learning Algorithms
Reinforcement learning - based adaptation and scheduling methods for multi-source DASH
On the Convergence of Hypergeometric to Binomial Distributions