Constructive Policy: Reinforcement Learning Approach for Connected Multi-Agent Systems

Sayyed Jaffar Ali Raza, Mingjie Lin
{"title":"Constructive Policy: Reinforcement Learning Approach for Connected Multi-Agent Systems","authors":"Sayyed Jaffar Ali Raza, Mingjie Lin","doi":"10.1109/COASE.2019.8843223","DOIUrl":null,"url":null,"abstract":"Policy based reinforcement learning methods are widely used for multi-agent systems to learn optimal actions given any state; with partial or even no model representation. However multi-agent systems with complex structures (curse of dimensionality) or with high constraints (like bio-inspired (a) snake or serpentine robots) show limited performance in such environments due to sparse-reward nature of environment and no fully observable model representation. In this paper we present a constructive learning and planning scheme that reduces the complexity of high-diemensional agent model by decomposing it into identical, connected and scaled down multiagent structure and then apply learning framework in layers of local and global ranking. Our layered hierarchy method also decomposes the final goal into multiple sub-tasks and a global task (final goal) that is bias-induced function of local sub-tasks. Local layer deals with learning ‘reusable’ local policy for a local agent to achieve a sub-task optimally; that local policy can also be reused by other identical local agents. Furthermore, global layer learns a policy to apply right combination of local policies that are parameterized over entire connected structure of local agents to achieve the global task by collaborative construction of local agents. After learning local policies and while learning global policy, the framework generates sub-tasks for each local agent, and accepts local agents’ intrinsic rewards as positive bias towards maximum global reward based of optimal sub-tasks assignments. The advantage of proposed approach includes better exploration due to decomposition of dimensions, and reusability of learning paradigm over extended dimension spaces. We apply the constructive policy method to serpentine robot with hyper-redundant degrees of freedom (DOF), for achieving optimal control and we also outline connection to hierarchical apprenticeship learning methods which can be seen as layered learning framework for complex control tasks.","PeriodicalId":6695,"journal":{"name":"2019 IEEE 15th International Conference on Automation Science and Engineering (CASE)","volume":"23 1","pages":"257-262"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 15th International Conference on Automation Science and Engineering (CASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COASE.2019.8843223","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Policy-based reinforcement learning methods are widely used in multi-agent systems to learn optimal actions given any state, with partial or even no model representation. However, multi-agent systems with complex structures (curse of dimensionality) or with high constraints (like bio-inspired snake or serpentine robots) show limited performance in such environments, due to the sparse-reward nature of the environment and the lack of a fully observable model representation. In this paper we present a constructive learning and planning scheme that reduces the complexity of a high-dimensional agent model by decomposing it into an identical, connected, and scaled-down multi-agent structure, and then applying a learning framework in layers of local and global ranking. Our layered hierarchy method also decomposes the final goal into multiple sub-tasks and a global task (the final goal) that is a bias-induced function of the local sub-tasks. The local layer deals with learning a 'reusable' local policy for a local agent to achieve a sub-task optimally; that local policy can also be reused by other identical local agents. Furthermore, the global layer learns a policy to apply the right combination of local policies, parameterized over the entire connected structure of local agents, to achieve the global task through collaborative construction by the local agents. After learning the local policies, and while learning the global policy, the framework generates sub-tasks for each local agent and accepts the local agents' intrinsic rewards as a positive bias toward the maximum global reward based on optimal sub-task assignments. The advantages of the proposed approach include better exploration due to the decomposition of dimensions, and reusability of the learning paradigm over extended dimension spaces. We apply the constructive policy method to a serpentine robot with hyper-redundant degrees of freedom (DOF) to achieve optimal control, and we also outline a connection to hierarchical apprenticeship learning methods, which can be seen as a layered learning framework for complex control tasks.
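To make the layered scheme concrete, the following minimal sketch illustrates the structure the abstract describes: a single local policy shared by identical segment agents, and a global policy that assigns sub-tasks and is trained on the extrinsic task reward plus a positive bias from the segments' intrinsic rewards. All names (LocalPolicy, GlobalPolicy), the tabular Q-learning updates, and the placeholder rewards are illustrative assumptions for exposition, not the paper's implementation.

```python
# Toy sketch of a two-layer "constructive policy" under stated assumptions:
# one shared local policy for identical segments, one global sub-task
# assigner whose reward is biased by the segments' intrinsic rewards.
import numpy as np

N_SEGMENTS = 6   # identical, connected local agents (snake segments)
N_STATES = 8     # discretized local joint state
N_ACTIONS = 3    # e.g. bend left / hold / bend right

class LocalPolicy:
    """One Q-table shared ('reused') by every identical segment."""
    def __init__(self, eps=0.1):
        self.q = np.zeros((N_STATES, N_ACTIONS))
        self.eps = eps

    def act(self, s, rng):
        if rng.random() < self.eps:
            return int(rng.integers(N_ACTIONS))
        return int(np.argmax(self.q[s]))

    def update(self, s, a, r, s2, alpha=0.1, gamma=0.95):
        target = r + gamma * self.q[s2].max()
        self.q[s, a] += alpha * (target - self.q[s, a])

class GlobalPolicy:
    """Learns which sub-task to assign each segment; trained on the
    extrinsic reward plus a positive bias from local intrinsic rewards."""
    def __init__(self, n_subtasks=4, eps=0.2):
        self.q = np.zeros((N_SEGMENTS, n_subtasks))
        self.eps = eps

    def assign(self, rng):
        greedy = self.q.argmax(axis=1)
        explore = rng.integers(self.q.shape[1], size=N_SEGMENTS)
        mask = rng.random(N_SEGMENTS) < self.eps
        return np.where(mask, explore, greedy)

    def update(self, subtasks, global_r, intrinsic_rs, bias=0.1, alpha=0.05):
        shaped = global_r + bias * intrinsic_rs.sum()  # bias-induced reward
        for seg, task in enumerate(subtasks):
            self.q[seg, task] += alpha * (shaped - self.q[seg, task])

rng = np.random.default_rng(0)
local, glob = LocalPolicy(), GlobalPolicy()
states = rng.integers(N_STATES, size=N_SEGMENTS)
subtasks = glob.assign(rng)                    # global layer: sub-task per segment
actions = [local.act(s, rng) for s in states]  # local layer: shared policy everywhere
# In the full scheme the assigned sub-task would parameterize each segment's
# local state; placeholder rewards below just keep the sketch runnable.
intrinsic = rng.random(N_SEGMENTS)             # per-segment sub-task progress
next_states = rng.integers(N_STATES, size=N_SEGMENTS)
for i in range(N_SEGMENTS):                    # the shared table learns from every segment
    local.update(states[i], actions[i], intrinsic[i], next_states[i])
glob.update(subtasks, global_r=1.0, intrinsic_rs=intrinsic)
```

The design point mirrored here is reuse: because every segment is identical, one local table is trained on every segment's experience, so the learning paradigm extends to more segments without growing the local state-action space.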