Policy Reuse in Reinforcement Learning for Modular Agents

Sayyed Jaffar Ali Raza, Mingjie Lin
{"title":"模块化智能体强化学习中的策略重用","authors":"Sayyed Jaffar Ali Raza, Mingjie Lin","doi":"10.1109/INFOCT.2019.8710861","DOIUrl":null,"url":null,"abstract":"We present reusable policy method for modular reinforcement learning problem in continuous state space. Our method relies on two-layered learning architecture. The first layer partitions the agent’s problem space into n-folds sub-agents that are inter-connected with each other with dexterity identical to original problem. It further learns a local control policy for standalone 1-fold sub-agent. The second layer learns a global policy to reuse ‘already learnt’ standalone local policy over each n sub-agents by sampling local policy with global parameters for each sub-agent—parameterizing local policy independently to approximate non-linear interconnections between sub-agents. We demonstrate our method on simulation example of 12-DOF modular robot that learns maneuver pattern of snake-like gait. We also compare our proposed method against standard single-policy learning methods to benchmark optimality.","PeriodicalId":369231,"journal":{"name":"2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Policy Reuse in Reinforcement Learning for Modular Agents\",\"authors\":\"Sayyed Jaffar Ali Raza, Mingjie Lin\",\"doi\":\"10.1109/INFOCT.2019.8710861\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present reusable policy method for modular reinforcement learning problem in continuous state space. Our method relies on two-layered learning architecture. The first layer partitions the agent’s problem space into n-folds sub-agents that are inter-connected with each other with dexterity identical to original problem. It further learns a local control policy for standalone 1-fold sub-agent. The second layer learns a global policy to reuse ‘already learnt’ standalone local policy over each n sub-agents by sampling local policy with global parameters for each sub-agent—parameterizing local policy independently to approximate non-linear interconnections between sub-agents. We demonstrate our method on simulation example of 12-DOF modular robot that learns maneuver pattern of snake-like gait. 
We also compare our proposed method against standard single-policy learning methods to benchmark optimality.\",\"PeriodicalId\":369231,\"journal\":{\"name\":\"2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT)\",\"volume\":\"34 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/INFOCT.2019.8710861\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INFOCT.2019.8710861","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

We present a reusable-policy method for modular reinforcement learning problems in continuous state spaces. Our method relies on a two-layered learning architecture. The first layer partitions the agent's problem space into n sub-agents that are interconnected with the same dexterity as the original problem, and learns a local control policy for a standalone 1-fold sub-agent. The second layer learns a global policy that reuses the already-learnt standalone local policy across the n sub-agents by sampling the local policy with per-sub-agent global parameters, parameterizing the local policy independently for each sub-agent to approximate the non-linear interconnections between them. We demonstrate our method on a simulation of a 12-DOF modular robot that learns the maneuver pattern of a snake-like gait, and we compare it against standard single-policy learning methods to benchmark optimality.
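To make the two-layer idea concrete, the sketch below shows one way a single learnt local policy could be reused across n sub-agents, with module-specific global parameters modulating it. This is a minimal illustration under assumptions of our own: the class names (LocalPolicy, GlobalPolicy), the linear local policy, and the amplitude/offset parameters are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np

class LocalPolicy:
    """Local control policy for a single 1-fold sub-agent (module).

    A stand-in for the first-layer policy: a simple linear map over the
    module's local state. In the paper this policy is learnt; here the
    weights are placeholders.
    """
    def __init__(self, state_dim, action_dim):
        self.W = np.zeros((action_dim, state_dim))
        self.b = np.zeros(action_dim)

    def act(self, local_state, theta=None):
        # Optional global parameters `theta` (illustrative amplitude/offset)
        # modulate the shared local policy for this particular module.
        action = self.W @ local_state + self.b
        if theta is not None:
            action = theta["amplitude"] * action + theta["offset"]
        return action


class GlobalPolicy:
    """Second layer: reuses one already-learnt local policy over n modules
    by holding per-module parameters that stand in for the non-linear
    coupling between neighbouring modules."""
    def __init__(self, local_policy, n_modules):
        self.local_policy = local_policy
        self.n_modules = n_modules
        # One parameter set per module; the global layer would learn these.
        self.thetas = [{"amplitude": 1.0, "offset": 0.0} for _ in range(n_modules)]

    def act(self, global_state):
        # Split the global state into per-module local states and apply
        # the shared local policy with module-specific parameters.
        local_states = np.array_split(global_state, self.n_modules)
        return np.concatenate([
            self.local_policy.act(s, theta)
            for s, theta in zip(local_states, self.thetas)
        ])


# Example: a 12-DOF modular robot treated as 12 one-joint sub-agents,
# each with a 2-dimensional local state (e.g. joint angle and velocity).
policy = GlobalPolicy(LocalPolicy(state_dim=2, action_dim=1), n_modules=12)
action = policy.act(np.random.randn(24))  # 24 = 12 modules x 2 local state dims
```

Under this reading, only the per-module parameters (here, amplitude and offset) are optimized in the second layer, while the local policy's weights stay fixed, which is what allows the standalone policy to be reused rather than relearnt for every module.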