Policy Reuse in Reinforcement Learning for Modular Agents

2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT) Pub Date : 2019-03-01 DOI:10.1109/INFOCT.2019.8710861

Sayyed Jaffar Ali Raza, Mingjie Lin

Citations: 1

Abstract

We present a reusable-policy method for modular reinforcement learning problems in continuous state spaces. Our method relies on a two-layered learning architecture. The first layer partitions the agent's problem space into n sub-agents (folds) that are interconnected and together retain the same dexterity as the original problem. It then learns a local control policy for a standalone 1-fold sub-agent. The second layer learns a global policy that reuses this already-learnt standalone local policy across all n sub-agents: it samples the local policy with global parameters for each sub-agent, parameterizing the local policy independently per sub-agent to approximate the non-linear interconnections between sub-agents. We demonstrate the method on a simulated 12-DOF modular robot that learns the maneuver pattern of a snake-like gait. We also compare the proposed method against standard single-policy learning methods to benchmark its optimality.
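
The two-layer idea can be illustrated with a minimal sketch. This is not the authors' implementation: the tanh-linear form of the local policy, the per-module gain and offset used as "global parameters", the choice of 12 single-DOF modules, and all function and variable names are assumptions made purely for illustration. The point it shows is that one set of local weights is shared by every sub-agent, while only a small set of per-module parameters differs.

```python
import numpy as np


def local_policy(obs, w):
    """Shared local policy for a single sub-agent.

    Assumed form: a tanh-squashed linear map from a module's
    observation vector to one joint command.
    """
    return np.tanh(obs @ w)


def global_policy(module_obs, w, gains, offsets):
    """Reuse the same local weights `w` for every module.

    Only the per-module global parameters (gain, offset) differ;
    they are hypothetical stand-ins for the global parameterization.
    """
    return np.array([
        g * local_policy(o + b, w)
        for o, g, b in zip(module_obs, gains, offsets)
    ])


rng = np.random.default_rng(0)
n_modules, obs_dim = 12, 3           # 12 single-DOF modules is an assumption
w = rng.normal(size=obs_dim)         # stands in for the learnt local policy
gains = np.ones(n_modules)           # hypothetical per-module output gains
offsets = np.linspace(0.0, np.pi, n_modules)[:, None]  # hypothetical offsets
module_obs = rng.normal(size=(n_modules, obs_dim))

actions = global_policy(module_obs, w, gains, offsets)  # one command per module
```

In this sketch, learning the global layer would mean adjusting only `gains` and `offsets` while `w` stays fixed, which is one simple way to read "reusing" a local policy across sub-agents.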


