{"title":"Constructive Policy: Reinforcement Learning Approach for Connected Multi-Agent Systems","authors":"Sayyed Jaffar Ali Raza, Mingjie Lin","doi":"10.1109/COASE.2019.8843223","DOIUrl":null,"url":null,"abstract":"Policy based reinforcement learning methods are widely used for multi-agent systems to learn optimal actions given any state; with partial or even no model representation. However multi-agent systems with complex structures (curse of dimensionality) or with high constraints (like bio-inspired (a) snake or serpentine robots) show limited performance in such environments due to sparse-reward nature of environment and no fully observable model representation. In this paper we present a constructive learning and planning scheme that reduces the complexity of high-diemensional agent model by decomposing it into identical, connected and scaled down multiagent structure and then apply learning framework in layers of local and global ranking. Our layered hierarchy method also decomposes the final goal into multiple sub-tasks and a global task (final goal) that is bias-induced function of local sub-tasks. Local layer deals with learning ‘reusable’ local policy for a local agent to achieve a sub-task optimally; that local policy can also be reused by other identical local agents. Furthermore, global layer learns a policy to apply right combination of local policies that are parameterized over entire connected structure of local agents to achieve the global task by collaborative construction of local agents. After learning local policies and while learning global policy, the framework generates sub-tasks for each local agent, and accepts local agents’ intrinsic rewards as positive bias towards maximum global reward based of optimal sub-tasks assignments. The advantage of proposed approach includes better exploration due to decomposition of dimensions, and reusability of learning paradigm over extended dimension spaces. We apply the constructive policy method to serpentine robot with hyper-redundant degrees of freedom (DOF), for achieving optimal control and we also outline connection to hierarchical apprenticeship learning methods which can be seen as layered learning framework for complex control tasks.","PeriodicalId":6695,"journal":{"name":"2019 IEEE 15th International Conference on Automation Science and Engineering (CASE)","volume":"23 1","pages":"257-262"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 15th International Conference on Automation Science and Engineering (CASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COASE.2019.8843223","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
Policy-based reinforcement learning methods are widely used in multi-agent systems to learn optimal actions for any given state, even with only a partial model representation or none at all. However, multi-agent systems with complex structures (the curse of dimensionality) or tight constraints (such as bio-inspired snake or serpentine robots) show limited performance in such environments, because rewards are sparse and no fully observable model representation is available. In this paper we present a constructive learning and planning scheme that reduces the complexity of a high-dimensional agent model by decomposing it into an identical, connected, and scaled-down multi-agent structure and then applying a learning framework in layers of local and global ranking. Our layered hierarchy method also decomposes the final goal into multiple sub-tasks and a global task (the final goal) that is a bias-induced function of the local sub-tasks. The local layer learns a ‘reusable’ local policy that lets a local agent achieve a sub-task optimally; the same local policy can be reused by other identical local agents. The global layer then learns a policy that applies the right combination of local policies, parameterized over the entire connected structure of local agents, so that the global task is achieved through the collaborative construction of the local agents. After the local policies are learned, and while the global policy is being learned, the framework generates sub-tasks for each local agent and accepts the local agents’ intrinsic rewards as a positive bias toward the maximum global reward based on optimal sub-task assignments. The advantages of the proposed approach include better exploration, due to the decomposition of dimensions, and reusability of the learning paradigm over extended dimension spaces. We apply the constructive policy method to a serpentine robot with hyper-redundant degrees of freedom (DOF) to achieve optimal control, and we outline its connection to hierarchical apprenticeship learning methods, which can be seen as a layered learning framework for complex control tasks.
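To make the layered structure concrete, the following is a minimal, illustrative sketch of how a single reusable local policy and a global combining policy could be organized. It is not the authors' implementation: all class and parameter names (LocalPolicy, GlobalPolicy, the tabular state/action spaces, the intrinsic-reward bias term) are assumptions, and simple tabular Q-learning stands in for whatever policy-based learner the paper actually employs.

```python
import numpy as np

# Hypothetical sketch of the two-layer "constructive policy" idea from the abstract:
# one local policy is learned once and shared across all identical local agents
# (e.g., snake-robot segments), while a global policy learns which combination of
# sub-task assignments to hand out. Names and structure are illustrative only.

class LocalPolicy:
    """Tabular policy shared by every identical local agent."""
    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.95):
        self.q = np.zeros((n_states, n_actions))
        self.lr, self.gamma = lr, gamma

    def act(self, state, eps=0.1):
        # Epsilon-greedy action for the local sub-task.
        if np.random.rand() < eps:
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.q[state]))

    def update(self, s, a, r, s_next):
        # Standard Q-learning update toward the local sub-task reward.
        td_target = r + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.lr * (td_target - self.q[s, a])


class GlobalPolicy:
    """Learns which combination of sub-task assignments to give the local agents."""
    def __init__(self, n_global_states, n_combos, lr=0.1, gamma=0.95):
        self.q = np.zeros((n_global_states, n_combos))
        self.lr, self.gamma = lr, gamma

    def assign(self, g_state, eps=0.1):
        # Pick a combination of local sub-task assignments for the whole structure.
        if np.random.rand() < eps:
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.q[g_state]))

    def update(self, g, combo, global_reward, intrinsic_rewards, g_next):
        # The local agents' intrinsic rewards enter as a positive bias on the
        # global reward, mirroring the bias-induced global task in the abstract.
        biased_r = global_reward + sum(intrinsic_rewards)
        td_target = biased_r + self.gamma * self.q[g_next].max()
        self.q[g, combo] += self.lr * (td_target - self.q[g, combo])
```

Under these assumptions, every segment of the connected structure would query the same LocalPolicy instance for its sub-task, while the GlobalPolicy is trained only on the coarser problem of choosing sub-task combinations, which is how the decomposition keeps the effective search space small as the number of identical segments grows.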