Decomposition Based Multi-Objective Evolutionary Algorithm in XCS for Multi-Objective Reinforcement Learning

2018 IEEE Congress on Evolutionary Computation (CEC). Pub Date: 2018-07-01. DOI: 10.1109/CEC.2018.8477931

Xiu Cheng, Will N. Browne, Mengjie Zhang


Citations: 1

Abstract



Learning Classifier Systems (LCSs) have been widely used to tackle Reinforcement Learning (RL) problems because they generalize well and provide simple, understandable rule-based solutions. The accuracy-based LCS, XCS, has been the most widely used for single-objective RL problems. As many real-world problems exhibit multiple conflicting objectives, recent work has sought to adapt XCS to Multi-Objective Reinforcement Learning (MORL) tasks. However, many of these algorithms need large storage or cannot discover the Pareto optimal solutions. This is due to the complexity of finding a policy that takes multiple steps toward multiple possible objectives. This paper employs a decomposition strategy based on MOEA/D in XCS to approximate complex Pareto fronts. To achieve multi-objective learning, a new MORL algorithm has been developed based on XCS and MOEA/D. The experimental results show that, on complex bi-objective maze problems, the MORL algorithm is able to learn a group of Pareto optimal solutions without huge storage. Analysis of the learned policies shows successful trade-offs between the distance to the reward and the amount of reward itself.
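The abstract does not say which scalarization the paper uses inside XCS, so the following is only a minimal sketch of MOEA/D-style decomposition in general, not the authors' algorithm. It assumes a bi-objective setting where both objectives are maximized, evenly spread weight vectors, and Tchebycheff scalarization against a reference point. All function names (`weight_vectors`, `weighted_sum`, `tchebycheff`, `best_per_subproblem`) and the candidate values are hypothetical, chosen to mirror the trade-off between reward amount and distance to the reward.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def weight_vectors(n: int) -> List[Vector]:
    """Evenly spread weights (w1, w2) with w1 + w2 = 1, one per subproblem."""
    if n < 2:
        raise ValueError("need at least two weight vectors")
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]


def weighted_sum(f: Sequence[float], w: Sequence[float]) -> float:
    """Weighted-sum scalarization of an objective vector (larger is better)."""
    return sum(wi * fi for wi, fi in zip(w, f))


def tchebycheff(f: Sequence[float], w: Sequence[float],
                z_star: Sequence[float]) -> float:
    """Tchebycheff scalarization (smaller is better): the largest
    weighted gap between f and the reference point z_star."""
    return max(wi * abs(zi - fi) for wi, fi, zi in zip(w, f, z_star))


def best_per_subproblem(candidates: Sequence[Vector],
                        weights: Sequence[Vector],
                        z_star: Vector) -> List[Vector]:
    """For each weight vector, pick the candidate that minimizes the
    Tchebycheff value; together the picks approximate the Pareto front."""
    return [min(candidates, key=lambda f: tchebycheff(f, w, z_star))
            for w in weights]


# Hypothetical objective vectors: (reward amount, negated steps to reward).
candidates = [(1.0, -2.0), (5.0, -6.0), (10.0, -12.0), (4.0, -8.0)]
# Reference point: best value seen for each objective (an assumption here).
z_star = (10.0, -2.0)

for w, f in zip(weight_vectors(3),
                best_per_subproblem(candidates, weight_vectors(3), z_star)):
    print(f"weights={w} -> chosen objective vector={f}")
```

In this toy data the dominated candidate `(4.0, -8.0)` is never selected, while each weight vector picks a different trade-off: near but small reward, a middle compromise, or far but large reward. Weighted-sum scalarization is included for comparison; it is simpler but generally cannot reach solutions on non-convex parts of a Pareto front.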
