Evolutionary Feature Evaluation for Online Reinforcement Learning

2013 IEEE Conference on Computational Intelligence in Games (CIG). Pub Date: 2013-10-17. DOI: 10.1109/CIG.2013.6633648

J. Bishop, R. Miikkulainen

Citations: 7

Abstract

Most successful examples of Reinforcement Learning (RL) report the use of carefully designed features, that is, a representation of the problem state that facilitates effective learning. The best features cannot always be known in advance, creating the need to evaluate more features than will ultimately be chosen. This paper presents Temporal Difference Feature Evaluation (TDFE), a novel approach to the problem of feature evaluation in an online RL agent. TDFE combines value function learning by temporal difference methods with an evolutionary algorithm that searches the space of feature subsets, and outputs a ranking over all individual features. TDFE dynamically adjusts its ranking, avoids the sample complexity multiplier of many population-based approaches, and works with arbitrary feature representations. Online learning experiments are performed in the game of Connect Four, establishing (i) that the choice of features is critical, (ii) that TDFE can evaluate and rank all the available features online, and (iii) that the ranking can be used effectively as the basis of dynamic online feature selection.
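The abstract does not give TDFE's update rules, so the following is not the paper's algorithm. It is only a minimal sketch of one ingredient the abstract names, temporal difference value learning restricted to a feature subset, assuming a linear value function and a TD(0) update. The function names, the boolean `mask` encoding of a subset, and the ranking by weight magnitude are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def masked_value(weights: Sequence[float],
                 features: Sequence[float],
                 mask: Sequence[bool]) -> float:
    """Linear value estimate using only the features switched on in `mask`."""
    return sum(w * f for w, f, m in zip(weights, features, mask) if m)


def td0_update(weights: List[float],
               mask: Sequence[bool],
               features: Sequence[float],
               reward: float,
               next_features: Sequence[float],
               alpha: float = 0.1,
               gamma: float = 1.0,
               terminal: bool = False) -> float:
    """Apply one TD(0) step in place and return the TD error.

    Only weights whose feature is in the subset are changed (assumption:
    a feature subset is encoded as a boolean mask).
    """
    v = masked_value(weights, features, mask)
    v_next = 0.0 if terminal else masked_value(weights, next_features, mask)
    delta = reward + gamma * v_next - v
    for i, (f, m) in enumerate(zip(features, mask)):
        if m:
            weights[i] += alpha * delta * f
    return delta


def rank_by_weight_magnitude(weights: Sequence[float]) -> List[int]:
    """Hypothetical stand-in for a feature ranking: indices by |weight|, largest first."""
    return sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
```

In this sketch, an evolutionary search over feature subsets would correspond to proposing different `mask` values and comparing how well each one learns; how TDFE actually scores subsets and derives its per-feature ranking is described in the paper, not here.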
