Greedy feature replacement for online value function approximation

Journal of Zhejiang University-Science C-Computers & Electronics Pub Date : 2014-03-01 DOI:10.1631/jzus.C1300246

Fengfei Zhao, Zhengyang Qin, Zhuo Shao, Jun Fang, Bo-yan Ren

Journal Article. Volume 15, pages 223-231. https://doi.org/10.1631/jzus.C1300246

Citations: 2

Abstract

Reinforcement learning (RL) in real-world problems requires function approximation, which depends on selecting appropriate feature representations. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques work well only for low-dimensional problems. In this paper, we present greedy feature replacement (GFR), a novel online expansion technique for value-based RL algorithms that use binary features. Given a simple initial representation, the feature representation is expanded incrementally. New feature dependencies are added automatically to the current representation, and conjunctive features are used to replace current features greedily. The virtual temporal difference (TD) error is recorded for each conjunctive feature to judge whether the replacement can improve the approximation. Correctness guarantees and computational complexity analysis are provided for GFR. Experimental results in two domains show that GFR achieves much faster learning and can handle large-scale problems.
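The ingredients named in the abstract (a linear approximator over binary features, TD errors, and conjunctive features that replace their parents) can be illustrated with a minimal Python sketch. This is not the authors' GFR algorithm: the function names, the per-pair error accumulator (a simple stand-in for the virtual TD error, whose exact definition is not given here), and the replacement rule are all assumptions made for illustration.

```python
from itertools import combinations

def value(weights, active):
    """Linear value estimate: with binary features it is the sum of active weights."""
    return sum(weights.get(f, 0.0) for f in active)

def td0_step(weights, active, reward, next_active, alpha=0.1, gamma=0.9):
    """One TD(0) update on the active features; returns the TD error used."""
    delta = reward + gamma * value(weights, next_active) - value(weights, active)
    for f in active:
        weights[f] = weights.get(f, 0.0) + alpha * delta
    return delta

def record_candidates(stats, active, delta):
    """Accumulate the TD error for every pair of co-active features.

    Assumption: a crude proxy for a per-conjunction error statistic,
    not the paper's virtual TD error.
    """
    for pair in combinations(sorted(active, key=str), 2):
        stats[pair] = stats.get(pair, 0.0) + delta

def with_conjunction(active, pair):
    """Hypothetical replacement rule: when both parents are active,
    swap them for their conjunction; otherwise leave the set unchanged."""
    if set(pair) <= set(active):
        return (set(active) - set(pair)) | {pair}
    return set(active)
```

In this sketch a conjunction is just the tuple of its two parents, so it gets its own weight in the same dictionary and the value estimate stays linear in the (expanded) feature set.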
