Reinforcement learning with state-dependent discount factor

N. Yoshida, E. Uchibe, K. Doya
{"title":"Reinforcement learning with state-dependent discount factor","authors":"N. Yoshida, E. Uchibe, K. Doya","doi":"10.1109/DEVLRN.2013.6652533","DOIUrl":null,"url":null,"abstract":"Conventional reinforcement learning algorithms have several parameters which determine the feature of learning process, called meta-parameters. In this study, we focus on the discount factor that influences the time scale of the tradeoff between immediate and delayed rewards. The discount factor is usually considered as a constant value, but we introduce the state-dependent discount function and a new optimization criterion for the reinforcement learning algorithm. We first derive a new algorithm under the criterion, named ExQ-learning and we prove that the algorithm converges to the optimal action-value function in the meaning of new criterion w.p.1. We then present a framework to optimize the discount factor and the discount function by using an evolutionary algorithm. In order to validate the proposed method, we conduct a simple computer simulation and show that the proposed algorithm can find an appropriate state-dependent discount function with which performs better than that with a constant discount factor.","PeriodicalId":106997,"journal":{"name":"2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DEVLRN.2013.6652533","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

Conventional reinforcement learning algorithms have several parameters which determine the feature of learning process, called meta-parameters. In this study, we focus on the discount factor that influences the time scale of the tradeoff between immediate and delayed rewards. The discount factor is usually considered as a constant value, but we introduce the state-dependent discount function and a new optimization criterion for the reinforcement learning algorithm. We first derive a new algorithm under the criterion, named ExQ-learning and we prove that the algorithm converges to the optimal action-value function in the meaning of new criterion w.p.1. We then present a framework to optimize the discount factor and the discount function by using an evolutionary algorithm. In order to validate the proposed method, we conduct a simple computer simulation and show that the proposed algorithm can find an appropriate state-dependent discount function with which performs better than that with a constant discount factor.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
状态相关折现因子的强化学习
传统的强化学习算法有几个参数,这些参数决定了学习过程的特征,称为元参数。在本研究中,我们关注影响即时和延迟奖励权衡的时间尺度的折扣因子。折扣因子通常被认为是一个常数,但我们引入了状态相关的折扣函数和一个新的强化学习算法优化准则。我们首先在该准则下推导了一个新的算法ExQ-learning,并证明了该算法收敛于新准则w.p.1意义下的最优动作值函数。然后,我们提出了一个利用进化算法优化折现因子和折现函数的框架。为了验证所提出的方法,我们进行了简单的计算机模拟,并表明所提出的算法可以找到合适的状态相关折扣函数,该函数比使用恒定折扣因子的算法性能更好。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Epigenetic adaptation through hormone modulation in autonomous robots Attentional constraints and statistics in toddlers' word learning Do humans need learning to read humanoid lifting actions? Temporal emphasis for goal extraction in task demonstration to a humanoid robot by naive users Developing learnability — The case for reduced dimensionality
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1