{"title":"Adaptive temporal-difference learning via deep neural network function approximation: a non-asymptotic analysis","authors":"Guoyong Wang, Tiange Fu, Ruijuan Zheng, Xuhui Zhao, Junlong Zhu, Mingchuan Zhang","doi":"10.1007/s40747-024-01757-w","DOIUrl":null,"url":null,"abstract":"<p>Although deep reinforcement learning has achieved notable practical achievements, its theoretical foundations have been scarcely explored until recent times. Nonetheless, the rate of convergence for current neural temporal-difference (TD) learning algorithms is constrained, largely due to their high sensitivity to stepsize choices. In order to mitigate this issue, we propose an adaptive neural TD algorithm (<b>AdaBNTD</b>) inspired by the superior performance of adaptive gradient techniques in training deep neural networks. Simultaneously, we derive non-asymptotic bounds for <b>AdaBNTD</b> within the Markovian observation framework. In particular, <b>AdaBNTD</b> is capable of converging to the global optimum of the mean square projection Bellman error (MSPBE) with a convergence rate of <span>\\({{\\mathcal {O}}}(1/\\sqrt{K})\\)</span>, where <i>K</i> denotes the iteration count. Besides, the effectiveness <b>AdaBNTD</b> is also verified through several reinforcement learning benchmark domains.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"31 1","pages":""},"PeriodicalIF":5.0000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Complex & Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s40747-024-01757-w","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Although deep reinforcement learning has achieved notable practical success, its theoretical foundations were scarcely explored until recently. Moreover, the convergence rate of existing neural temporal-difference (TD) learning algorithms is limited, largely because of their high sensitivity to stepsize choices. To mitigate this issue, we propose an adaptive neural TD algorithm (AdaBNTD), inspired by the strong performance of adaptive gradient techniques in training deep neural networks. We also derive non-asymptotic bounds for AdaBNTD under the Markovian observation model. In particular, AdaBNTD converges to the global optimum of the mean squared projected Bellman error (MSPBE) at a rate of \(\mathcal{O}(1/\sqrt{K})\), where K denotes the iteration count. The effectiveness of AdaBNTD is also verified on several reinforcement learning benchmark domains.
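The abstract does not spell out the AdaBNTD update rule, so the following is only a minimal illustrative sketch of the general idea it describes: semi-gradient TD(0) with a small neural value function whose parameters are updated with an Adam-style adaptive stepsize, which is what makes the method less sensitive to the base stepsize. The toy Markov chain, the helper names (env_step, value_and_grads), and all hyperparameter values are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

# Sketch: adaptive-stepsize neural TD(0) on a toy 1-D Markov chain.
# NOT the AdaBNTD algorithm itself; an assumed Adam-style variant for illustration.

rng = np.random.default_rng(0)

# Tiny two-layer value network V(s; theta) for a scalar state.
W1, b1 = rng.normal(scale=0.1, size=(16, 1)), np.zeros((16, 1))
W2, b2 = rng.normal(scale=0.1, size=(1, 16)), np.zeros((1, 1))
params = [W1, b1, W2, b2]

def value_and_grads(s):
    """Forward pass V(s) and gradients of V with respect to each parameter."""
    x = np.array([[s]])
    z1 = W1 @ x + b1
    h1 = np.maximum(z1, 0.0)                 # ReLU hidden layer
    v = (W2 @ h1 + b2).item()
    dW2, db2 = h1.T, np.ones((1, 1))         # backprop of the scalar output
    dz1 = W2.T * (z1 > 0)
    dW1, db1 = dz1 @ x.T, dz1
    return v, [dW1, db1, dW2, db2]

def env_step(s):
    """Toy Markov chain: bounded random walk on [0, 1], reward = next state."""
    s_next = float(np.clip(s + rng.normal(scale=0.1), 0.0, 1.0))
    return s_next, s_next

# Adam-style accumulators (one per parameter tensor) and hyperparameters.
m = [np.zeros_like(p) for p in params]
v_acc = [np.zeros_like(p) for p in params]
alpha, beta1, beta2, eps, gamma = 1e-2, 0.9, 0.999, 1e-8, 0.99

s = 0.5
for k in range(1, 5001):
    s_next, r = env_step(s)
    v_s, grads = value_and_grads(s)
    v_next, _ = value_and_grads(s_next)
    delta = r + gamma * v_next - v_s          # TD error
    for i, g in enumerate(grads):
        g = -delta * g                        # semi-gradient direction
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v_acc[i] = beta2 * v_acc[i] + (1 - beta2) * g**2
        m_hat = m[i] / (1 - beta1**k)
        v_hat = v_acc[i] / (1 - beta2**k)
        params[i] -= alpha * m_hat / (np.sqrt(v_hat) + eps)  # adaptive stepsize
    s = s_next
```

The per-parameter denominator \(\sqrt{v_{hat}} + \epsilon\) rescales each update, so a single base stepsize alpha works across a wide range of problems; this is the sensitivity issue the abstract attributes to plain neural TD.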
About the journal
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools, and techniques that foster cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.