{"title":"Improving reinforcement learning algorithms: Towards optimal learning rate policies","authors":"Othmane Mounjid, Charles-Albert Lehalle","doi":"10.1111/mafi.12378","DOIUrl":null,"url":null,"abstract":"<p>This paper shows how to use results of statistical learning theory and stochastic algorithms to have a better understanding of the convergence of Reinforcement Learning (RL) once it is formulated as a fixed point problem. This can be used to propose improvement of RL learning rates. First, our analysis shows that the classical asymptotic convergence rate <math>\n <semantics>\n <mrow>\n <mi>O</mi>\n <mo>(</mo>\n <mn>1</mn>\n <mo>/</mo>\n <msqrt>\n <mi>N</mi>\n </msqrt>\n <mo>)</mo>\n </mrow>\n <annotation>$O(1/\\sqrt {N})$</annotation>\n </semantics></math> is pessimistic and can be replaced by <math>\n <semantics>\n <mrow>\n <mi>O</mi>\n <mo>(</mo>\n <msup>\n <mrow>\n <mo>(</mo>\n <mi>log</mi>\n <mrow>\n <mo>(</mo>\n <mi>N</mi>\n <mo>)</mo>\n </mrow>\n <mo>/</mo>\n <mi>N</mi>\n <mo>)</mo>\n </mrow>\n <mi>β</mi>\n </msup>\n <mo>)</mo>\n </mrow>\n <annotation>$O((\\log (N)/N)^{\\beta })$</annotation>\n </semantics></math> with <math>\n <semantics>\n <mrow>\n <mfrac>\n <mn>1</mn>\n <mn>2</mn>\n </mfrac>\n <mo>≤</mo>\n <mi>β</mi>\n <mo>≤</mo>\n <mn>1</mn>\n </mrow>\n <annotation>$\\frac{1}{2}\\le \\beta \\le 1$</annotation>\n </semantics></math>, and <math>\n <semantics>\n <mi>N</mi>\n <annotation>$N$</annotation>\n </semantics></math> the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate used in RL. We decompose our policy into two interacting levels: the inner and outer levels. In the inner level, we present the PASS algorithm (for “PAst Sign Search”) which, based on a predefined sequence of learning rates, constructs a new sequence for which the error decreases faster. The convergence of PASS is proved and error bounds are established. In the outer level, we propose an optimal methodology for the selection of the predefined sequence. Third, we show empirically that our selection methodology of the learning rate outperforms significantly standard algorithms used in RL for the three following applications: the estimation of a drift, the optimal placement of limit orders, and the optimal execution of a large number of shares.</p>","PeriodicalId":49867,"journal":{"name":"Mathematical Finance","volume":"34 2","pages":"588-621"},"PeriodicalIF":1.6000,"publicationDate":"2023-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/mafi.12378","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Mathematical Finance","FirstCategoryId":"96","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/mafi.12378","RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BUSINESS, FINANCE","Score":null,"Total":0}
Citations: 0
Abstract
This paper shows how results from statistical learning theory and stochastic algorithms can be used to better understand the convergence of Reinforcement Learning (RL) once it is formulated as a fixed-point problem, and thereby to propose improvements to RL learning rates. First, our analysis shows that the classical asymptotic convergence rate $O(1/\sqrt{N})$ is pessimistic and can be replaced by $O((\log(N)/N)^{\beta})$ with $\frac{1}{2}\le \beta \le 1$, where $N$ is the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate used in RL. We decompose our policy into two interacting levels: the inner and outer levels. At the inner level, we present the PASS algorithm (for "PAst Sign Search") which, based on a predefined sequence of learning rates, constructs a new sequence for which the error decreases faster. The convergence of PASS is proved and error bounds are established. At the outer level, we propose an optimal methodology for the selection of the predefined sequence. Third, we show empirically that our learning-rate selection methodology significantly outperforms standard RL algorithms in the three following applications: the estimation of a drift, the optimal placement of limit orders, and the optimal execution of a large number of shares.
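To make the inner-level idea concrete, the sketch below applies the fixed-point view to the first application mentioned in the abstract, drift estimation: the mean is the fixed point of $\theta \mapsto \theta - \rho\,(\theta - X_n)$ in expectation, and the step is drawn from a predefined rate sequence whose index advances only when the sign of the innovation flips (a Kesten-type past-sign rule). Everything here — the function name, the sign rule, the $n^{-1/2}$ schedule — is an illustrative assumption; this is not the paper's PASS algorithm, whose exact update and error bounds are given in the article.

```python
import numpy as np

def drift_estimate_sign_adaptive(samples, rates):
    """Stochastic approximation for a drift (mean) estimate, written as the
    fixed-point iteration  theta <- theta - rho * (theta - X_n).

    The step size comes from the predefined sequence `rates`, but the index
    advances only when the sign of the innovation flips (Kesten-type rule,
    used here as a stand-in): same-sign innovations suggest the iterate is
    still far from the target, so the current, larger rate is kept.
    """
    theta = 0.0
    k = 0              # index into the predefined rate sequence
    last_sign = 0.0
    for x in samples:
        innovation = theta - x
        sign = np.sign(innovation)
        if last_sign != 0.0 and sign != last_sign:
            k = min(k + 1, len(rates) - 1)   # oscillation: shrink the rate
        last_sign = sign
        theta -= rates[k] * innovation
    return theta

# Toy usage: estimate mu = 1.5 from N noisy observations.
rng = np.random.default_rng(0)
N = 10_000
samples = 1.5 + rng.standard_normal(N)
rates = 1.0 / np.sqrt(1.0 + np.arange(N))    # classical n^{-1/2} schedule
print(drift_estimate_sign_adaptive(samples, rates))
```

Because the index into `rates` advances only on sign flips, the scheme effectively re-orders a given rate sequence into a faster-decaying-error one, which is the role the abstract assigns to the inner level; the outer level's choice of the predefined sequence itself is not modeled here.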
Journal introduction
Mathematical Finance seeks to publish original research articles focused on the development and application of novel mathematical and statistical methods for the analysis of financial problems.
Empirical results will be appropriate to the extent that they illustrate a statistical technique, validate a model, or provide insight into a financial problem. Papers whose main contribution rests on empirical results derived with standard approaches will not be considered.