Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration

Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou
{"title":"Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration","authors":"Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou","doi":"arxiv-2312.11797","DOIUrl":null,"url":null,"abstract":"We study Merton's expected utility maximization problem in an incomplete\nmarket, characterized by a factor process in addition to the stock price\nprocess, where all the model primitives are unknown. We take the reinforcement\nlearning (RL) approach to learn optimal portfolio policies directly by\nexploring the unknown market, without attempting to estimate the model\nparameters. Based on the entropy-regularization framework for general\ncontinuous-time RL formulated in Wang et al. (2020), we propose a recursive\nweighting scheme on exploration that endogenously discounts the current\nexploration reward by the past accumulative amount of exploration. Such a\nrecursive regularization restores the optimality of Gaussian exploration.\nHowever, contrary to the existing results, the optimal Gaussian policy turns\nout to be biased in general, due to the interwinding needs for hedging and for\nexploration. We present an asymptotic analysis of the resulting errors to show\nhow the level of exploration affects the learned policies. Furthermore, we\nestablish a policy improvement theorem and design several RL algorithms to\nlearn Merton's optimal strategies. At last, we carry out both simulation and\nempirical studies with a stochastic volatility environment to demonstrate the\nefficiency and robustness of the RL algorithms in comparison to the\nconventional plug-in method.","PeriodicalId":501045,"journal":{"name":"arXiv - QuantFin - Portfolio Management","volume":"80 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuantFin - Portfolio Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2312.11797","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown. We take the reinforcement learning (RL) approach to learn optimal portfolio policies directly by exploring the unknown market, without attempting to estimate the model parameters. Based on the entropy-regularization framework for general continuous-time RL formulated in Wang et al. (2020), we propose a recursive weighting scheme on exploration that endogenously discounts the current exploration reward by the past cumulative amount of exploration. Such a recursive regularization restores the optimality of Gaussian exploration. However, contrary to existing results, the optimal Gaussian policy turns out to be biased in general, due to the intertwined needs for hedging and for exploration. We present an asymptotic analysis of the resulting errors to show how the level of exploration affects the learned policies. Furthermore, we establish a policy improvement theorem and design several RL algorithms to learn Merton's optimal strategies. Finally, we carry out both simulation and empirical studies in a stochastic volatility environment to demonstrate the efficiency and robustness of the RL algorithms in comparison with the conventional plug-in method.
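To make the recursive exploration weighting and the biased Gaussian policy concrete, here is a minimal Python sketch. It is purely illustrative and not drawn from the paper: the 1/(1 + cumulative entropy) discounting, the bias term, and all numerical parameters are assumptions chosen only to show the mechanics of discounting today's exploration reward by the exploration already accumulated, and of sampling allocations from a mean-shifted Gaussian.

```python
import numpy as np

# Illustrative sketch only -- the functional forms below are assumptions,
# not the paper's derived scheme.

rng = np.random.default_rng(0)

def gaussian_entropy(sigma):
    """Differential entropy of a one-dimensional Gaussian with std sigma."""
    return 0.5 * np.log(2.0 * np.pi * np.e * sigma ** 2)

def recursive_weight(gamma0, accumulated_entropy):
    """Discount the current exploration reward by the exploration already
    accumulated; the 1/(1 + H) form is an illustrative choice."""
    return gamma0 / (1.0 + accumulated_entropy)

def sample_allocation(merton_mean, bias, sigma):
    """Biased Gaussian exploration: the sampling mean is shifted away from
    the purely exploitative (Merton-type) allocation by a bias term that
    stands in for the hedging/exploration interaction."""
    return rng.normal(merton_mean + bias, sigma)

# Toy rollout with assumed parameters.
gamma0 = 0.1   # initial exploration temperature (assumed)
sigma = 1.0    # exploration std (assumed; chosen so the entropy is positive)
accumulated_H = 0.0

for t in range(5):
    w = recursive_weight(gamma0, accumulated_H)
    allocation = sample_allocation(merton_mean=0.6, bias=0.05 * w, sigma=sigma)
    accumulated_H += gaussian_entropy(sigma)
    print(f"t={t}: exploration weight={w:.4f}, allocation={allocation:.4f}")
```

In the paper itself the weighting scheme, the bias, and the policy variance are derived endogenously from the entropy-regularized stochastic control problem; the sketch only mimics the interface such an algorithm would expose.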