Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration

Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou
{"title":"Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration","authors":"Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou","doi":"arxiv-2312.11797","DOIUrl":null,"url":null,"abstract":"We study Merton's expected utility maximization problem in an incomplete\nmarket, characterized by a factor process in addition to the stock price\nprocess, where all the model primitives are unknown. We take the reinforcement\nlearning (RL) approach to learn optimal portfolio policies directly by\nexploring the unknown market, without attempting to estimate the model\nparameters. Based on the entropy-regularization framework for general\ncontinuous-time RL formulated in Wang et al. (2020), we propose a recursive\nweighting scheme on exploration that endogenously discounts the current\nexploration reward by the past accumulative amount of exploration. Such a\nrecursive regularization restores the optimality of Gaussian exploration.\nHowever, contrary to the existing results, the optimal Gaussian policy turns\nout to be biased in general, due to the interwinding needs for hedging and for\nexploration. We present an asymptotic analysis of the resulting errors to show\nhow the level of exploration affects the learned policies. Furthermore, we\nestablish a policy improvement theorem and design several RL algorithms to\nlearn Merton's optimal strategies. At last, we carry out both simulation and\nempirical studies with a stochastic volatility environment to demonstrate the\nefficiency and robustness of the RL algorithms in comparison to the\nconventional plug-in method.","PeriodicalId":501045,"journal":{"name":"arXiv - QuantFin - Portfolio Management","volume":"80 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuantFin - Portfolio Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2312.11797","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown. We take the reinforcement learning (RL) approach to learn optimal portfolio policies directly by exploring the unknown market, without attempting to estimate the model parameters. Based on the entropy-regularization framework for general continuous-time RL formulated in Wang et al. (2020), we propose a recursive weighting scheme on exploration that endogenously discounts the current exploration reward by the past cumulative amount of exploration. Such a recursive regularization restores the optimality of Gaussian exploration. However, contrary to existing results, the optimal Gaussian policy turns out to be biased in general, due to the intertwined needs for hedging and for exploration. We present an asymptotic analysis of the resulting errors to show how the level of exploration affects the learned policies. Furthermore, we establish a policy improvement theorem and design several RL algorithms to learn Merton's optimal strategies. Finally, we carry out both simulation and empirical studies in a stochastic volatility environment to demonstrate the efficiency and robustness of the RL algorithms in comparison with the conventional plug-in method.
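To make the recursive exploration weighting and the biased Gaussian policy concrete, here is a minimal Python sketch. It is purely illustrative and not drawn from the paper: the 1/(1 + cumulative entropy) discounting, the bias term, and all numerical parameters are assumptions chosen only to show the mechanics of discounting today's exploration reward by the exploration already accumulated, and of sampling allocations from a mean-shifted Gaussian.

```python
import numpy as np

# Illustrative sketch only -- the functional forms below are assumptions,
# not the paper's derived scheme.

rng = np.random.default_rng(0)

def gaussian_entropy(sigma):
    """Differential entropy of a one-dimensional Gaussian with std sigma."""
    return 0.5 * np.log(2.0 * np.pi * np.e * sigma ** 2)

def recursive_weight(gamma0, accumulated_entropy):
    """Discount the current exploration reward by the exploration already
    accumulated; the 1/(1 + H) form is an illustrative choice."""
    return gamma0 / (1.0 + accumulated_entropy)

def sample_allocation(merton_mean, bias, sigma):
    """Biased Gaussian exploration: the sampling mean is shifted away from
    the purely exploitative (Merton-type) allocation by a bias term that
    stands in for the hedging/exploration interaction."""
    return rng.normal(merton_mean + bias, sigma)

# Toy rollout with assumed parameters.
gamma0 = 0.1   # initial exploration temperature (assumed)
sigma = 1.0    # exploration std (assumed; chosen so the entropy is positive)
accumulated_H = 0.0

for t in range(5):
    w = recursive_weight(gamma0, accumulated_H)
    allocation = sample_allocation(merton_mean=0.6, bias=0.05 * w, sigma=sigma)
    accumulated_H += gaussian_entropy(sigma)
    print(f"t={t}: exploration weight={w:.4f}, allocation={allocation:.4f}")
```

In the paper itself the weighting scheme, the bias, and the policy variance are derived endogenously from the entropy-regularized stochastic control problem; the sketch only mimics the interface such an algorithm would expose.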