Reinforcement Learning for Jump-Diffusions

arXiv - QuantFin - Mathematical Finance | Pub Date: 2024-05-26 | DOI: arxiv-2405.16449

Xuefeng Gao, Lingfei Li, Xun Yu Zhou


Abstract

We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration-exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the exploratory dynamics under jump-diffusions calls for a careful formulation of the jump part. Through a theoretical analysis, we find that one can simply use the same policy evaluation and q-learning algorithms in Jia and Zhou (2022a, 2023), originally developed for controlled diffusions, without needing to check a priori whether the underlying data come from a pure diffusion or a jump-diffusion. However, we show that the presence of jumps ought to affect parameterizations of actors and critics in general. Finally, as an application, we investigate the mean-variance portfolio selection problem with the stock price modelled as a jump-diffusion, and show that both RL algorithms and parameterizations are invariant with respect to jumps.
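To make the distinction between a pure diffusion and a jump-diffusion concrete, the following is a minimal sketch that simulates a log-price path under a Merton-style model: Brownian noise plus Poisson-driven Gaussian jumps. The model choice, the function names, and all parameters (`mu`, `sigma`, `lam`, `jump_mean`, `jump_std`) are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) count with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_log_price(x0, mu, sigma, lam, jump_mean, jump_std,
                       horizon, n_steps, seed=0):
    """Simulate a log-price path on a uniform time grid (assumed model).

    Each step adds a Gaussian diffusion increment and the sum of the
    Gaussian jump sizes that arrive in that step; lam is the jump intensity.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    path = [x0]
    for _ in range(n_steps):
        # Diffusion part of the log-price increment.
        diffusion = (mu - 0.5 * sigma ** 2) * dt
        diffusion += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Jump part: Poisson number of jumps, each with a Gaussian size.
        n_jumps = sample_poisson(lam * dt, rng)
        jumps = sum(rng.gauss(jump_mean, jump_std) for _ in range(n_jumps))
        path.append(path[-1] + diffusion + jumps)
    return path
```

Setting `lam = 0.0` switches the jumps off, so the same routine produces a pure diffusion path; data of either kind could then be fed to a learning algorithm in the same format.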


