Reinforcement learning with dynamic convex risk measures

IF 1.6 | Tier 3 Economics | JCR Q3 Business, Finance | Mathematical Finance | Pub Date: 2023-04-17 | DOI: 10.1111/mafi.12388

Anthony Coache, Sebastian Jaimungal

Mathematical Finance, vol. 34, no. 2, pp. 557-587. https://onlinelibrary.wiley.com/doi/10.1111/mafi.12388

Citations: 0

Abstract



We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor–critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
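To make the idea of a time-consistent dynamic risk measure concrete, here is a minimal sketch of the backward recursion that such a dynamic programming principle implies. It is not the authors' algorithm, which is model-free and uses neural-network actor-critic updates. The sketch assumes a known toy two-state Markov chain under a fixed policy, and it assumes CVaR as the one-step conditional risk measure (one example of a convex risk measure). The names `cvar`, `nested_risk`, `P`, `C`, and `alpha` are illustrative and do not come from the paper.

```python
import numpy as np

def cvar(values, probs, alpha):
    """CVaR at level alpha (0 <= alpha < 1) of a discrete cost distribution.

    Costs are losses: larger values are worse.
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(values)[::-1]          # worst (largest) cost first
    v, p = values[order], probs[order]
    tail = 1.0 - alpha                        # probability mass of the tail
    mass_before = np.cumsum(p) - p            # mass of outcomes sorted earlier
    w = np.clip(tail - mass_before, 0.0, p)   # share of each outcome inside the tail
    return float(np.dot(w, v) / tail)

def nested_risk(P, C, horizon, alpha):
    """Evaluate a fixed policy by nesting one-step CVaRs backward in time.

    P[s, s2] is the transition probability and C[s, s2] the cost of moving
    from state s to state s2 (both hypothetical inputs for this sketch).
    Recursion: V_T = 0 and V_t(s) = CVaR_alpha(C[s, S'] + V_{t+1}(S')).
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for _ in range(horizon):
        V = np.array([cvar(C[s] + V, P[s], alpha) for s in range(n_states)])
    return V

# Toy example: two states, equal transition probabilities, cost 10 when landing in state 1.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
C = np.array([[0.0, 10.0], [0.0, 10.0]])

print(nested_risk(P, C, horizon=2, alpha=0.0))  # risk-neutral evaluation
print(nested_risk(P, C, horizon=2, alpha=0.5))  # risk-averse evaluation
```

In this toy setting, alpha = 0 reduces the recursion to the expected total cost (10 over two periods), while alpha = 0.5 gives 20. A static CVaR at level 0.5 applied once to the two-period total cost would give 15, which shows that the nested, time-consistent evaluation differs from applying the risk measure once at the end.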


Source journal

Mathematical Finance (Mathematics, Interdisciplinary Applications)

CiteScore: 4.10

Self-citation rate: 6.20%

Review time: >12 weeks

Journal description: Mathematical Finance seeks to publish original research articles focused on the development and application of novel mathematical and statistical methods for the analysis of financial problems. The journal welcomes contributions on new statistical methods for the analysis of financial problems. Empirical results will be appropriate to the extent that they illustrate a statistical technique, validate a model or provide insight into a financial problem. Papers whose main contribution rests on empirical results derived with standard approaches will not be considered.
