Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients

arXiv - QuantFin - Risk Management | Pub Date: 2024-06-21 | DOI: arxiv-2406.15612

Parisa Davar, Frédéric Godin, Jose Garrido

Citations: 0

Abstract

This paper tackles the problem of mitigating catastrophic risk, that is, risk with very low frequency but very high severity, in sequential decision-making. The problem is particularly challenging because observations are scarce in the far tail of the distribution of cumulative costs (negative rewards). We develop a policy gradient algorithm, called POTPG, based on approximations of the tail risk derived from extreme value theory. Numerical experiments show that the method outperforms common benchmarks that rely on the empirical distribution. An application to financial risk management, namely the dynamic hedging of a financial option, is presented.
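To make the contrast between an extreme-value-theory tail approximation and the empirical distribution concrete, here is a minimal sketch of a peaks-over-threshold (POT) estimate of tail risk. It is not the POTPG algorithm and involves no policy gradient; it only illustrates the kind of tail approximation the abstract alludes to. Several things are assumptions not stated in the abstract: that the tail risk measure is VaR/CVaR, that the generalized Pareto distribution (GPD) is fitted by the method of moments, the 95% threshold quantile, and the simulated Pareto costs. All function names are hypothetical.

```python
import math
import random
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit; returns (xi, sigma) = (shape, scale).

    Assumption: the true shape is below 0.5, so the variance is finite.
    """
    mean = statistics.fmean(excesses)
    ratio = mean * mean / statistics.variance(excesses)
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean * (ratio + 1.0)
    return xi, sigma


def pot_var_cvar(costs, alpha, threshold_quantile=0.95):
    """POT estimates of VaR and CVaR at level alpha (alpha > threshold_quantile)."""
    ordered = sorted(costs)
    n = len(ordered)
    u = ordered[int(threshold_quantile * n)]          # threshold
    excesses = [c - u for c in ordered if c > u]      # peaks over the threshold
    xi, sigma = fit_gpd_moments(excesses)
    tail_ratio = (n / len(excesses)) * (1.0 - alpha)
    if abs(xi) < 1e-9:                                # exponential-tail limit
        var = u - sigma * math.log(tail_ratio)
        cvar = var + sigma
    else:
        var = u + (sigma / xi) * (tail_ratio ** (-xi) - 1.0)
        cvar = (var + sigma - xi * u) / (1.0 - xi)
    return var, cvar


def empirical_var_cvar(costs, alpha):
    """Benchmark: VaR and CVaR read directly off the sorted sample."""
    ordered = sorted(costs)
    k = math.ceil(alpha * len(ordered)) - 1
    return ordered[k], statistics.fmean(ordered[k:])


# Simulated cumulative costs with a Pareto tail (illustrative data only).
random.seed(0)
tail_index = 6.0
costs = [random.paretovariate(tail_index) for _ in range(20000)]

alpha = 0.999
true_var = (1.0 - alpha) ** (-1.0 / tail_index)
true_cvar = true_var * tail_index / (tail_index - 1.0)

pot_var, pot_cvar = pot_var_cvar(costs, alpha)
emp_var, emp_cvar = empirical_var_cvar(costs, alpha)

print(f"true      VaR={true_var:.3f}  CVaR={true_cvar:.3f}")
print(f"POT       VaR={pot_var:.3f}  CVaR={pot_cvar:.3f}")
print(f"empirical VaR={emp_var:.3f}  CVaR={emp_cvar:.3f}")
```

At the 99.9% level the empirical estimate averages only about 20 of the 20,000 observations, whereas the POT estimate uses roughly 1,000 exceedances of the threshold to fit the tail. This is the scarcity issue the abstract describes. In practice the threshold choice and the fitting method (for example maximum likelihood instead of moments) would need care.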
