Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients

Parisa Davar, Frédéric Godin, Jose Garrido
{"title":"Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients","authors":"Parisa Davar, Frédéric Godin, Jose Garrido","doi":"arxiv-2406.15612","DOIUrl":null,"url":null,"abstract":"This paper tackles the problem of mitigating catastrophic risk (which is risk\nwith very low frequency but very high severity) in the context of a sequential\ndecision making process. This problem is particularly challenging due to the\nscarcity of observations in the far tail of the distribution of cumulative\ncosts (negative rewards). A policy gradient algorithm is developed, that we\ncall POTPG. It is based on approximations of the tail risk derived from extreme\nvalue theory. Numerical experiments highlight the out-performance of our method\nover common benchmarks, relying on the empirical distribution. An application\nto financial risk management, more precisely to the dynamic hedging of a\nfinancial option, is presented.","PeriodicalId":501128,"journal":{"name":"arXiv - QuantFin - Risk Management","volume":"72 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuantFin - Risk Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2406.15612","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This paper tackles the problem of mitigating catastrophic risk (which is risk with very low frequency but very high severity) in the context of a sequential decision making process. This problem is particularly challenging due to the scarcity of observations in the far tail of the distribution of cumulative costs (negative rewards). A policy gradient algorithm is developed, that we call POTPG. It is based on approximations of the tail risk derived from extreme value theory. Numerical experiments highlight the out-performance of our method over common benchmarks, relying on the empirical distribution. An application to financial risk management, more precisely to the dynamic hedging of a financial option, is presented.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于极值理论策略梯度的灾难风险意识强化学习
本文探讨了在连续决策过程中降低灾难性风险(即频率很低但严重程度很高的风险)的问题。由于累积成本(负回报)分布远端观测值的稀缺性,这个问题尤其具有挑战性。我们开发了一种策略梯度算法,我们称之为 POTPG。该算法基于极值理论得出的尾部风险近似值。数值实验表明,我们的方法优于依赖经验分布的普通基准。本文介绍了该方法在金融风险管理中的应用,更确切地说,是在金融期权动态对冲中的应用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
DeFi Arbitrage in Hedged Liquidity Tokens Decomposition Pipeline for Large-Scale Portfolio Optimization with Applications to Near-Term Quantum Computing Research and Design of a Financial Intelligent Risk Control Platform Based on Big Data Analysis and Deep Machine Learning Credit Spreads' Term Structure: Stochastic Modeling with CIR++ Intensity Claims processing and costs under capacity constraints
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1