Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients
Parisa Davar, Frédéric Godin, Jose Garrido
arXiv - QuantFin - Risk Management, 2024-06-21, arXiv:2406.15612
Abstract
This paper tackles the problem of mitigating catastrophic risk, that is, risk
with very low frequency but very high severity, in the context of a sequential
decision-making process. The problem is particularly challenging because
observations are scarce in the far tail of the distribution of cumulative
costs (negative rewards). We develop a policy gradient algorithm, called
POTPG, that is based on approximations of the tail risk derived from extreme
value theory. Numerical experiments show that our method outperforms common
benchmarks that rely on the empirical distribution. An application to
financial risk management, namely the dynamic hedging of a financial option,
is presented.
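
To make the idea of an extreme-value-theory approximation of tail risk concrete, the sketch below estimates a far-tail risk measure from a sample of cumulative costs with the peaks-over-threshold (POT) approach: excesses over a high threshold are modelled with a generalized Pareto distribution (GPD), and the fitted tail is extrapolated beyond the range where data are plentiful. This is an illustration only, not the authors' POTPG algorithm. The abstract does not state which risk measure or fitting method is used, so the choice of conditional value-at-risk (CVaR), the method-of-moments GPD fit, the function name `pot_cvar`, and its parameters are all assumptions made for this example.

```python
import math
import random


def pot_cvar(costs, alpha=0.999, threshold_quantile=0.95):
    """Peaks-over-threshold estimate of (VaR, CVaR, xi) at confidence level alpha.

    Hypothetical helper for illustration. Assumes a GPD tail fitted by the
    method of moments, which always yields a shape estimate xi < 0.5.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    ordered = sorted(costs)
    n = len(ordered)
    u = ordered[int(threshold_quantile * n)]       # high threshold
    excesses = [c - u for c in ordered if c > u]   # peaks over the threshold
    k = len(excesses)
    if k < 2:
        raise ValueError("too few exceedances to fit a tail")

    mean = sum(excesses) / k
    var = sum((e - mean) ** 2 for e in excesses) / (k - 1)
    if var <= 0.0:
        raise ValueError("degenerate excesses")

    # Method-of-moments GPD fit: shape xi and scale sigma.
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean * (1.0 + ratio)

    # Target tail probability relative to the empirical exceedance rate k / n.
    tail_ratio = (n / k) * (1.0 - alpha)
    if tail_ratio >= 1.0:
        raise ValueError("alpha must lie beyond the threshold quantile")

    if abs(xi) < 1e-9:                             # exponential-tail limit
        value_at_risk = u - sigma * math.log(tail_ratio)
    else:
        value_at_risk = u + sigma / xi * (tail_ratio ** (-xi) - 1.0)
    cvar = (value_at_risk + sigma - xi * u) / (1.0 - xi)
    return value_at_risk, cvar, xi


# Usage on simulated costs (a stand-in for cumulative costs of sampled episodes).
rng = random.Random(0)
simulated_costs = [rng.expovariate(1.0) for _ in range(10_000)]
var_hat, cvar_hat, xi_hat = pot_cvar(simulated_costs, alpha=0.999)
print(f"VaR={var_hat:.3f}  CVaR={cvar_hat:.3f}  xi={xi_hat:.3f}")
```

With 10,000 samples and `alpha=0.999`, an empirical CVaR estimate would average only about ten observations, whereas the POT estimate uses every exceedance of the threshold (roughly 500 here). That scarcity of far-tail data is the difficulty the abstract describes.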