Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients
Parisa Davar, Frédéric Godin, Jose Garrido
arXiv - QuantFin - Risk Management, 2024-06-21, arxiv-2406.15612
This paper tackles the problem of mitigating catastrophic risk (risk with very
low frequency but very high severity) in the context of a sequential
decision-making process. This problem is particularly challenging due to the
scarcity of observations in the far tail of the distribution of cumulative
costs (negative rewards). We develop a policy gradient algorithm, which we
call POTPG, based on approximations of the tail risk derived from extreme
value theory. Numerical experiments show that our method outperforms common
benchmarks that rely on the empirical distribution. An application to
financial risk management, namely the dynamic hedging of a financial option,
is presented.
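
The abstract does not give the tail-risk estimator itself. As a rough illustration of how an extreme-value-theory approximation can differ from one based on the empirical distribution, the sketch below estimates the value-at-risk and conditional value-at-risk of a sample of cumulative costs with the standard peaks-over-threshold approach: a generalized Pareto distribution is fitted to the excesses over a high threshold and then extrapolated into the far tail. This is not the authors' POTPG algorithm. The function names, the method-of-moments fit, and the threshold quantile are all assumptions made for this example.

```python
import math
import random


def fit_gpd_moments(excesses):
    """Method-of-moments fit of a generalized Pareto distribution (GPD).

    Returns (xi, beta), the shape and scale. Assumption for this sketch:
    the true shape is below 1/2, so the variance of the excesses is finite.
    """
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((e - mean) ** 2 for e in excesses) / (n - 1)
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)
    beta = 0.5 * mean * (ratio + 1.0)
    return xi, beta


def pot_var_cvar(costs, alpha, threshold_quantile=0.90):
    """Peaks-over-threshold estimates of VaR and CVaR at level alpha.

    Requires alpha to lie beyond the threshold quantile, so that the
    target quantile sits in the region modelled by the GPD.
    """
    ordered = sorted(costs)
    n = len(ordered)
    u = ordered[int(threshold_quantile * n)]  # high threshold
    excesses = [c - u for c in ordered if c > u]
    xi, beta = fit_gpd_moments(excesses)
    # Probability of exceeding VaR, conditional on exceeding the threshold.
    p = (1.0 - alpha) * n / len(excesses)
    if abs(xi) < 1e-8:
        # Exponential-tail limit of the GPD formulas.
        var = u - beta * math.log(p)
        cvar = var + beta
    else:
        var = u + beta / xi * (p ** (-xi) - 1.0)
        cvar = var / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)
    return var, cvar


def empirical_cvar(costs, alpha):
    """Benchmark: average of the costs at or above the empirical alpha-quantile."""
    ordered = sorted(costs)
    tail = ordered[int(alpha * len(ordered)):]
    return sum(tail) / len(tail)
```

Calling `pot_var_cvar(costs, alpha=0.999, threshold_quantile=0.95)` on a list of simulated cumulative costs uses every observation above the 95% quantile to inform the estimate, whereas `empirical_cvar(costs, 0.999)` averages only the largest 0.1% of the sample. That difference in how many observations inform the far tail is the data-scarcity issue the abstract points to.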