Achieving Mean–Variance Efficiency by Continuous-Time Reinforcement Learning

Yilie Huang, Yanwei Jia, X. Zhou
{"title":"通过持续时间强化学习实现均方差效率","authors":"Yilie Huang, Yanwei Jia, X. Zhou","doi":"10.1145/3533271.3561760","DOIUrl":null,"url":null,"abstract":"We conduct an extensive empirical analysis to evaluate the performance of the recently developed reinforcement learning algorithms by Jia and Zhou [11] in asset allocation tasks. We propose an efficient implementation of the algorithms in a dynamic mean-variance portfolio selection setting. We compare it with the conventional plug-in estimator and two state-of-the-art deep reinforcement learning algorithms, deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO), with both simulated and real market data. On both data sets, our algorithm significantly outperforms the others. In particular, using the US stocks data from Jan 2000 to Dec 2019, we demonstrate the effectiveness of our algorithm in reaching the target return and maximizing the Sharpe ratio for various periods under consideration, including the period of the financial crisis in 2007-2008. By contrast, the plug-in estimator performs poorly on real data sets, and PPO performs better than DDPG but still has lower Sharpe ratio than the market. Our algorithm also outperforms two well-diversified portfolios: the market and equally weighted portfolios.","PeriodicalId":134888,"journal":{"name":"Proceedings of the Third ACM International Conference on AI in Finance","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Achieving Mean–Variance Efficiency by Continuous-Time Reinforcement Learning\",\"authors\":\"Yilie Huang, Yanwei Jia, X. Zhou\",\"doi\":\"10.1145/3533271.3561760\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We conduct an extensive empirical analysis to evaluate the performance of the recently developed reinforcement learning algorithms by Jia and Zhou [11] in asset allocation tasks. We propose an efficient implementation of the algorithms in a dynamic mean-variance portfolio selection setting. We compare it with the conventional plug-in estimator and two state-of-the-art deep reinforcement learning algorithms, deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO), with both simulated and real market data. On both data sets, our algorithm significantly outperforms the others. In particular, using the US stocks data from Jan 2000 to Dec 2019, we demonstrate the effectiveness of our algorithm in reaching the target return and maximizing the Sharpe ratio for various periods under consideration, including the period of the financial crisis in 2007-2008. By contrast, the plug-in estimator performs poorly on real data sets, and PPO performs better than DDPG but still has lower Sharpe ratio than the market. 
Our algorithm also outperforms two well-diversified portfolios: the market and equally weighted portfolios.\",\"PeriodicalId\":134888,\"journal\":{\"name\":\"Proceedings of the Third ACM International Conference on AI in Finance\",\"volume\":\"56 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Third ACM International Conference on AI in Finance\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3533271.3561760\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Third ACM International Conference on AI in Finance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3533271.3561760","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

We conduct an extensive empirical analysis to evaluate the performance of the recently developed reinforcement learning algorithms by Jia and Zhou [11] in asset allocation tasks. We propose an efficient implementation of the algorithms in a dynamic mean-variance portfolio selection setting. We compare it with the conventional plug-in estimator and two state-of-the-art deep reinforcement learning algorithms, deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO), on both simulated and real market data. On both data sets, our algorithm significantly outperforms the others. In particular, using US stock data from Jan 2000 to Dec 2019, we demonstrate the effectiveness of our algorithm in reaching the target return and maximizing the Sharpe ratio for the various periods under consideration, including the 2007-2008 financial crisis. By contrast, the plug-in estimator performs poorly on real data, and PPO performs better than DDPG but still has a lower Sharpe ratio than the market. Our algorithm also outperforms two well-diversified portfolios: the market portfolio and the equally weighted portfolio.
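For context, in the dynamic mean-variance setting the investor minimizes the variance of terminal wealth subject to a target expected return, i.e. min_u Var(x_T) subject to E[x_T] = z in its standard continuous-time form, and competing strategies are then compared by their Sharpe ratios. The sketch below is a minimal illustration of that evaluation metric only, not the paper's code; the placeholder data, function names, and the daily-return annualization convention are assumptions made for this example.

```python
# Illustrative sketch only -- not the paper's evaluation code.
# Compares a fixed-weight portfolio against an equally weighted benchmark by
# annualized Sharpe ratio, the comparison criterion cited in the abstract.
import numpy as np

def sharpe_ratio(returns: np.ndarray, risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of per-period simple returns (daily by assumption)."""
    excess = returns - risk_free / periods_per_year
    return float(np.mean(excess) / np.std(excess, ddof=1) * np.sqrt(periods_per_year))

def portfolio_returns(asset_returns: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Per-period portfolio return; rows are periods, columns are assets."""
    return asset_returns @ weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder daily returns for 5 hypothetical assets over roughly 4 years.
    rets = rng.normal(loc=0.0004, scale=0.01, size=(252 * 4, 5))

    equal_weight = np.full(5, 1 / 5)                      # equally weighted benchmark
    some_strategy = np.array([0.4, 0.3, 0.1, 0.1, 0.1])   # hypothetical allocation

    print("Equal-weighted Sharpe:", sharpe_ratio(portfolio_returns(rets, equal_weight)))
    print("Strategy Sharpe:", sharpe_ratio(portfolio_returns(rets, some_strategy)))
```

A reinforcement learning approach such as the one evaluated here learns an allocation policy from data, whereas the plug-in baseline substitutes estimated market parameters into the model-based solution; the Sharpe-ratio comparison above is one way such strategies can be ranked out of sample.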