Achieving Mean–Variance Efficiency by Continuous-Time Reinforcement Learning

Yilie Huang, Yanwei Jia, X. Zhou
{"title":"通过持续时间强化学习实现均方差效率","authors":"Yilie Huang, Yanwei Jia, X. Zhou","doi":"10.1145/3533271.3561760","DOIUrl":null,"url":null,"abstract":"We conduct an extensive empirical analysis to evaluate the performance of the recently developed reinforcement learning algorithms by Jia and Zhou [11] in asset allocation tasks. We propose an efficient implementation of the algorithms in a dynamic mean-variance portfolio selection setting. We compare it with the conventional plug-in estimator and two state-of-the-art deep reinforcement learning algorithms, deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO), with both simulated and real market data. On both data sets, our algorithm significantly outperforms the others. In particular, using the US stocks data from Jan 2000 to Dec 2019, we demonstrate the effectiveness of our algorithm in reaching the target return and maximizing the Sharpe ratio for various periods under consideration, including the period of the financial crisis in 2007-2008. By contrast, the plug-in estimator performs poorly on real data sets, and PPO performs better than DDPG but still has lower Sharpe ratio than the market. Our algorithm also outperforms two well-diversified portfolios: the market and equally weighted portfolios.","PeriodicalId":134888,"journal":{"name":"Proceedings of the Third ACM International Conference on AI in Finance","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Achieving Mean–Variance Efficiency by Continuous-Time Reinforcement Learning\",\"authors\":\"Yilie Huang, Yanwei Jia, X. Zhou\",\"doi\":\"10.1145/3533271.3561760\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We conduct an extensive empirical analysis to evaluate the performance of the recently developed reinforcement learning algorithms by Jia and Zhou [11] in asset allocation tasks. We propose an efficient implementation of the algorithms in a dynamic mean-variance portfolio selection setting. We compare it with the conventional plug-in estimator and two state-of-the-art deep reinforcement learning algorithms, deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO), with both simulated and real market data. On both data sets, our algorithm significantly outperforms the others. In particular, using the US stocks data from Jan 2000 to Dec 2019, we demonstrate the effectiveness of our algorithm in reaching the target return and maximizing the Sharpe ratio for various periods under consideration, including the period of the financial crisis in 2007-2008. By contrast, the plug-in estimator performs poorly on real data sets, and PPO performs better than DDPG but still has lower Sharpe ratio than the market. 
Our algorithm also outperforms two well-diversified portfolios: the market and equally weighted portfolios.\",\"PeriodicalId\":134888,\"journal\":{\"name\":\"Proceedings of the Third ACM International Conference on AI in Finance\",\"volume\":\"56 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Third ACM International Conference on AI in Finance\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3533271.3561760\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Third ACM International Conference on AI in Finance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3533271.3561760","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

We conduct an extensive empirical analysis to evaluate the performance of the recently developed reinforcement learning algorithms by Jia and Zhou [11] in asset allocation tasks. We propose an efficient implementation of the algorithms in a dynamic mean-variance portfolio selection setting. We compare it with the conventional plug-in estimator and two state-of-the-art deep reinforcement learning algorithms, deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO), on both simulated and real market data. On both data sets, our algorithm significantly outperforms the others. In particular, using US stock data from Jan 2000 to Dec 2019, we demonstrate the effectiveness of our algorithm in reaching the target return and maximizing the Sharpe ratio for the various periods under consideration, including the 2007-2008 financial crisis. By contrast, the plug-in estimator performs poorly on real data, and PPO performs better than DDPG but still has a lower Sharpe ratio than the market. Our algorithm also outperforms two well-diversified portfolios: the market portfolio and the equally weighted portfolio.
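For context, in the dynamic mean-variance setting the investor minimizes the variance of terminal wealth subject to a target expected return, i.e. min_u Var(x_T) subject to E[x_T] = z in its standard continuous-time form, and competing strategies are then compared by their Sharpe ratios. The sketch below is a minimal illustration of that evaluation metric only, not the paper's code; the placeholder data, function names, and the daily-return annualization convention are assumptions made for this example.

```python
# Illustrative sketch only -- not the paper's evaluation code.
# Compares a fixed-weight portfolio against an equally weighted benchmark by
# annualized Sharpe ratio, the comparison criterion cited in the abstract.
import numpy as np

def sharpe_ratio(returns: np.ndarray, risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of per-period simple returns (daily by assumption)."""
    excess = returns - risk_free / periods_per_year
    return float(np.mean(excess) / np.std(excess, ddof=1) * np.sqrt(periods_per_year))

def portfolio_returns(asset_returns: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Per-period portfolio return; rows are periods, columns are assets."""
    return asset_returns @ weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder daily returns for 5 hypothetical assets over roughly 4 years.
    rets = rng.normal(loc=0.0004, scale=0.01, size=(252 * 4, 5))

    equal_weight = np.full(5, 1 / 5)                      # equally weighted benchmark
    some_strategy = np.array([0.4, 0.3, 0.1, 0.1, 0.1])   # hypothetical allocation

    print("Equal-weighted Sharpe:", sharpe_ratio(portfolio_returns(rets, equal_weight)))
    print("Strategy Sharpe:", sharpe_ratio(portfolio_returns(rets, some_strategy)))
```

A reinforcement learning approach such as the one evaluated here learns an allocation policy from data, whereas the plug-in baseline substitutes estimated market parameters into the model-based solution; the Sharpe-ratio comparison above is one way such strategies can be ranked out of sample.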