基于夏普比率奖励框架的投资组合管理强化学习

Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China Pub Date : 1900-01-01 DOI:10.4108/eai.18-11-2022.2327121

Z. Liu

{"title":"基于夏普比率奖励框架的投资组合管理强化学习","authors":"Z. Liu","doi":"10.4108/eai.18-11-2022.2327121","DOIUrl":null,"url":null,"abstract":"— Portfolio management is a financial operation which aims at maximizing the return or optimizing the Sharpe Ratio. One widely used portfolio management strategy, Mean-Variance Optimization, also known as Modern Portfolio Theory, mainly profits by focusing on finding out the expected return and variance of stocks based on historical data to maximize Sharpe Ratio. Yet, it is not easy and accurate to simply predict future return and variance based on a formula. So, in this paper, two Models-free framework, Sharpe Ratio reward based Deep Q-Network (DQN-S) and Return reward (DQN-R) are proposed to overcome the limitations above. Deep Q-learning was employed to train a neural network to manage a stock portfolio of 10 stocks. Stock price was defined as environment of NN, weight of portfolio was defined as action of neural network agent, and reward was indicated to train the model. Traditional portfolio allocation strategy Mean Variance Optimization (MVO) and Naïve Portfolio Allocation (NPA) were also introduced as benchmark to evaluate the performance of reinforcement learning. Moreover, the extensiveness of DQN-S was discussed. The result shows that the MVO is dominating the NPA with a 5% higher annual return and 0.5 higher of Sharpe ratio, although the MDD is slightly higher, indicating the superiority of Sharpe Ratio oriented strategy.","PeriodicalId":436941,"journal":{"name":"Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reinforcement Learning in Portfolio Management with Sharpe Ratio Rewarding Based Framework\",\"authors\":\"Z. Liu\",\"doi\":\"10.4108/eai.18-11-2022.2327121\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"— Portfolio management is a financial operation which aims at maximizing the return or optimizing the Sharpe Ratio. One widely used portfolio management strategy, Mean-Variance Optimization, also known as Modern Portfolio Theory, mainly profits by focusing on finding out the expected return and variance of stocks based on historical data to maximize Sharpe Ratio. Yet, it is not easy and accurate to simply predict future return and variance based on a formula. So, in this paper, two Models-free framework, Sharpe Ratio reward based Deep Q-Network (DQN-S) and Return reward (DQN-R) are proposed to overcome the limitations above. Deep Q-learning was employed to train a neural network to manage a stock portfolio of 10 stocks. Stock price was defined as environment of NN, weight of portfolio was defined as action of neural network agent, and reward was indicated to train the model. Traditional portfolio allocation strategy Mean Variance Optimization (MVO) and Naïve Portfolio Allocation (NPA) were also introduced as benchmark to evaluate the performance of reinforcement learning. Moreover, the extensiveness of DQN-S was discussed. The result shows that the MVO is dominating the NPA with a 5% higher annual return and 0.5 higher of Sharpe ratio, although the MDD is slightly higher, indicating the superiority of Sharpe Ratio oriented strategy.\",\"PeriodicalId\":436941,\"journal\":{\"name\":\"Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China\",\"volume\":\"56 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4108/eai.18-11-2022.2327121\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4108/eai.18-11-2022.2327121","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

-投资组合管理是一种财务操作，其目的是最大化回报或优化夏普比率。一种被广泛使用的投资组合管理策略是Mean-Variance Optimization，也被称为Modern portfolio Theory，主要是通过根据历史数据找出股票的预期收益和方差来最大化Sharpe Ratio。然而，简单地根据公式预测未来的收益和方差是不容易和准确的。因此，本文提出了基于Sharpe Ratio奖励的深度Q-Network (DQN-S)和Return reward (DQN-R)两种无模型框架来克服上述局限性。使用深度q -学习来训练一个神经网络来管理一个由10只股票组成的股票组合。将股票价格定义为神经网络的环境，将投资组合的权重定义为神经网络代理的行为，并通过奖励来训练模型。引入传统的投资组合配置策略均值方差优化(Mean Variance Optimization, MVO)和Naïve投资组合配置(portfolio allocation, NPA)作为评价强化学习性能的基准。此外，还讨论了DQN-S的广泛性。结果表明，MVO主导着NPA，年化收益率高出5%，夏普比率高出0.5，但MDD略高，说明夏普比率导向策略具有优势。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Reinforcement Learning in Portfolio Management with Sharpe Ratio Rewarding Based Framework

— Portfolio management is a financial operation which aims at maximizing the return or optimizing the Sharpe Ratio. One widely used portfolio management strategy, Mean-Variance Optimization, also known as Modern Portfolio Theory, mainly profits by focusing on finding out the expected return and variance of stocks based on historical data to maximize Sharpe Ratio. Yet, it is not easy and accurate to simply predict future return and variance based on a formula. So, in this paper, two Models-free framework, Sharpe Ratio reward based Deep Q-Network (DQN-S) and Return reward (DQN-R) are proposed to overcome the limitations above. Deep Q-learning was employed to train a neural network to manage a stock portfolio of 10 stocks. Stock price was defined as environment of NN, weight of portfolio was defined as action of neural network agent, and reward was indicated to train the model. Traditional portfolio allocation strategy Mean Variance Optimization (MVO) and Naïve Portfolio Allocation (NPA) were also introduced as benchmark to evaluate the performance of reinforcement learning. Moreover, the extensiveness of DQN-S was discussed. The result shows that the MVO is dominating the NPA with a 5% higher annual return and 0.5 higher of Sharpe ratio, although the MDD is slightly higher, indicating the superiority of Sharpe Ratio oriented strategy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China

自引率

0.00%

发文量