Thammasorn Harnpadungkij, Warasinee Chaisangmongkon, P. Phunchongharn
{"title":"基于C51算法的风险敏感投资组合管理","authors":"Thammasorn Harnpadungkij, Warasinee Chaisangmongkon, P. Phunchongharn","doi":"10.12982/cmjs.2022.094","DOIUrl":null,"url":null,"abstract":"Financial trading is one of the most popular problems for reinforcement learning in recent years. One of the important challenges is that investment is a multi-objective problem. That is, professional investors do not act solely on expected profi t but also carefully consider the potential risk of a given investment. To handle such a challenge, previous studies have explored various kinds of risk-sensitive rewards, for example, the Sharpe ratio as computed by a fi xed length of previous returns. This work proposes a new approach to deal with the profi t-to-risk tradeoff by applying distributional reinforcement learning to build a risk awareness policy instead of a simple risk-based reward function. Our new policy, termed C51-Sharpe, is to select the action based on the Sharpe ratio computed from the probability mass function of the return. This produces a signifi cantly higher Sharpe ratio and lower maximum drawdown without sacrifi cing profi t compared to the C51algorithm utilizing a purely profi t-based policy. Moreover, it can outperform other benchmarks, such as a Deep Q-Network (DQN) with a Sharpe ratio reward function. Besides the policy, we also studied the effect of using double networks and the choice of exploration strategies with our approach to identify the optimal training confi guration. We fi nd that the epsilon-greedy policy is the most suitable exploration for C51-Sharpe and that the use of double network has no signifi cant impact on performance. Our study provides statistical evidence of the effi ciency in risk-sensitive policy implemented by using distributional reinforcement algorithms along with an optimized training process.","PeriodicalId":9884,"journal":{"name":"Chiang Mai Journal of Science","volume":"13 1","pages":""},"PeriodicalIF":0.6000,"publicationDate":"2022-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Risk-Sensitive Portfolio Management by Using C51 Algorithm\",\"authors\":\"Thammasorn Harnpadungkij, Warasinee Chaisangmongkon, P. Phunchongharn\",\"doi\":\"10.12982/cmjs.2022.094\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Financial trading is one of the most popular problems for reinforcement learning in recent years. One of the important challenges is that investment is a multi-objective problem. That is, professional investors do not act solely on expected profi t but also carefully consider the potential risk of a given investment. To handle such a challenge, previous studies have explored various kinds of risk-sensitive rewards, for example, the Sharpe ratio as computed by a fi xed length of previous returns. This work proposes a new approach to deal with the profi t-to-risk tradeoff by applying distributional reinforcement learning to build a risk awareness policy instead of a simple risk-based reward function. Our new policy, termed C51-Sharpe, is to select the action based on the Sharpe ratio computed from the probability mass function of the return. This produces a signifi cantly higher Sharpe ratio and lower maximum drawdown without sacrifi cing profi t compared to the C51algorithm utilizing a purely profi t-based policy. Moreover, it can outperform other benchmarks, such as a Deep Q-Network (DQN) with a Sharpe ratio reward function. 
Besides the policy, we also studied the effect of using double networks and the choice of exploration strategies with our approach to identify the optimal training confi guration. We fi nd that the epsilon-greedy policy is the most suitable exploration for C51-Sharpe and that the use of double network has no signifi cant impact on performance. Our study provides statistical evidence of the effi ciency in risk-sensitive policy implemented by using distributional reinforcement algorithms along with an optimized training process.\",\"PeriodicalId\":9884,\"journal\":{\"name\":\"Chiang Mai Journal of Science\",\"volume\":\"13 1\",\"pages\":\"\"},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2022-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Chiang Mai Journal of Science\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.12982/cmjs.2022.094\",\"RegionNum\":4,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Chiang Mai Journal of Science","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.12982/cmjs.2022.094","RegionNum":4,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Risk-Sensitive Portfolio Management by Using C51 Algorithm
Financial trading has become one of the most popular applications of reinforcement learning in recent years. An important challenge is that investment is a multi-objective problem: professional investors do not act solely on expected profit but also carefully consider the potential risk of a given investment. To handle this challenge, previous studies have explored various kinds of risk-sensitive rewards, for example, the Sharpe ratio computed over a fixed window of previous returns. This work proposes a new approach to the profit-to-risk tradeoff by applying distributional reinforcement learning to build a risk-aware policy instead of a simple risk-based reward function. Our new policy, termed C51-Sharpe, selects the action based on the Sharpe ratio computed from the probability mass function of the return. This produces a significantly higher Sharpe ratio and a lower maximum drawdown, without sacrificing profit, compared to the C51 algorithm using a purely profit-based policy. Moreover, it outperforms other benchmarks, such as a Deep Q-Network (DQN) with a Sharpe-ratio reward function. Besides the policy, we also studied the effect of using double networks and the choice of exploration strategy with our approach, in order to identify the optimal training configuration. We find that the epsilon-greedy policy is the most suitable exploration strategy for C51-Sharpe and that the use of a double network has no significant impact on performance. Our study provides statistical evidence of the efficiency of a risk-sensitive policy implemented using distributional reinforcement learning algorithms along with an optimized training process.
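To make the C51-Sharpe action-selection rule concrete, the sketch below shows how a Sharpe-style score could be derived from the categorical return distribution that a C51 network outputs for each action. The atom support (N_ATOMS, V_MIN, V_MAX), the small variance guard, and the `select_action` helper are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np

# C51 represents the return Z(s, a) of each action as a categorical
# distribution over a fixed support of atoms z_1 ... z_N. The C51-Sharpe
# policy described in the abstract ranks actions by a mean-to-risk ratio
# computed directly from that probability mass function.

N_ATOMS = 51                       # standard C51 atom count; support bounds
V_MIN, V_MAX = -10.0, 10.0         # below are assumed for illustration
ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)

def sharpe_scores(probs: np.ndarray) -> np.ndarray:
    """Sharpe-style score per action.

    probs: shape (n_actions, N_ATOMS); each row is the probability mass
    function over return atoms for one action.
    """
    means = probs @ ATOMS                         # E[Z(s, a)]
    variances = probs @ ATOMS**2 - means**2       # E[Z^2] - E[Z]^2
    stds = np.sqrt(np.maximum(variances, 1e-12))  # guard against zero variance
    return means / stds

def select_action(probs: np.ndarray, epsilon: float = 0.05) -> int:
    """Epsilon-greedy selection over Sharpe scores, mirroring the exploration
    strategy the paper found most suitable for C51-Sharpe."""
    if np.random.rand() < epsilon:
        return np.random.randint(probs.shape[0])
    return int(np.argmax(sharpe_scores(probs)))
```

In use, `probs` would be the softmax output of the C51 network for the current state; replacing `sharpe_scores` with `means` alone recovers the purely profit-based C51 policy that the paper compares against.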
Journal Introduction:
The Chiang Mai Journal of Science is an international English-language peer-reviewed journal published in open-access electronic format six times a year, in January, March, May, July, September and November, by the Faculty of Science, Chiang Mai University. Manuscripts in most areas of science are welcome, except in areas such as agriculture, engineering and medical science, which are outside the scope of the Journal. Currently, we focus on manuscripts in biology, chemistry, physics, materials science and environmental science. Papers in mathematics, statistics and computer science are also included but should be of an applied nature rather than purely theoretical. Manuscripts describing experiments on humans or animals are required to provide proof that all experiments have been carried out according to the ethical regulations of the respective institutional and/or governmental authorities, and this should be clearly stated in the manuscript itself. The Editor reserves the right to reject manuscripts that fail to do so.