Risk-Sensitive Portfolio Management by Using C51 Algorithm

IF 0.6 | Region 4 (multidisciplinary journals) | Q3 MULTIDISCIPLINARY SCIENCES | Chiang Mai Journal of Science | Pub Date: 2022-09-30 | DOI: 10.12982/cmjs.2022.094
Thammasorn Harnpadungkij, Warasinee Chaisangmongkon, P. Phunchongharn
Citations: 0

Abstract

Financial trading has become one of the most popular applications of reinforcement learning in recent years. A key challenge is that investment is a multi-objective problem: professional investors do not act solely on expected profit but also carefully weigh the potential risk of a given investment. To address this, previous studies have explored various risk-sensitive rewards, for example the Sharpe ratio computed over a fixed-length window of previous returns. This work proposes a new approach to the profit-to-risk tradeoff: applying distributional reinforcement learning to build a risk-aware policy instead of relying on a simple risk-based reward function. Our new policy, termed C51-Sharpe, selects the action based on the Sharpe ratio computed from the probability mass function of the return. It produces a significantly higher Sharpe ratio and a lower maximum drawdown, without sacrificing profit, than the C51 algorithm with a purely profit-based policy. Moreover, it can outperform other benchmarks, such as a Deep Q-Network (DQN) with a Sharpe ratio reward function. Beyond the policy, we also study the effect of using double networks and the choice of exploration strategy with our approach to identify the optimal training configuration. We find that the epsilon-greedy policy is the most suitable exploration strategy for C51-Sharpe and that the use of double networks has no significant impact on performance. Our study provides statistical evidence of the efficiency of a risk-sensitive policy implemented with distributional reinforcement learning algorithms, along with an optimized training process.
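
The abstract does not give the exact formula, so the following is only a minimal sketch of how an action could be chosen from a Sharpe ratio computed over a C51-style categorical return distribution. It assumes a zero risk-free rate, a fixed support of 51 atoms, and a small stabilizing constant in the denominator. The names `sharpe_from_pmf` and `select_action` are hypothetical and are not taken from the paper.

```python
import numpy as np


def sharpe_from_pmf(probs, atoms, eps=1e-8):
    """Sharpe-like score per action from a categorical return distribution.

    probs: array of shape (n_actions, n_atoms); each row is a probability
           mass function over the return atoms (rows sum to 1).
    atoms: array of shape (n_atoms,); the fixed support of the return.
    eps:   assumed stabilizer that avoids division by zero.
    """
    mean = probs @ atoms                           # expected return per action
    centered = atoms[None, :] - mean[:, None]      # deviation of each atom
    var = (probs * centered ** 2).sum(axis=1)      # variance per action
    return mean / (np.sqrt(var) + eps)             # risk-free rate assumed 0


def select_action(probs, atoms, epsilon=0.0, rng=None):
    """Epsilon-greedy selection over the Sharpe score (illustrative only)."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < epsilon:                     # explore uniformly
        return int(rng.integers(probs.shape[0]))
    return int(np.argmax(sharpe_from_pmf(probs, atoms)))


# Toy example: 51 atoms on [-1, 1], two actions.
atoms = np.linspace(-1.0, 1.0, 51)
probs = np.zeros((2, 51))
probs[0, 0], probs[0, 50] = 0.4, 0.6    # risky: mean 0.2, wide spread
probs[1, 27], probs[1, 28] = 0.5, 0.5   # steady: mean 0.1, narrow spread

mean_greedy = int(np.argmax(probs @ atoms))   # profit-based choice
sharpe_greedy = select_action(probs, atoms)   # risk-sensitive choice
print(mean_greedy, sharpe_greedy)
```

In this toy case the profit-based rule prefers the risky action because its expected return is higher, while the Sharpe-based rule prefers the steady action because its return per unit of standard deviation is far larger. That is the kind of tradeoff the abstract describes, though the actual policy in the paper may differ in its details.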
Source journal
Chiang Mai Journal of Science (MULTIDISCIPLINARY SCIENCES)
CiteScore: 1.00
Self-citation rate: 25.00%
Articles published: 103
Review time: 3 months
Journal description: The Chiang Mai Journal of Science is an international, English-language, peer-reviewed journal published in open-access electronic format six times a year (January, March, May, July, September and November) by the Faculty of Science, Chiang Mai University. Manuscripts in most areas of science are welcome, except in areas such as agriculture, engineering and medical science, which are outside the scope of the Journal. Currently, the Journal focuses on manuscripts in biology, chemistry, physics, materials science and environmental science. Papers in mathematics, statistics and computer science are also included but should be of an applied rather than purely theoretical nature. Manuscripts describing experiments on humans or animals must provide proof that all experiments were carried out according to the ethical regulations of the respective institutional and/or governmental authorities, and this should be clearly stated in the manuscript itself. The Editor reserves the right to reject manuscripts that fail to do so.