Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management
Gang Hu, Ming Gu
arXiv - QuantFin - Computational Finance, 2024-05-08
DOI: arxiv-2405.05449 (https://doi.org/arxiv-2405.05449)
Citations: 0
Abstract
Investment portfolios, central to finance, balance potential returns and
risks. This paper introduces a hybrid approach combining Markowitz's portfolio
theory with reinforcement learning, utilizing knowledge distillation for
training agents. In particular, our proposed method, called KDD (Knowledge
Distillation DDPG), consists of two training stages: a supervised learning
stage and a reinforcement learning stage. The trained agents optimize portfolio assembly.
A comparative analysis against standard financial models and AI frameworks,
using metrics like returns, the Sharpe ratio, and nine evaluation indices,
reveals our model's superiority. It notably achieves the highest yield and the
highest Sharpe ratio, 2.03, indicating top profitability with the lowest risk
among scenarios with comparable returns.
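
For readers unfamiliar with the Sharpe ratio cited above, the following is a minimal sketch of how such a figure is conventionally computed from a series of per-period returns. It is not the authors' evaluation code: the function name `sharpe_ratio`, the 252-period annualization factor, the zero default risk-free rate, and the sample returns are all assumptions made for illustration.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns.

    Assumptions (not taken from the paper): `risk_free_rate` is expressed
    per period, and `periods_per_year=252` corresponds to daily data.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    # Excess return over the risk-free rate in each period.
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.fmean(excess)
    # Sample standard deviation (n - 1 in the denominator).
    std_excess = statistics.stdev(excess)
    if std_excess == 0:
        raise ValueError("zero volatility; Sharpe ratio is undefined")
    # Scale the per-period ratio to an annual figure.
    return mean_excess / std_excess * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only.
daily_returns = [0.010, -0.005, 0.007, 0.002, -0.003]
print(round(sharpe_ratio(daily_returns), 2))
```

A higher value means more excess return per unit of volatility, which is why the abstract pairs the Sharpe ratio with raw returns when comparing strategies.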