Cost-Efficient Reinforcement Learning for Optimal Trade Execution on Dynamic Market Environment

Proceedings of the Third ACM International Conference on AI in Finance. Pub Date: 2022-10-26. DOI: 10.1145/3533271.3561761

Di Chen, Yada Zhu, Miao Liu, Jianbo Li


Abstract

Learning a high-performance trade execution model via reinforcement learning (RL) requires interaction with the real dynamic market. However, the massive number of interactions required by direct RL results in significant training overhead. In this paper, we propose a cost-efficient RL approach called Deep Dyna-Double Q-learning (D3Q), which integrates deep reinforcement learning and planning to reduce training overhead while improving trading performance. Specifically, D3Q includes a learnable market environment model, which approximates market impact from real market experience and is used to enhance policy learning via the learned environment. Meanwhile, to accelerate model learning, we propose a novel state-balanced exploration scheme that corrects the exploration bias caused by the non-increasing residual inventory during trade execution. As demonstrated by our extensive experiments, the proposed D3Q framework significantly increases sample efficiency and also outperforms state-of-the-art methods on average trading cost.
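The combination of a learned environment model with double Q-learning can be illustrated with a small tabular sketch. This is not the authors' D3Q implementation: it replaces the deep networks with lookup tables, uses a hypothetical toy market in which selling `a` shares costs `IMPACT * a**2`, and leaves out the state-balanced exploration scheme, whose details the abstract does not give. All names and constants (`T`, `INVENTORY`, `IMPACT`, `step`, `train`, `greedy_schedule`) are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy problem: sell INVENTORY shares within T steps.
T, INVENTORY, IMPACT = 4, 4, 1.0


def step(state, action):
    """Toy market (an assumption): selling `action` shares costs IMPACT * action**2."""
    t, inv = state
    if t == T - 1:
        action = inv  # forced liquidation of whatever is left at the last step
    action = min(action, inv)
    reward = -IMPACT * action ** 2
    nxt = (t + 1, inv - action)
    return reward, nxt, nxt[0] == T


def double_q_update(qa, qb, s, a, r, s2, done, alpha, gamma=1.0):
    """Double Q-learning: one table selects the best next action, the other evaluates it."""
    if random.random() < 0.5:
        qa, qb = qb, qa  # pick at random which table gets updated
    target = r
    if not done:
        best = max(range(s2[1] + 1), key=lambda x: qa[(s2, x)])
        target += gamma * qb[(s2, best)]
    qa[(s, a)] += alpha * (target - qa[(s, a)])


def train(episodes=2000, planning_steps=10, alpha=0.1, eps=0.2, seed=0):
    random.seed(seed)
    qa, qb = defaultdict(float), defaultdict(float)
    model = {}  # learned environment: (state, action) -> (reward, next_state, done)
    for _ in range(episodes):
        s, done = (0, INVENTORY), False
        while not done:
            actions = list(range(s[1] + 1))  # cannot sell more than is left
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: qa[(s, x)] + qb[(s, x)])
            r, s2, done = step(s, a)  # one interaction with the "real" market
            double_q_update(qa, qb, s, a, r, s2, done, alpha)
            model[(s, a)] = (r, s2, done)
            # Dyna-style planning: extra updates replayed from the learned model,
            # which cost no additional market interaction.
            for _ in range(planning_steps):
                ps, pa = random.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                double_q_update(qa, qb, ps, pa, pr, ps2, pdone, alpha)
            s = s2
    return qa, qb


def greedy_schedule(qa, qb):
    """Roll out the greedy policy and return the shares sold at each step."""
    s, done, sold = (0, INVENTORY), False, []
    while not done:
        a = max(range(s[1] + 1), key=lambda x: qa[(s, x)] + qb[(s, x)])
        if s[0] == T - 1:
            a = s[1]  # mirror the forced liquidation in `step`
        _, s, done = step(s, a)
        sold.append(a)
    return sold


if __name__ == "__main__":
    qa, qb = train()
    print(greedy_schedule(qa, qb))
```

In this toy setting the cost is quadratic in the quantity sold, so the cheapest schedule is the even split of one share per step, and the learned greedy schedule can be checked against that. Each real step is followed by `planning_steps` simulated updates, which is the sense in which planning trades market interactions for computation.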
