Data time travel and consistent market making: taming reinforcement learning in multi-agent systems with anonymous data

Vincent Ragel, Damien Challet
{"title":"Data time travel and consistent market making: taming reinforcement learning in multi-agent systems with anonymous data","authors":"Vincent Ragel, Damien Challet","doi":"arxiv-2408.02322","DOIUrl":null,"url":null,"abstract":"Reinforcement learning works best when the impact of the agent's actions on\nits environment can be perfectly simulated or fully appraised from available\ndata. Some systems are however both hard to simulate and very sensitive to\nsmall perturbations. An additional difficulty arises when an RL agent must\nlearn to be part of a multi-agent system using only anonymous data, which makes\nit impossible to infer the state of each agent, thus to use data directly.\nTypical examples are competitive systems without agent-resolved data such as\nfinancial markets. We introduce consistent data time travel for offline RL as a\nremedy for these problems: instead of using historical data in a sequential\nway, we argue that one needs to perform time travel in historical data, i.e.,\nto adjust the time index so that both the past state and the influence of the\nRL agent's action on the state coincide with real data. This both alleviates\nthe need to resort to imperfect models and consistently accounts for both the\nimmediate and long-term reactions of the system when using anonymous historical\ndata. We apply this idea to market making in limit order books, a notoriously\ndifficult task for RL; it turns out that the gain of the agent is significantly\nhigher with data time travel than with naive sequential data, which suggests\nthat the difficulty of this task for RL may have been overestimated.","PeriodicalId":501478,"journal":{"name":"arXiv - QuantFin - Trading and Market Microstructure","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuantFin - Trading and Market Microstructure","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.02322","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Reinforcement learning works best when the impact of the agent's actions on its environment can be perfectly simulated or fully appraised from available data. Some systems, however, are both hard to simulate and very sensitive to small perturbations. An additional difficulty arises when an RL agent must learn to be part of a multi-agent system using only anonymous data, which makes it impossible to infer the state of each agent and thus to use the data directly. Typical examples are competitive systems without agent-resolved data, such as financial markets. We introduce consistent data time travel for offline RL as a remedy for these problems: instead of using historical data sequentially, we argue that one needs to perform time travel in the historical data, i.e., to adjust the time index so that both the past state and the influence of the RL agent's action on the state coincide with real data. This alleviates the need to resort to imperfect models and consistently accounts for both the immediate and long-term reactions of the system when using anonymous historical data. We apply this idea to market making in limit order books, a notoriously difficult task for RL; it turns out that the agent's gain is significantly higher with data time travel than with naive sequential data, which suggests that the difficulty of this task for RL may have been overestimated.
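
The abstract describes the mechanism only at a high level. As a purely illustrative aid, here is a minimal sketch of one way such a time-index adjustment could look, assuming each historical time step is summarised as a state feature vector and consistency is checked by nearest-neighbour distance; the function names, the matching rule, and the tolerance are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def time_travel_index(history: np.ndarray,
                      implied_state: np.ndarray,
                      t: int,
                      tol: float = 1e-2) -> int:
    """Pick the next time index so that the historical state matches the
    state implied by the agent's action, instead of stepping blindly to
    t + 1 as naive sequential replay would (hypothetical matching rule)."""
    # Distance between every historical state and the action-implied state.
    dist = np.linalg.norm(history - implied_state, axis=1)
    close = np.flatnonzero(dist <= tol)
    if close.size == 0:
        return int(np.argmin(dist))  # no consistent state: take the nearest
    # Among consistent states, prefer the one closest in time to t.
    return int(close[np.argmin(np.abs(close - t))])

# Toy replay loop: after each action the environment "jumps" to a
# consistent index, so subsequent reactions come from real data.
rng = np.random.default_rng(0)
history = rng.normal(size=(1_000, 4))          # fake anonymous state features
t = 0
for _ in range(5):
    action_impact = rng.normal(scale=0.1, size=4)  # stand-in for the impact
    implied_state = history[t] + action_impact     # of the agent's action
    t = time_travel_index(history, implied_state, t)
```

The point of the jump is that, after matching, every market reaction the agent subsequently observes is genuine historical data rather than the output of an imperfect simulator; how the implied state is built from the agent's order-book actions is precisely the part that the paper's consistency requirement pins down.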