Data time travel and consistent market making: taming reinforcement learning in multi-agent systems with anonymous data

Vincent Ragel, Damien Challet
{"title":"Data time travel and consistent market making: taming reinforcement learning in multi-agent systems with anonymous data","authors":"Vincent Ragel, Damien Challet","doi":"arxiv-2408.02322","DOIUrl":null,"url":null,"abstract":"Reinforcement learning works best when the impact of the agent's actions on\nits environment can be perfectly simulated or fully appraised from available\ndata. Some systems are however both hard to simulate and very sensitive to\nsmall perturbations. An additional difficulty arises when an RL agent must\nlearn to be part of a multi-agent system using only anonymous data, which makes\nit impossible to infer the state of each agent, thus to use data directly.\nTypical examples are competitive systems without agent-resolved data such as\nfinancial markets. We introduce consistent data time travel for offline RL as a\nremedy for these problems: instead of using historical data in a sequential\nway, we argue that one needs to perform time travel in historical data, i.e.,\nto adjust the time index so that both the past state and the influence of the\nRL agent's action on the state coincide with real data. This both alleviates\nthe need to resort to imperfect models and consistently accounts for both the\nimmediate and long-term reactions of the system when using anonymous historical\ndata. We apply this idea to market making in limit order books, a notoriously\ndifficult task for RL; it turns out that the gain of the agent is significantly\nhigher with data time travel than with naive sequential data, which suggests\nthat the difficulty of this task for RL may have been overestimated.","PeriodicalId":501478,"journal":{"name":"arXiv - QuantFin - Trading and Market Microstructure","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuantFin - Trading and Market Microstructure","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.02322","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Reinforcement learning works best when the impact of the agent's actions on its environment can be perfectly simulated or fully appraised from available data. Some systems, however, are both hard to simulate and very sensitive to small perturbations. An additional difficulty arises when an RL agent must learn to be part of a multi-agent system using only anonymous data, which makes it impossible to infer the state of each agent and thus to use the data directly. Typical examples are competitive systems without agent-resolved data, such as financial markets. We introduce consistent data time travel for offline RL as a remedy for these problems: instead of using historical data sequentially, we argue that one needs to perform time travel in the historical data, i.e., to adjust the time index so that both the past state and the influence of the RL agent's action on the state coincide with real data. This alleviates the need to resort to imperfect models and consistently accounts for both the immediate and long-term reactions of the system when using anonymous historical data. We apply this idea to market making in limit order books, a notoriously difficult task for RL; it turns out that the agent's gain is significantly higher with data time travel than with naive sequential data, which suggests that the difficulty of this task for RL may have been overestimated.
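
The abstract describes the mechanism only at a high level. As a purely illustrative aid, here is a minimal sketch of one way such a time-index adjustment could look, assuming each historical time step is summarised as a state feature vector and consistency is checked by nearest-neighbour distance; the function names, the matching rule, and the tolerance are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def time_travel_index(history: np.ndarray,
                      implied_state: np.ndarray,
                      t: int,
                      tol: float = 1e-2) -> int:
    """Pick the next time index so that the historical state matches the
    state implied by the agent's action, instead of stepping blindly to
    t + 1 as naive sequential replay would (hypothetical matching rule)."""
    # Distance between every historical state and the action-implied state.
    dist = np.linalg.norm(history - implied_state, axis=1)
    close = np.flatnonzero(dist <= tol)
    if close.size == 0:
        return int(np.argmin(dist))  # no consistent state: take the nearest
    # Among consistent states, prefer the one closest in time to t.
    return int(close[np.argmin(np.abs(close - t))])

# Toy replay loop: after each action the environment "jumps" to a
# consistent index, so subsequent reactions come from real data.
rng = np.random.default_rng(0)
history = rng.normal(size=(1_000, 4))          # fake anonymous state features
t = 0
for _ in range(5):
    action_impact = rng.normal(scale=0.1, size=4)  # stand-in for the impact
    implied_state = history[t] + action_impact     # of the agent's action
    t = time_travel_index(history, implied_state, t)
```

The point of the jump is that, after matching, every market reaction the agent subsequently observes is genuine historical data rather than the output of an imperfect simulator; how the implied state is built from the agent's order-book actions is precisely the part that the paper's consistency requirement pins down.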