Trading Volume Maximization with Online Learning

Tommaso Cesari, Roberto Colomboni
arXiv - QuantFin - Computational Finance, published 2024-05-21
https://doi.org/arxiv-2405.13102

Abstract

We study brokerage between traders in an online learning framework. At each round $t$, two traders meet to exchange an asset, provided the exchange is mutually beneficial. The broker proposes a trading price, and each trader tries to sell their asset or buy the asset from the other party, depending on whether the price is higher or lower than their private valuation. A trade happens if one trader is willing to sell and the other is willing to buy at the proposed price. Previous work provided guidance to a broker aiming to increase traders' total earnings by maximizing the gain from trade, defined as the sum of the traders' net utilities after each interaction. In contrast, we investigate how the broker should behave to maximize the trading volume, i.e., the total number of trades. We model the traders' valuations as an i.i.d. process with an unknown distribution. If the traders' valuations are revealed after each interaction (full feedback) and the cumulative distribution function (cdf) of the traders' valuations is continuous, we provide an algorithm achieving logarithmic regret and show that it is optimal up to constant factors. If only the traders' willingness to sell or buy at the proposed price is revealed after each interaction ($2$-bit feedback), we provide an algorithm achieving poly-logarithmic regret when the cdf of the traders' valuations is Lipschitz, and we show that this rate is near-optimal. We complement these results by analyzing what happens when the regularity assumptions on the unknown cdf are dropped. Without the continuity assumption, the regret rate degrades to $\Theta(\sqrt{T})$ in the full-feedback case, where $T$ is the time horizon. Without the Lipschitz assumption, learning becomes impossible in the $2$-bit feedback case.
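
The trading rule and the two feedback models described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the algorithm from the paper: the function names (`trade_happens`, `two_bit_feedback`, `simulate_full_feedback`) are hypothetical, ties between a valuation and the price are assumed to still allow a trade, and posting the running empirical median of the observed valuations is a simple heuristic chosen here for illustration. It is motivated by the fact that, for i.i.d. valuations with a continuous cdf $F$, a trade at price $p$ occurs with probability $2F(p)(1-F(p))$, which is largest where $F(p) = 1/2$.

```python
import random
import statistics


def trade_happens(price, v1, v2):
    """A trade occurs when the price lies between the two valuations:
    the lower-valuation trader sells and the higher-valuation trader buys.
    Assumption: a valuation exactly equal to the price still allows a trade."""
    return min(v1, v2) <= price <= max(v1, v2)


def two_bit_feedback(price, v1, v2):
    """2-bit feedback: for each trader, reveal only whether their valuation
    is at most the posted price (i.e., whether they would sell)."""
    return (v1 <= price, v2 <= price)


def simulate_full_feedback(T, sample_valuation, seed=0):
    """Count trades over T rounds when both valuations are revealed after
    each round. Illustrative heuristic (an assumption, not the paper's
    method): post the empirical median of all valuations seen so far."""
    rng = random.Random(seed)
    seen = []
    trades = 0
    price = 0.5  # arbitrary first price, assuming valuations in [0, 1]
    for _ in range(T):
        v1, v2 = sample_valuation(rng), sample_valuation(rng)
        trades += trade_happens(price, v1, v2)
        seen.extend((v1, v2))
        # Recomputing the median each round is fine for small T.
        price = statistics.median(seen)
    return trades


if __name__ == "__main__":
    # Uniform valuations on [0, 1]: the median price 0.5 trades about half the time.
    print(simulate_full_feedback(1000, lambda rng: rng.random()))
```

In the $2$-bit setting the broker would see only the output of `two_bit_feedback`, so the empirical median above could not be computed directly. That is why the abstract treats the two feedback models separately.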