Trading Volume Maximization with Online Learning

arXiv - QuantFin - Computational Finance. Pub Date: 2024-05-21. DOI: arxiv-2405.13102

Tommaso Cesari, Roberto Colomboni


Abstract

We explore brokerage between traders in an online learning framework. At any round $t$, two traders meet to exchange an asset, provided the exchange is mutually beneficial. The broker proposes a trading price, and each trader tries to sell their asset or buy the asset from the other party, depending on whether the price is higher or lower than their private valuations. A trade happens if one trader is willing to sell and the other is willing to buy at the proposed price. Previous work provided guidance to a broker aiming at enhancing traders' total earnings by maximizing the gain from trade, defined as the sum of the traders' net utilities after each interaction. In contrast, we investigate how the broker should behave to maximize the trading volume, i.e., the total number of trades. We model the traders' valuations as an i.i.d. process with an unknown distribution. If the traders' valuations are revealed after each interaction (full-feedback), and the traders' valuations cumulative distribution function (cdf) is continuous, we provide an algorithm achieving logarithmic regret and show its optimality up to constant factors. If only their willingness to sell or buy at the proposed price is revealed after each interaction ($2$-bit feedback), we provide an algorithm achieving poly-logarithmic regret when the traders' valuations cdf is Lipschitz and show that this rate is near-optimal. We complement our results by analyzing the implications of dropping the regularity assumptions on the unknown traders' valuations cdf. If we drop the continuous cdf assumption, the regret rate degrades to $\Theta(\sqrt{T})$ in the full-feedback case, where $T$ is the time horizon. If we drop the Lipschitz cdf assumption, learning becomes impossible in the $2$-bit feedback case.
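The trade rule described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm. It assumes that a trade occurs exactly when the proposed price lies between the two valuations (ties included), that the two valuations in a round are independent draws from the same distribution, and that valuations lie in [0, 1]. Under those assumptions and a continuous cdf $F$, the probability of a trade at price $p$ is $2F(p)(1-F(p))$, which is largest where $F(p) = 1/2$, so a natural full-feedback heuristic is to post the empirical median of the valuations seen so far. All function names here are hypothetical.

```python
import random


def trade_happens(price, v, w):
    """Assumed trade rule: one trader sells and the other buys
    exactly when the price lies between the two valuations."""
    return min(v, w) <= price <= max(v, w)


def trade_probability(price, cdf):
    """Trade probability 2 F(p) (1 - F(p)), valid for a continuous cdf F
    and two independent valuations (an assumption of this sketch)."""
    f = cdf(price)
    return 2.0 * f * (1.0 - f)


def empirical_median_price(history):
    """Illustrative heuristic (not the paper's algorithm): post the
    empirical median of all valuations observed so far."""
    if not history:
        return 0.5  # arbitrary first price, assuming valuations in [0, 1]
    ordered = sorted(history)
    return ordered[len(ordered) // 2]


def simulate_volume(choose_price, sample, rounds, seed=0):
    """Count trades over `rounds` rounds under full feedback, i.e. both
    valuations are revealed to the broker after each round."""
    rng = random.Random(seed)
    history = []
    trades = 0
    for _ in range(rounds):
        price = choose_price(history)
        v, w = sample(rng), sample(rng)
        if trade_happens(price, v, w):
            trades += 1
        history.extend((v, w))  # full feedback: store both valuations
    return trades


def uniform_sample(rng):
    """Valuations drawn uniformly from [0, 1] (illustrative choice)."""
    return rng.random()


if __name__ == "__main__":
    rounds = 2000
    learned = simulate_volume(empirical_median_price, uniform_sample, rounds)
    fixed = simulate_volume(lambda history: 0.05, uniform_sample, rounds)
    print("median heuristic:", learned, "fixed price 0.05:", fixed)
```

With uniform valuations the best fixed price is the median 0.5, where a trade happens with probability 1/2, so the heuristic should complete a trade in roughly half of the rounds, while a poorly chosen fixed price such as 0.05 trades far less often.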
