Trading Volume Maximization with Online Learning
Tommaso Cesari, Roberto Colomboni
arXiv - QuantFin - Computational Finance, 2024-05-21
DOI: arxiv-2405.13102 (https://doi.org/arxiv-2405.13102)

We explore brokerage between traders in an online learning framework. At any
round $t$, two traders meet to exchange an asset, provided the exchange is
mutually beneficial. The broker proposes a trading price, and each trader tries
to sell their asset or buy the asset from the other party, depending on whether
the price is higher or lower than their private valuations. A trade happens if
one trader is willing to sell and the other is willing to buy at the proposed
price. Previous work provided guidance to a broker aiming at enhancing traders'
total earnings by maximizing the gain from trade, defined as the sum of the
traders' net utilities after each interaction. In contrast, we investigate how
the broker should behave to maximize the trading volume, i.e., the total number
of trades. We model the traders' valuations as an i.i.d. process with an
unknown distribution. If the traders' valuations are revealed after each
interaction (full feedback) and the cumulative distribution function (cdf) of
the valuations is continuous, we provide an algorithm achieving
logarithmic regret and show its optimality up to constant factors. If only
their willingness to sell or buy at the proposed price is revealed after each
interaction ($2$-bit feedback), we provide an algorithm achieving
poly-logarithmic regret when the cdf of the traders' valuations is Lipschitz and show
that this rate is near-optimal. We complement our results by analyzing the
implications of dropping the regularity assumptions on the unknown cdf of the
traders' valuations. If we drop the continuity assumption, the regret rate
degrades to $\Theta(\sqrt{T})$ in the full-feedback case, where $T$ is the time
horizon. If we drop the Lipschitz cdf assumption, learning becomes impossible
in the $2$-bit feedback case.
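
The full-feedback setting described above can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' algorithm. It relies on one elementary observation: if the two valuations in a round are independent draws from a continuous cdf $F$, a trade occurs at price $p$ with probability $2F(p)(1-F(p))$, which is largest when $F(p) = 1/2$, i.e., at a median. The broker therefore posts the median of all valuations revealed so far. The names `trade_happens`, `run_median_broker`, `sample_valuation`, and `first_price` are hypothetical, and the tie convention (a trader whose valuation equals the price accepts either side) is an assumption that the abstract does not specify.

```python
import random
import statistics


def trade_happens(price, v1, v2):
    """Return True if one trader would sell and the other would buy at `price`.

    Assumed tie convention: a trader whose valuation equals the price is
    willing to take either side.
    """
    return min(v1, v2) <= price <= max(v1, v2)


def run_median_broker(sample_valuation, horizon, first_price=0.5, seed=0):
    """Post the median of all valuations seen so far; return the trade count.

    `sample_valuation(rng)` draws one valuation from the distribution that
    is unknown to the broker; `first_price` is an arbitrary opening price.
    """
    rng = random.Random(seed)
    seen = []
    trades = 0
    price = first_price
    for _ in range(horizon):
        v1, v2 = sample_valuation(rng), sample_valuation(rng)
        trades += int(trade_happens(price, v1, v2))
        # Full feedback: both valuations are revealed after the round.
        seen.extend((v1, v2))
        price = statistics.median(seen)
    return trades


if __name__ == "__main__":
    horizon = 2000
    volume = run_median_broker(lambda rng: rng.betavariate(2, 5), horizon)
    # The best fixed price trades with probability 1/2 per round.
    print(f"trades: {volume} out of {horizon} rounds")
```

Under the $2$-bit feedback model the valuations are never revealed, so this sketch does not apply there; the broker would instead have to estimate $F(p)$ from the sell/buy responses at the prices it posts.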