Player-optimal Stable Regret for Bandit Learning in Matching Markets

Fang-yuan Kong, Shuai Li
{"title":"Player-optimal Stable Regret for Bandit Learning in Matching Markets","authors":"Fang-yuan Kong, Shuai Li","doi":"10.1137/1.9781611977554.ch55","DOIUrl":null,"url":null,"abstract":"The problem of matching markets has been studied for a long time in the literature due to its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent works study the online setting where one-side participants (players) learn their unknown preferences from iterative interactions with the other side (arms). Most previous works in this line are only able to derive theoretical guarantees for player-pessimal stable regret, which is defined compared with the players' least-preferred stable matching. However, under the pessimal stable matching, players only obtain the least reward among all stable matchings. To maximize players' profits, player-optimal stable matching would be the most desirable. Though \\citet{basu21beyond} successfully bring an upper bound for player-optimal stable regret, their result can be exponentially large if players' preference gap is small. Whether a polynomial guarantee for this regret exists is a significant but still open problem. In this work, we provide a new algorithm named explore-then-Gale-Shapley (ETGS) and show that the optimal stable regret of each player can be upper bounded by $O(K\\log T/\\Delta^2)$ where $K$ is the number of arms, $T$ is the horizon and $\\Delta$ is the players' minimum preference gap among the first $N+1$-ranked arms. This result significantly improves previous works which either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound.","PeriodicalId":92709,"journal":{"name":"Proceedings of the ... Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM Symposium on Discrete Algorithms","volume":"16 1","pages":"1512-1522"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM Symposium on Discrete Algorithms","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1137/1.9781611977554.ch55","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The problem of matching markets has been studied for a long time in the literature due to its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent works studies the online setting where participants on one side (players) learn their unknown preferences from iterative interactions with the other side (arms). Most previous works in this line only derive theoretical guarantees for player-pessimal stable regret, which is defined with respect to the players' least-preferred stable matching. However, under the pessimal stable matching, players obtain the least reward among all stable matchings. To maximize players' profits, the player-optimal stable matching is the most desirable. Though \citet{basu21beyond} establish an upper bound for player-optimal stable regret, their result can be exponentially large if the players' preference gap is small. Whether a polynomial guarantee for this regret exists remains a significant open problem. In this work, we provide a new algorithm named explore-then-Gale-Shapley (ETGS) and show that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon, and $\Delta$ is the players' minimum preference gap among the first $N+1$ ranked arms. This result significantly improves upon previous works, which either target the weaker player-pessimal stable matching objective or apply only to markets under special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound.
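
The abstract names the explore-then-Gale-Shapley (ETGS) idea without giving its details, so the sketch below is only a hedged illustration of what such a strategy might look like: a collision-free exploration phase in which each player estimates its own preference ranking over the arms, followed by a player-proposing Gale-Shapley (deferred-acceptance) phase on the estimated rankings, which yields the player-optimal stable matching with respect to those estimates. The round-robin sampling schedule, the fixed exploration length, the Gaussian reward model, and all function and variable names are illustrative assumptions, not the authors' actual construction.

```python
import numpy as np

def etgs_style_sketch(N, K, T_explore, true_means, arm_prefs, rng=None):
    """Illustrative explore-then-Gale-Shapley strategy for the player side of a
    matching-market bandit. NOT the paper's exact ETGS algorithm: the sampling
    schedule, exploration length, and reward model are placeholder assumptions.

    N          : number of players (N <= K assumed)
    K          : number of arms
    T_explore  : number of exploration rounds
    true_means : (N, K) array of each player's true mean reward per arm
    arm_prefs  : (K, N) array; arm_prefs[a] lists players from most to least preferred
    """
    rng = rng or np.random.default_rng(0)
    counts = np.zeros((N, K))
    est = np.zeros((N, K))

    # Phase 1: collision-free round-robin exploration.
    # In round t, player p pulls arm (t + p) % K, so no two players collide
    # and every (player, arm) pair is sampled roughly T_explore / K times.
    for t in range(T_explore):
        for p in range(N):
            a = (t + p) % K
            reward = rng.normal(true_means[p, a], 1.0)  # assumed Gaussian noise
            counts[p, a] += 1
            est[p, a] += (reward - est[p, a]) / counts[p, a]  # running mean

    # Phase 2: player-proposing Gale-Shapley on the *estimated* preferences.
    # If the estimated rankings agree with the true ones, this returns the
    # player-optimal stable matching, to which players can then commit.
    rankings = [list(np.argsort(-est[p])) for p in range(N)]  # best arm first
    holder = {a: None for a in range(K)}   # arm -> player currently held
    next_choice = [0] * N                  # next arm index each player proposes to
    free_players = list(range(N))
    while free_players:
        p = free_players.pop(0)
        a = rankings[p][next_choice[p]]
        next_choice[p] += 1
        if holder[a] is None:
            holder[a] = p
        else:
            # The arm keeps whichever player it ranks higher.
            rank = {q: i for i, q in enumerate(arm_prefs[a])}
            if rank[p] < rank[holder[a]]:
                free_players.append(holder[a])
                holder[a] = p
            else:
                free_players.append(p)
    return {p: a for a, p in holder.items() if p is not None}
```

Intuitively, roughly $O(\log T/\Delta^2)$ samples per arm suffice to order, with high probability, any two arms whose means differ by at least $\Delta$, which is consistent with the stated $O(K\log T/\Delta^2)$ regret bound; the paper itself sets the exploration length via confidence bounds rather than the fixed placeholder used above.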