Competing Bandits in Time Varying Matching Markets

Conference on Learning for Dynamics & Control, Pub Date: 2022-10-21, DOI: 10.48550/arXiv.2210.11692

Deepan Muthirayan, C. Maheshwari, P. Khargonekar, S. Sastry

Citations: 0

Abstract

We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match. In particular, we consider the setting where one side of the market, the arms, has a fixed, known set of preferences over the other side, the players. While this problem has been studied when the players have fixed but unknown preferences, in this work we study how to learn when the preferences of the players are time-varying and unknown. Our contribution is a methodology that can handle any type of preference structure and variation scenario. We show that, with the proposed algorithm, each player receives a uniform sub-linear regret of $\widetilde{\mathcal{O}}(L_T^{1/2} T^{1/2})$, where $L_T$ is the number of changes in the underlying preferences of the agents. Thus the optimal rates for single-agent learning can be achieved despite the competition, up to a constant factor. We also discuss extensions of this algorithm to the case where the number of changes need not be known a priori.
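To make the notion of a stable match concrete, the sketch below runs player-proposing deferred acceptance (Gale-Shapley) on fully known preference lists and then checks the result for blocking pairs. It is an illustration only and not the algorithm proposed in the paper: the function names `stable_match` and `is_stable` and the toy preference lists are hypothetical, and the players' preferences are assumed known and fixed here, whereas the abstract treats them as unknown and time-varying.

```python
def stable_match(player_prefs, arm_prefs):
    """Player-proposing deferred acceptance on known preference lists."""
    # rank[a][p] = position of player p in arm a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in arm_prefs.items()}
    next_choice = {p: 0 for p in player_prefs}  # next arm each player tries
    holder = {}                                 # arm -> player it holds
    free = list(player_prefs)
    while free:
        p = free.pop()
        if next_choice[p] >= len(player_prefs[p]):
            continue  # p has exhausted its list and stays unmatched
        a = player_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = holder.get(a)
        if current is None:
            holder[a] = p
        elif rank[a][p] < rank[a][current]:
            holder[a] = p          # arm a trades up to p
            free.append(current)   # the displaced player proposes again
        else:
            free.append(p)         # arm a rejects p
    return {p: a for a, p in holder.items()}


def is_stable(match, player_prefs, arm_prefs):
    """Return True if no player-arm pair would both rather be together."""
    holder = {a: p for p, a in match.items()}
    for p, prefs in player_prefs.items():
        for a in prefs:
            if match.get(p) == a:
                break  # every later arm is less preferred by p
            current = holder.get(a)
            if current is None or arm_prefs[a].index(p) < arm_prefs[a].index(current):
                return False  # (p, a) is a blocking pair
    return True


# Toy market (hypothetical data): both players prefer arm a1.
player_prefs = {"p1": ["a1", "a2"], "p2": ["a1", "a2"]}
arm_prefs = {"a1": ["p2", "p1"], "a2": ["p1", "p2"]}
match = stable_match(player_prefs, arm_prefs)
```

In this toy market arm a1 prefers p2, so p1 is pushed to a2; giving a1 to p1 instead would leave (p2, a1) as a blocking pair, which `is_stable` detects.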


