Robust Online Selection with Uncertain Offer Acceptance
Sebastian Perez-Salazar, Mohit Singh, Alejandro Toriello
Mathematics of Operations Research (INFORMS), published online August 29, 2024
DOI: 10.1287/moor.2023.0210
Citations: 0
Abstract
Online advertising has motivated interest in online selection problems. Displaying ads to the right users benefits both the platform (e.g., via pay-per-click) and the advertisers (by increasing their reach). In practice, not all users click on displayed ads, while the platform’s algorithm may miss the users most disposed to do so. This mismatch decreases the platform’s revenue and the advertiser’s chances to reach the right customers. With this motivation, we propose a secretary problem where a candidate may or may not accept an offer according to a known probability p. Because we do not know the top candidate willing to accept an offer, the goal is to maximize a robust objective defined as the minimum over integers k of the probability of choosing one of the top k candidates, given that one of these candidates will accept an offer. Using Markov decision process theory, we derive a linear program for this max-min objective whose solution encodes an optimal policy. The derivation may be of independent interest, as it is generalizable and can be used to obtain linear programs for many online selection models. We further relax this linear program into an infinite counterpart, which we use to provide bounds for the objective and closed-form policies. For [Formula: see text], an optimal policy is a simple threshold rule that observes the first [Formula: see text] fraction of candidates and subsequently makes offers to the best candidate observed so far.

Funding: Financial support from the U.S. National Science Foundation [Grants CCF-2106444, CCF-1910423, and CMMI 1552479] is gratefully acknowledged.
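In symbols (the notation below is ours for illustration and need not match the paper's), the robust objective described in the abstract can be written as a max-min over policies and over the integer k:

```latex
% Illustrative notation only; the paper may use different symbols.
% ALG ranges over online selection policies; T_k is the set of the k highest-ranked candidates.
\[
  \sup_{\mathrm{ALG}} \;\; \min_{k \in \mathbb{Z}_{\ge 1}} \;\;
  \Pr\!\left[ \mathrm{ALG} \text{ selects a candidate in } T_k
      \;\middle|\; \text{some candidate in } T_k \text{ accepts an offer} \right]
\]
```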
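To make the policy family concrete, the following Python sketch estimates this conditional success probability for a threshold rule by Monte Carlo simulation. It is not the paper's analysis or its linear program: the function names, the Monte Carlo setup, and the treatment of the observation fraction t as a free parameter are our own illustrative assumptions (the paper's specific threshold appears as [Formula: see text] above and is not reproduced here).

```python
import random

def success_probability(n, p, t, k, trials=20_000):
    """Monte Carlo estimate of P(hire a top-k candidate | some top-k candidate
    would accept), under a threshold rule: observe the first t*n candidates,
    then make an offer to every candidate who is the best seen so far."""
    cutoff = int(t * n)
    hits = 0
    conditioned = 0
    for _ in range(trials):
        ranks = list(range(1, n + 1))            # rank 1 = best candidate
        random.shuffle(ranks)                    # uniformly random arrival order
        accepts = [random.random() < p for _ in range(n)]  # i.i.d. acceptance coins
        # Condition on at least one top-k candidate being willing to accept.
        if not any(accepts[i] for i in range(n) if ranks[i] <= k):
            continue
        conditioned += 1
        best_seen = min(ranks[:cutoff], default=n + 1)
        for i in range(cutoff, n):
            if ranks[i] < best_seen:             # current candidate is best so far
                if accepts[i]:                   # offer made and accepted: stop
                    if ranks[i] <= k:
                        hits += 1
                    break
                best_seen = ranks[i]             # offer declined: keep searching
    return hits / conditioned if conditioned else float("nan")

def robust_objective(n, p, t, k_max, trials=20_000):
    """Empirical min over k = 1..k_max of the conditional success probability."""
    return min(success_probability(n, p, t, k, trials) for k in range(1, k_max + 1))
```

As a sanity check, with p = 1 and k = 1 this reduces to the classical secretary problem, where observing roughly the first 1/e ≈ 0.37 fraction of candidates yields a success probability of about 1/e. A call such as robust_objective(n=100, p=0.8, t=0.35, k_max=5) gives a quick empirical estimate for one (arbitrary) choice of parameters.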
Journal Description
Mathematics of Operations Research is an international journal of the Institute for Operations Research and the Management Sciences (INFORMS). The journal invites articles concerned with the mathematical and computational foundations in the areas of continuous, discrete, and stochastic optimization; mathematical programming; dynamic programming; stochastic processes; stochastic models; simulation methodology; control and adaptation; networks; game theory; and decision theory. Also sought are contributions to learning theory and machine learning that have special relevance to decision making, operations research, and management science. The emphasis is on originality, quality, and importance; correctness alone is not sufficient. Significant developments in operations research and management science not having substantial mathematical interest should be directed to other journals such as Management Science or Operations Research.