Non-Stochastic CDF Estimation Using Threshold Queries

Princewill Okoroafor, Vaishnavi Gupta, Robert D. Kleinberg, Eleanor Goh
Proceedings of the ... Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM Symposium on Discrete Algorithms, volume 24 1, pages 3551-3572
Published: 2023-01-13
DOI: 10.48550/arXiv.2301.05682 (https://doi.org/10.48550/arXiv.2301.05682)
Citations: 1

Abstract

Estimating the empirical distribution of a scalar-valued data set is a basic and fundamental task. In this paper, we tackle the problem of estimating an empirical distribution in a setting with two challenging features. First, the algorithm does not directly observe the data; instead, it only asks a limited number of threshold queries about each sample. Second, the data are not assumed to be independent and identically distributed; instead, we allow for an arbitrary process generating the samples, including an adaptive adversary. These considerations are relevant, for example, when modeling a seller experimenting with posted prices to estimate the distribution of consumers' willingness to pay for a product: offering a price and observing a consumer's purchase decision is equivalent to asking a single threshold query about their value, and the distribution of consumers' values may be non-stationary over time, as early adopters may differ markedly from late adopters. Our main result quantifies, to within a constant factor, the sample complexity of estimating the empirical CDF of a sequence of elements of $[n]$, up to $\varepsilon$ additive error, using one threshold query per sample. The complexity depends only logarithmically on $n$, and our result can be interpreted as extending the existing logarithmic-complexity results for noisy binary search to the more challenging setting where noise is non-stochastic. Along the way to designing our algorithm, we consider a more general model in which the algorithm is allowed to make a limited number of simultaneous threshold queries on each sample. We solve this problem using Blackwell's Approachability Theorem and the exponential weights method. As a side result of independent interest, we characterize the minimum number of simultaneous threshold queries required by deterministic CDF estimation algorithms.
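The query model described in the abstract can be made concrete with a small simulation. The sketch below is a naive baseline, not the algorithm of the paper: for each sample it asks a single threshold query "is x <= q?" at a uniformly random threshold q in [n], then reweights the answers by n to estimate the empirical CDF. The function names (`empirical_cdf`, `naive_threshold_estimate`, `max_error`) and the early-adopter/late-adopter data are illustrative assumptions. Because each threshold is queried only about a 1/n fraction of the time, this baseline needs a number of samples that grows roughly linearly in n for a fixed error, which is the dependence the paper's result improves to logarithmic.

```python
import random


def empirical_cdf(samples, n):
    """Exact empirical CDF: F(i) = fraction of samples <= i, for i = 1..n."""
    counts = [0] * (n + 1)
    for x in samples:
        counts[x] += 1
    cdf, running = [], 0
    for i in range(1, n + 1):
        running += counts[i]
        cdf.append(running / len(samples))
    return cdf


def naive_threshold_estimate(samples, n, rng):
    """Naive baseline (an assumption for illustration, not the paper's method).

    Each sample x is seen only through one bit: whether x <= q for a
    threshold q drawn uniformly from {1, ..., n}, independently of the data.
    """
    total = len(samples)
    hits = [0] * (n + 1)
    for x in samples:
        q = rng.randint(1, n)  # one threshold query per sample
        if x <= q:
            hits[q] += 1
    # Each threshold is chosen with probability 1/n, so multiplying by n
    # makes the estimate of F(i) unbiased.
    return [n * hits[i] / total for i in range(1, n + 1)]


def max_error(a, b):
    """Largest absolute difference between two CDF vectors."""
    return max(abs(u - v) for u, v in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    n, total = 8, 200_000
    # Non-stationary data: "early adopters" have high values, later ones low.
    early = [rng.randint(5, 8) for _ in range(total // 2)]
    late = [rng.randint(1, 4) for _ in range(total // 2)]
    samples = early + late
    estimate = naive_threshold_estimate(samples, n, rng)
    print("max additive error:", max_error(estimate, empirical_cdf(samples, n)))
```

The estimate is unbiased even when the data are non-stationary, as long as each threshold is drawn independently of the sample it queries. It is not guaranteed to be monotone or to stay within [0, 1], so a practical version would post-process it.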