Princewill Okoroafor, Vaishnavi Gupta, Robert D. Kleinberg, Eleanor Goh
{"title":"Non-Stochastic CDF Estimation Using Threshold Queries","authors":"Princewill Okoroafor, Vaishnavi Gupta, Robert D. Kleinberg, Eleanor Goh","doi":"10.48550/arXiv.2301.05682","DOIUrl":null,"url":null,"abstract":"Estimating the empirical distribution of a scalar-valued data set is a basic and fundamental task. In this paper, we tackle the problem of estimating an empirical distribution in a setting with two challenging features. First, the algorithm does not directly observe the data; instead, it only asks a limited number of threshold queries about each sample. Second, the data are not assumed to be independent and identically distributed; instead, we allow for an arbitrary process generating the samples, including an adaptive adversary. These considerations are relevant, for example, when modeling a seller experimenting with posted prices to estimate the distribution of consumers' willingness to pay for a product: offering a price and observing a consumer's purchase decision is equivalent to asking a single threshold query about their value, and the distribution of consumers' values may be non-stationary over time, as early adopters may differ markedly from late adopters. Our main result quantifies, to within a constant factor, the sample complexity of estimating the empirical CDF of a sequence of elements of $[n]$, up to $\\varepsilon$ additive error, using one threshold query per sample. The complexity depends only logarithmically on $n$, and our result can be interpreted as extending the existing logarithmic-complexity results for noisy binary search to the more challenging setting where noise is non-stochastic. Along the way to designing our algorithm, we consider a more general model in which the algorithm is allowed to make a limited number of simultaneous threshold queries on each sample. We solve this problem using Blackwell's Approachability Theorem and the exponential weights method. As a side result of independent interest, we characterize the minimum number of simultaneous threshold queries required by deterministic CDF estimation algorithms.","PeriodicalId":92709,"journal":{"name":"Proceedings of the ... Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM Symposium on Discrete Algorithms","volume":"24 1","pages":"3551-3572"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM Symposium on Discrete Algorithms","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2301.05682","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Estimating the empirical distribution of a scalar-valued data set is a basic and fundamental task. In this paper, we tackle the problem of estimating an empirical distribution in a setting with two challenging features. First, the algorithm does not directly observe the data; instead, it only asks a limited number of threshold queries about each sample. Second, the data are not assumed to be independent and identically distributed; instead, we allow for an arbitrary process generating the samples, including an adaptive adversary. These considerations are relevant, for example, when modeling a seller experimenting with posted prices to estimate the distribution of consumers' willingness to pay for a product: offering a price and observing a consumer's purchase decision is equivalent to asking a single threshold query about their value, and the distribution of consumers' values may be non-stationary over time, as early adopters may differ markedly from late adopters. Our main result quantifies, to within a constant factor, the sample complexity of estimating the empirical CDF of a sequence of elements of $[n]$, up to $\varepsilon$ additive error, using one threshold query per sample. The complexity depends only logarithmically on $n$, and our result can be interpreted as extending the existing logarithmic-complexity results for noisy binary search to the more challenging setting where noise is non-stochastic. Along the way to designing our algorithm, we consider a more general model in which the algorithm is allowed to make a limited number of simultaneous threshold queries on each sample. We solve this problem using Blackwell's Approachability Theorem and the exponential weights method. As a side result of independent interest, we characterize the minimum number of simultaneous threshold queries required by deterministic CDF estimation algorithms.