Prior-free Data Acquisition for Accurate Statistical Estimation

Yiling Chen, Shuran Zheng
DOI: 10.1145/3328526.3329564
Venue: Proceedings of the 2019 ACM Conference on Economics and Computation
Published: 2018-11-30
Cited by: 20

Abstract

We study a data analyst's problem of acquiring data from self-interested individuals to obtain an accurate estimate of some statistic of a population, subject to an expected budget constraint. Each data holder incurs a cost, unknown to the data analyst, to acquire and report his data. The cost can be arbitrarily correlated with the data. The data analyst has an expected budget that she can use to incentivize individuals to provide their data. The goal is to design a joint acquisition-estimation mechanism that optimizes the performance of the resulting estimator without any prior information on the underlying distribution of cost and data. We investigate two types of estimation: unbiased point estimation and confidence interval estimation.

Unbiased estimators: We design a truthful, individually rational, online mechanism that acquires data from individuals and outputs an unbiased estimator of the population mean when the data analyst has no prior information on the cost-data distribution and individuals arrive in a random order. The performance of this mechanism matches that of the optimal mechanism, which knows the true cost distribution, within a constant factor. The performance of an estimator is evaluated by its variance under the worst-case cost-data correlation.

Confidence intervals: We characterize an approximately optimal (within a factor of 2) mechanism for obtaining a confidence interval for the population mean when the data analyst knows the true cost distribution at the beginning. This mechanism is efficiently computable. We then design a truthful, individually rational, online algorithm that is worse than the approximately optimal mechanism by at most a constant factor. The performance of an estimator is evaluated by its expected length under the worst-case cost-data correlation.
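The abstract does not describe the mechanisms themselves. As a rough illustration of two ingredients it names (truthful acquisition and an unbiased estimate of the population mean), the following Python sketch uses a simple random posted price, which is an assumption for illustration and not the authors' mechanism. Each holder reports a cost c; a price is drawn from Uniform(0, max_price) independently of the report, and the data point is bought (and the price paid) if the price is at least c. Because the purchase probability A(c) = 1 - c/max_price is then known, weighting each purchased value by 1/A(c) (inverse-probability, or Horvitz-Thompson, weighting) gives an unbiased estimate. The helper names (`purchase_prob`, `acquire_and_estimate`, `ipw_variance`, `worst_case_variance`), the uniform price distribution, the fixed finite population and the requirement that every cost lie below `max_price` are all hypothetical. The sketch neither enforces a budget nor works online or prior-free.

```python
import random
import statistics


def purchase_prob(cost, max_price):
    """Probability that a price drawn from Uniform(0, max_price) is at least `cost`."""
    if not 0.0 <= cost < max_price:
        raise ValueError("cost must lie in [0, max_price) so every holder can be bought")
    return 1.0 - cost / max_price


def acquire_and_estimate(data, costs, max_price, rng):
    """One run of a hypothetical random-posted-price mechanism (not the paper's).

    Assumes holders report their true costs, which is a dominant strategy here.
    Returns (unbiased estimate of the population mean, total amount paid).
    """
    total = 0.0
    spent = 0.0
    for z, c in zip(data, costs):
        prob = purchase_prob(c, max_price)  # known to the analyst once c is reported
        # The price ignores the report, so a holder cannot gain by misreporting:
        # he sells exactly when the price covers his true cost.
        price = rng.uniform(0.0, max_price)
        if price >= c:
            # Inverse-probability weighting: E[z * 1{bought} / prob] = z.
            total += z / prob
            spent += price
    return total / len(data), spent


def ipw_variance(data, costs, max_price):
    """Exact variance of the estimate: sum of z^2 * (1/A(c) - 1), divided by n^2."""
    n = len(data)
    return sum(z * z * (1.0 / purchase_prob(c, max_price) - 1.0)
               for z, c in zip(data, costs)) / n ** 2


def worst_case_variance(data, costs, max_price):
    """Largest variance over all pairings of the same data values with the same costs.

    By the rearrangement inequality, pairing the largest z^2 with the
    smallest purchase probability maximizes the sum.
    """
    sq = sorted(z * z for z in data)
    inv = sorted(1.0 / purchase_prob(c, max_price) - 1.0 for c in costs)
    return sum(a * b for a, b in zip(sq, inv)) / len(data) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0.0, 1.0, 1.0, 0.0, 1.0]
    costs = [0.1, 0.8, 0.7, 0.2, 0.5]  # the holders with z = 1 are the expensive ones
    runs = [acquire_and_estimate(data, costs, 1.0, rng)[0] for _ in range(20000)]
    print("true mean:", statistics.fmean(data))
    print("average estimate over runs:", statistics.fmean(runs))
    print("exact variance:", ipw_variance(data, costs, 1.0))
    print("worst-case variance:", worst_case_variance(data, costs, 1.0))
```

In the example, the holders with z = 1 also have the highest costs, so they are bought least often and contribute the most variance; the term z^2(1/A(c) - 1) grows without bound as A(c) approaches 0. This is one way to see why the abstract measures performance under the worst-case cost-data correlation, and why a mechanism has to choose purchase probabilities carefully under a budget, which this sketch does not attempt.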