Pretest estimation in combining probability and non-probability samples

Electronic Journal of Statistics. Pub Date: 2023-01-01. DOI: 10.1214/23-ejs2137. IF: 1.0; JCR: Q3, Statistics & Probability; CAS zone 4, Mathematics.
Chenyin Gao, Shu Yang
Citations: 2

Abstract

Multiple heterogeneous data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we develop a unified framework for the test-and-pool approach to general parameter estimation by combining gold-standard probability and non-probability samples. We focus on the case where the study variable needed to estimate the target parameters is observed in both datasets, and each dataset also contains other auxiliary variables. Utilizing the probability design, we conduct a pretest procedure to determine the comparability of the non-probability data with the probability data and decide whether or not to leverage the non-probability data in a pooled analysis. When the probability and non-probability data are comparable, our approach combines both datasets for efficient estimation. Otherwise, we retain only the probability data for estimation. We also characterize the asymptotic distribution of the proposed test-and-pool estimator under a local alternative and provide a data-adaptive procedure to select the critical tuning parameters that target the smallest mean square error of the test-and-pool estimator. Lastly, to deal with the non-regularity of the test-and-pool estimator, we construct a robust confidence interval that has good finite-sample coverage.

Source journal: Electronic Journal of Statistics (Statistics & Probability)
CiteScore: 1.80
Self-citation rate: 9.10%
Publication volume: 100
Review time: 3 months
About the journal: The Electronic Journal of Statistics (EJS) publishes research articles and short notes on theoretical, computational and applied statistics. The journal is open access. Articles are refereed and are held to the same standard as articles in other IMS journals. Articles become publicly available shortly after they are accepted.

Latest articles in this journal
Direct Bayesian linear regression for distribution-valued covariates
Statistical inference via conditional Bayesian posteriors in high-dimensional linear regression
Subnetwork estimation for spatial autoregressive models in large-scale networks
Tests for high-dimensional single-index models
Variable selection for single-index varying-coefficients models with applications to synergistic G × E interactions