Leveraging Predictive Modelling from Multiple Sources of Big Data to Improve Sample Efficiency and Reduce Survey Nonresponse Error

IF 1.6 4区 数学 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Journal of Survey Statistics and Methodology Pub Date : 2023-06-10 DOI:10.1093/jssam/smad016
David Dutwin, Patrick Coyle, I. Bilgen, N. English
{"title":"Leveraging Predictive Modelling from Multiple Sources of Big Data to Improve Sample Efficiency and Reduce Survey Nonresponse Error","authors":"David Dutwin, Patrick Coyle, I. Bilgen, N. English","doi":"10.1093/jssam/smad016","DOIUrl":null,"url":null,"abstract":"\n Big data has been fruitfully leveraged as a supplement for survey data—and sometimes as its replacement—and in the best of worlds, as a “force multiplier” to improve survey analytics and insight. We detail a use case, the big data classifier (BDC), as a replacement to the more traditional methods of targeting households in survey sampling for given specific household and personal attributes. Much like geographic targeting and the use of commercial vendor flags, we detail the ability of BDCs to predict the likelihood that any given household is, for example, one that contains a child or someone who is Hispanic. We specifically build 15 BDCs with the combined data from a large nationally representative probability-based panel and a range of big data from public and private sources, and then assess the effectiveness of these BDCs to successfully predict their range of predicted attributes across three large survey datasets. For each BDC and each data application, we compare the relative effectiveness of the BDCs against historical sample targeting techniques of geographic clustering and vendor flags. Overall, BDCs offer a modest improvement in their ability to target subpopulations. We find classes of predictions that are consistently more effective, and others where the BDCs are on par with vendor flagging, though always superior to geographic clustering. We present some of the relative strengths and weaknesses of BDCs as a new method to identify and subsequently sample low incidence and other populations.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":1.6000,"publicationDate":"2023-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Survey Statistics and Methodology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/jssam/smad016","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SOCIAL SCIENCES, MATHEMATICAL METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Big data has been fruitfully leveraged as a supplement for survey data—and sometimes as its replacement—and in the best of worlds, as a “force multiplier” to improve survey analytics and insight. We detail a use case, the big data classifier (BDC), as a replacement to the more traditional methods of targeting households in survey sampling for given specific household and personal attributes. Much like geographic targeting and the use of commercial vendor flags, we detail the ability of BDCs to predict the likelihood that any given household is, for example, one that contains a child or someone who is Hispanic. We specifically build 15 BDCs with the combined data from a large nationally representative probability-based panel and a range of big data from public and private sources, and then assess the effectiveness of these BDCs to successfully predict their range of predicted attributes across three large survey datasets. For each BDC and each data application, we compare the relative effectiveness of the BDCs against historical sample targeting techniques of geographic clustering and vendor flags. Overall, BDCs offer a modest improvement in their ability to target subpopulations. We find classes of predictions that are consistently more effective, and others where the BDCs are on par with vendor flagging, though always superior to geographic clustering. We present some of the relative strengths and weaknesses of BDCs as a new method to identify and subsequently sample low incidence and other populations.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用多来源大数据的预测建模,提高样本效率,减少调查无响应误差
大数据作为调查数据的补充,有时作为替代,在最好的情况下,作为“力量倍增器”来提高调查分析和洞察力。我们详细介绍了一个用例,即大数据分类器(BDC),以替代更传统的针对特定家庭和个人属性进行调查抽样的目标家庭方法。与地理定位和商业供应商标志的使用非常相似,我们详细描述了bdc预测任何给定家庭(例如,家庭中有孩子或西班牙裔人)的可能性的能力。我们专门构建了15个bdc,结合了来自全国代表性的基于概率的大型面板的数据和来自公共和私人来源的一系列大数据,然后评估了这些bdc在三个大型调查数据集中成功预测其预测属性范围的有效性。对于每个BDC和每个数据应用程序,我们将BDC与地理聚类和供应商标志的历史样本目标技术的相对有效性进行了比较。总体而言,bdc在针对亚群体的能力方面略有提高。我们发现预测的类别始终更有效,而其他bdc与供应商标记相当,尽管总是优于地理聚类。我们提出了一些bdc的相对优势和劣势,作为识别和随后采样低发病率和其他人群的新方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
4.30
自引率
9.50%
发文量
40
期刊介绍: The Journal of Survey Statistics and Methodology, sponsored by AAPOR and the American Statistical Association, began publishing in 2013. Its objective is to publish cutting edge scholarly articles on statistical and methodological issues for sample surveys, censuses, administrative record systems, and other related data. It aims to be the flagship journal for research on survey statistics and methodology. Topics of interest include survey sample design, statistical inference, nonresponse, measurement error, the effects of modes of data collection, paradata and responsive survey design, combining data from multiple sources, record linkage, disclosure limitation, and other issues in survey statistics and methodology. The journal publishes both theoretical and applied papers, provided the theory is motivated by an important applied problem and the applied papers report on research that contributes generalizable knowledge to the field. Review papers are also welcomed. Papers on a broad range of surveys are encouraged, including (but not limited to) surveys concerning business, economics, marketing research, social science, environment, epidemiology, biostatistics and official statistics. The journal has three sections. The Survey Statistics section presents papers on innovative sampling procedures, imputation, weighting, measures of uncertainty, small area inference, new methods of analysis, and other statistical issues related to surveys. The Survey Methodology section presents papers that focus on methodological research, including methodological experiments, methods of data collection and use of paradata. The Applications section contains papers involving innovative applications of methods and providing practical contributions and guidance, and/or significant new findings.
期刊最新文献
Recent Innovations and Advances in Mixed-Mode Surveys Effects of a Web–Mail Mode on Response Rates and Responses to a Care Experience Survey: Results of a Randomized Experiment The efficacy of propensity score matching for separating selection and measurement effects across different survey modes Bayesian Multisource Hierarchical Models with Applications to the Monthly Retail Trade Survey Sequential and Concurrent Mixed-Mode Designs: A Tailored Approach
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1