
Journal of Survey Statistics and Methodology: Latest Articles

Effects of Address Coverage Enhancement on Estimates from Address-Based Sampling Studies.
IF 2.1 | Zone 4, Mathematics | Q1 Social Sciences | Pub Date: 2023-04-01 | DOI: 10.1093/jssam/smab032
Michael Jones, J Michael Brick, Wendy Van De Kerckhove

For over a decade, address-based sampling (ABS) frames have often been used to draw samples for multistage area sample surveys in lieu of traditionally listed (or enumerated) address frames. However, it is well known that ABS frames used for face-to-face surveys suffer from undercoverage due to, for example, households that receive mail via a PO Box rather than at the household's street address. Undercoverage of ABS frames has typically been more prominent in rural areas but can also occur in urban areas where recent construction of households has taken place. Procedures have been developed to supplement ABS frames to address this undercoverage. In this article, we investigate a procedure called Address Coverage Enhancement (ACE) that supplements the ABS frame with addresses not found on the frame, and the effects that the addresses added to the sample through ACE have on estimates. Weighted estimates from two studies, the Population Assessment of Tobacco and Health Study and the 2017 US Program for the International Assessment of Adult Competencies, are calculated with and without supplemental addresses. Estimates are then calculated to assess whether poststratifying analysis weights to control for urbanicity at the person level brings estimates closer to estimates from the supplemented frame. Our findings show that the noncoverage bias was likely minimal across both studies for a range of estimates. The main reason is that the Computerized Delivery Sequence file coverage rate is high, and when the coverage rate is high, only very large differences between the covered and noncovered populations will result in meaningful bias.
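
The closing argument of this abstract can be made concrete with the standard noncoverage-bias identity: the bias of the covered-population mean equals the noncovered share times the difference between the covered and noncovered means. The sketch below is only an illustration under that identity; `noncoverage_bias` is a hypothetical helper, and the coverage rates and group means are made-up values, not figures from the article.

```python
def noncoverage_bias(coverage_rate: float,
                     mean_covered: float,
                     mean_not_covered: float) -> float:
    """Bias of the covered-population mean relative to the full-population mean.

    Full mean = c * mean_covered + (1 - c) * mean_not_covered, hence
    bias = mean_covered - full mean = (1 - c) * (mean_covered - mean_not_covered).
    """
    if not 0.0 <= coverage_rate <= 1.0:
        raise ValueError("coverage_rate must be in [0, 1]")
    return (1.0 - coverage_rate) * (mean_covered - mean_not_covered)


# Illustrative values only (assumed, not taken from the article):
# a 10-point gap between covered and noncovered groups at three coverage rates.
for c in (0.80, 0.95, 0.99):
    b = noncoverage_bias(c, mean_covered=0.20, mean_not_covered=0.30)
    print(f"coverage={c:.2f}  bias={b:+.4f}")
```

With the gap held fixed, the bias shrinks in proportion to the noncovered share, which is why a high frame coverage rate leaves room for meaningful bias only when the two groups differ very strongly.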

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10080217/pdf/smab032.pdf
Citations: 0
Deriving Priors for Bayesian Prediction of Daily Response Propensity in Responsive Survey Design: Historical Data Analysis Versus Literature Review.
IF 2.1 | Zone 4, Mathematics | Q1 Social Sciences | Pub Date: 2023-04-01 | DOI: 10.1093/jssam/smab036
Brady T West, James Wagner, Stephanie Coffey, Michael R Elliott

Responsive survey design (RSD) aims to increase the efficiency of survey data collection via live monitoring of paradata and the introduction of protocol changes when survey errors and increased costs seem imminent. Daily predictions of response propensity for all active sampled cases are among the most important quantities for live monitoring of data collection outcomes, making sound predictions of these propensities essential for the success of RSD. Because it relies on real-time updates of prior beliefs about key design quantities, such as predicted response propensities, RSD stands to benefit from Bayesian approaches. However, empirical evidence of the merits of these approaches is lacking in the literature, and the derivation of informative prior distributions is required for these approaches to be effective. In this paper, we evaluate the ability of two approaches to deriving prior distributions for the coefficients defining daily response propensity models to improve predictions of daily response propensity in a real data collection employing RSD. The first approach involves analyses of historical data from the same survey, and the second approach involves literature review. We find that Bayesian methods based on these two approaches result in higher-quality predictions of response propensity than more standard approaches ignoring prior information. This is especially true during the early-to-middle periods of data collection, when survey managers using RSD often consider interventions.
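
To illustrate why informative priors can help most early in data collection, the sketch below fits a logistic response-propensity model by posterior mode (MAP) under independent normal priors and compares a tight prior with a nearly flat one on a small simulated sample. This is a simplified stand-in, not the authors' models: the function `map_logistic`, the simulated data, the prior means, and the prior variances are all assumptions made for illustration.

```python
import numpy as np


def map_logistic(X, y, prior_mean, prior_var, n_iter=100, tol=1e-10):
    """Posterior-mode (MAP) coefficients for logistic regression under
    independent normal priors N(prior_mean[j], prior_var[j]), via Newton's method."""
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_prec = 1.0 / np.asarray(prior_var, dtype=float)
    beta = prior_mean.copy()  # start the iterations at the prior mean
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        # Gradient and Hessian of the log posterior (log likelihood + log prior).
        grad = X.T @ (y - p) - prior_prec * (beta - prior_mean)
        hess = -(X.T * (p * (1.0 - p))) @ X - np.diag(prior_prec)
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated "early field period": few cases, so the data alone are noisy.
rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-1.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

# Assumed prior mean, standing in for coefficients derived from historical data.
# It is set equal to the simulation truth here, which is an idealized case.
historical_mean = np.array([-1.0, 0.5])
informative = map_logistic(X, y, historical_mean, prior_var=[0.05, 0.05])
weak = map_logistic(X, y, historical_mean, prior_var=[100.0, 100.0])

print("informative prior:", informative)
print("weak prior:       ", weak)
```

The tight prior keeps the estimate close to the historical coefficients when few cases have been worked, while the nearly flat prior leaves it to the noisy early data; as cases accumulate the likelihood dominates and the two estimates converge.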

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10080219/pdf/smab036.pdf
Citations: 4
Reducing Burden in a Web Survey through Dependent Interviewing
IF 2.1 | Zone 4, Mathematics | Q1 Social Sciences | Pub Date: 2023-03-15 | DOI: 10.1093/jssam/smad006
Curtiss Engstrom, J. Sinibaldi
Longitudinal surveys provide valuable data for tracking change in a cohort of individuals over time. Respondents are often asked to provide similar, if not the same, data at multiple time points. One could argue that this unnecessarily increases respondent burden, especially for information that does not change frequently. One way to reduce burden while still capturing up-to-date information may be to implement dependent interviewing (DI), where the respondent is provided information from the last data collection to aid in answering the current survey. If the information is still correct, then no change is needed, but if incorrect, the respondent has the option to change the response. To test this, we implemented two different versions of DI in a self-administered web survey and compared these against a traditional version of the web survey. We examined respondent burden by analyzing timing data and respondent enjoyment by analyzing debriefing questions. To assess the success of the implementation, we looked at timing data and undesirable behavior (missing data and backtracking). Finally, to evaluate measurement error, we looked at the number of meaningful changes. We found that DI is faster, more enjoyable, and easily executed by the respondent (more so in one of our experimental formats), and that it did not introduce significant measurement error. In addition, DI provided consistency in the data, minimizing the noise introduced by nonmeaningful changes. The findings have significant implications for implementing DI in self-administered modes without an interviewer present.
Citations: 0
Implicates as Instrumental Variables: An Approach for Estimation and Inference with Probabilistically Matched Data
IF 2.1 | Zone 4, Mathematics | Q1 Social Sciences | Pub Date: 2023-03-10 | DOI: 10.1093/jssam/smad005
Dhiren Patki, M. Shapiro
Linkage errors in probabilistically matched data sets can cause biases in the estimation of regression coefficients. This article proposes an approach to obtain consistent estimates and valid inference that relies on instrumental variables. The novelty of the method is to show that instrumental variables arise naturally in the course of probabilistic record linkage thereby allowing for off-the-shelf implementation. Relative to existing approaches, the instrumental variable approach does not require integration of the record linkage and regression analysis steps, the estimation of complex models of linkage error, or computationally expensive methods to estimate standard errors. The instrumental variables approach performs well in Monte Carlo simulations of an environment highlighting a many-to-one linkage problem.
Citations: 1
Combining National Surveys with Composite Calibration to Improve the Precision of Estimates from the United Kingdom's Living Costs and Food Survey
IF 2.1 | Zone 4, Mathematics | Q1 Social Sciences | Pub Date: 2023-03-08 | DOI: 10.1093/jssam/smad001
T. Merkouris, Paul A. Smith, A. Fallows
The United Kingdom’s Living Costs and Food (LCF) Survey has a relatively small sample size but produces estimates which are widely used, notably as a key input to the calculation of weights for consumer price indices. There has been a recent call for the use of additional data sources to improve the estimates from the LCF. Since some LCF variables are shared with the much larger Labour Force Survey (LFS), we investigate combining data from these surveys using composite calibration to improve the precision of estimates from the LCF. We undertake model selection to choose a suitable set of common variables for the composite calibration using the effect on the estimated variances for national and regional totals of important LCF variables. The variances of estimates for common variables are reduced to around 5 percent of their original size. Variances of national estimates are reduced (across several quarters) by around 10 percent for expenditure and 25 percent for income; these are the variables of primary interest in the LCF. Reductions in the variances of regional estimates vary more but are mostly large when using common variables at the regional level in the composite calibration. The composite calibration also makes the LCF estimates for employment status almost consistent with the outputs of the LFS, which is an important property for users of the statistics. A novel alternative method for variance estimation, using stored information produced by the composite calibration, is also presented.
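
The variance reductions quoted in this abstract are easier to compare when converted to standard errors. The short calculation below does that conversion; the variance ratios are the approximate figures reported in the abstract, while the "equivalent sample size" column rests on the simplifying assumption that variance is proportional to 1/n, which is not a claim made by the authors.

```python
import math


def se_ratio(variance_ratio: float) -> float:
    """Ratio of new to old standard error, given the ratio of new to old variance."""
    return math.sqrt(variance_ratio)


# Approximate variance ratios from the abstract (new variance / original variance).
reported = {
    "common variables": 0.05,      # reduced to around 5 percent of original size
    "national expenditure": 0.90,  # reduced by around 10 percent
    "national income": 0.75,       # reduced by around 25 percent
}

for label, ratio in reported.items():
    se_cut = 1.0 - se_ratio(ratio)
    # Assumption: variance proportional to 1/n, so the same precision gain
    # from sample size alone would need n / ratio cases.
    n_factor = 1.0 / ratio
    print(f"{label}: SE falls by {se_cut:.1%}; equivalent sample size x{n_factor:.2f}")
```

Because the standard error scales with the square root of the variance, a 10 percent variance reduction corresponds to only about a 5 percent narrower standard error, whereas cutting the variance to 5 percent of its original size shrinks the standard error by more than three quarters.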
Citations: 1
Evaluating Data Fusion Methods to Improve Income Modeling
IF 2.1 | Zone 4, Mathematics | Q1 Social Sciences | Pub Date: 2023-03-02 | DOI: 10.1093/jssam/smac033
Jana Emmenegger, R. Münnich, Jannik Schaller
Income is an important economic indicator to measure living standards and individual well-being. In Germany, different data sources yield ambiguous evidence for analyzing the income distribution. The Tax Statistics (TS)—an income register recording the total population of more than 40 million taxpayers in Germany for the year 2014—contains the most reliable income information covering the full income distribution. However, it offers only a limited range of socio-demographic variables essential for income analysis. We tackle this challenge by enriching the tax data with information on education and working time from the Microcensus, a representative 1 percent sample of the German population. We examine two types of data fusion methods well suited to the specific data fusion scenario of the TS and the Microcensus: missing-data methods and performant prediction models. We conduct a simulation study and provide an empirical application comparing the proposed data fusion methods, and our results indicate that Multinomial Regression and Random Forest are the most suitable methods for our data fusion scenario.
Citations: 1
Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure
Zone 4, Mathematics | Q1 Social Sciences | Pub Date: 2023-02-25 | DOI: 10.1093/jssam/smad002
Paul A Parker, Scott H Holan, Ryan Janicki
Small area estimation (SAE) has become an important tool in official statistics, used to construct estimates of population quantities for domains with small sample sizes. Typical area-level models function as a type of heteroscedastic regression, where the variance for each domain is assumed to be known and plugged in following a design-based estimate. Recent work has considered hierarchical models for the variance, where the design-based estimates are used as an additional data point to model the latent true variance in each domain. These hierarchical models may incorporate covariate information but can be difficult to sample from in high-dimensional settings. Utilizing recent distribution theory, we explore a class of Bayesian hierarchical models for SAE that smooth both the design-based estimate of the mean and the variance. In addition, we develop a class of unit-level models for heteroscedastic Gaussian response data. Importantly, we incorporate both covariate information as well as spatial dependence, while retaining a conjugate model structure that allows for efficient sampling. We illustrate our methodology through an empirical simulation study as well as an application using data from the American Community Survey.
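
For reference, the "typical area-level model" described in this abstract is commonly written in the Fay-Herriot form below. The notation ($\hat{\theta}_i$, $D_i$, $\sigma_v^2$) is a conventional choice assumed here for illustration, not notation taken from the article.

```latex
\begin{align}
\hat{\theta}_i \mid \theta_i &\sim \mathrm{N}(\theta_i,\, D_i), \qquad i = 1, \dots, m, \\
\theta_i &= \mathbf{x}_i^{\top} \boldsymbol{\beta} + v_i, \qquad v_i \sim \mathrm{N}(0,\, \sigma_v^2).
\end{align}
```

Here $\hat{\theta}_i$ is the design-based (direct) estimate for domain $i$ and $D_i$ is its sampling variance, treated as known by plugging in the design-based variance estimate; because $D_i$ differs across domains, the model behaves like a heteroscedastic regression. The hierarchical variants mentioned in the abstract place a model on the domain variances instead of fixing them; their exact form is not given in the abstract and is not reproduced here.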
Citations: 0
Equipping the Offline Population with Internet Access in an Online Panel: Does It Make a Difference?
IF 2.1 | Zone 4, Mathematics | Q1 Social Sciences | Pub Date: 2023-02-24 | DOI: 10.1093/jssam/smad003
Ruben L. Bach, Carina Cornesse, Jessica Daikeler
Online panel surveys are often criticized for their inability to cover the offline population, potentially resulting in coverage error. Previous research has demonstrated that non-internet users in fact differ from online individuals on several sociodemographic characteristics. In attempts to reduce coverage error due to missing the offline population, several probability-based online panels equip offline households with an internet connection and a simple computer or tablet. However, the question remains whether the recruitment of offline individuals for an online panel leads to substantial changes in survey estimates. That is, it is unclear whether estimates derived from the survey data are affected by the differences between the groups of online and offline individuals. Against this background, we investigate how the inclusion of the previously offline population into the German Internet Panel affects various survey estimates such as voting behavior and social engagement. Overall, we find little evidence for the claim that equipping otherwise offline individuals with online access affects the estimates derived from previously online individuals only.
Citations: 0
Visible Cash, a Second Incentive, and Priority Mail? An Experimental Evaluation of Mailing Strategies for a Screening Questionnaire in a National Push-to-Web/Mail Survey.
IF 2.1 Zone 4 (Mathematics) Q1 Social Sciences Pub Date: 2023-02-22 eCollection Date: 2023-11-01 DOI: 10.1093/jssam/smac041
Shiyu Zhang, Brady T West, James Wagner, Mick P Couper, Rebecca Gatward, William G Axinn

In push-to-web surveys that use postal mail to contact sampled cases, participation is contingent on the mail being opened and the survey invitations being delivered. The design of the mailings is crucial to the success of the survey. We address the question of how to design invitation mailings that can grab potential respondents' attention and sway them to be interested in the survey in a short window of time. In the household screening stage of a national survey, the American Family Health Study, we experimentally tested three mailing design techniques for recruiting respondents: (1) a visible cash incentive in the initial mailing, (2) a second incentive for initial nonrespondents, and (3) use of Priority Mail in the nonresponse follow-up mailing. We evaluated the three techniques' overall effects on response rates as well as how they differentially attracted respondents with different characteristics. We found that all three techniques were useful in increasing the screening response rates, but there was little evidence that they had differential effects on sample subgroups that could help to reduce nonresponse biases.

Citations: 0
Handling Missing Values in Surveys With Complex Study Design: A Simulation Study
IF 2.1 Zone 4 (Mathematics) Q1 Social Sciences Pub Date: 2023-02-20 DOI: 10.1093/jssam/smac039
N. Kalpourtzi, James R. Carpenter, G. Touloumi
The inverse probability weighting (IPW) method is commonly used to deal with missing-at-random outcome (response) data collected by surveys with complex sampling designs. However, IPW methods generally assume that fully observed predictor variables are available for all sampled units, and it is unclear how to appropriately implement these methods when one or more independent variables are subject to missing values. Multiple imputation (MI) methods are well suited for a variety of missingness patterns but are not as easily adapted to complex sampling designs. In this case study, we consider the National Survey of Morbidity and Risk Factors (EMENO), a multistage probability sample survey. To understand the strengths and limitations of using either missing data treatment method for the EMENO, we present an extensive simulation study modeled on the EMENO health survey, with the target analysis being the estimation of population prevalence of hypertension as well as the association between hypertension and income. Both variables are subject to missingness. We test a variety of IPW and MI methods in simulation and on empirical data from the survey, assessing robustness by varying missingness mechanisms, proportions of missingness, and strengths of fitted response propensity models.
Citations: 2