Journal of Survey Statistics and Methodology: Latest Articles

Reducing Burden in a Web Survey through Dependent Interviewing
IF 2.1; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2023-03-15; DOI: 10.1093/jssam/smad006
Curtiss Engstrom, J. Sinibaldi
Longitudinal surveys provide valuable data for tracking change in a cohort of individuals over time. Respondents are often asked to provide similar, if not the same, data at multiple time points. One could argue that this unnecessarily increases respondent burden, especially for information that does not change frequently. One way to reduce burden while still capturing up-to-date information may be to implement dependent interviewing (DI), where the respondent is provided information from the last data collection to aid in answering the current survey. If the information is still correct, then no change is needed, but if incorrect, the respondent has the option to change the response. To test this, we implemented two different versions of DI in a self-administered web survey and compared these against a traditional version of the web survey. We examined respondent burden by analyzing timing data and respondent enjoyment by analyzing debriefing questions. To assess the success of the implementation, we looked at timing data and undesirable behavior (missing data and backtracking). Finally, to evaluate measurement error, we looked at the number of meaningful changes. We found that DI is faster, more enjoyable, and easily executed by the respondent (more so in one of our experimental formats), and that it did not introduce significant measurement error. In addition, DI provided consistency in the data, minimizing the noise introduced by nonmeaningful changes. The findings have significant implications for implementing DI in self-administered modes without an interviewer present.
Citations: 0
Implicates as Instrumental Variables: An Approach for Estimation and Inference with Probabilistically Matched Data
IF 2.1; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2023-03-10; DOI: 10.1093/jssam/smad005
Dhiren Patki, M. Shapiro
Linkage errors in probabilistically matched data sets can cause biases in the estimation of regression coefficients. This article proposes an approach to obtain consistent estimates and valid inference that relies on instrumental variables. The novelty of the method is to show that instrumental variables arise naturally in the course of probabilistic record linkage thereby allowing for off-the-shelf implementation. Relative to existing approaches, the instrumental variable approach does not require integration of the record linkage and regression analysis steps, the estimation of complex models of linkage error, or computationally expensive methods to estimate standard errors. The instrumental variables approach performs well in Monte Carlo simulations of an environment highlighting a many-to-one linkage problem.
Citations: 1
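As a rough illustration of why an instrument can undo the bias that an error-prone regressor causes, the sketch below simulates a regressor observed twice with independent noise and uses the second copy as an instrument for the first. This assumes classical (additive, independent) measurement error, which is a simplification chosen for illustration and is not necessarily the linkage-error structure or the estimator studied in the article above. All function names (`simulate`, `ols_slope`, `iv_slope`) are hypothetical.

```python
import random


def simulate(n, beta, seed):
    """Draw (y, x1, x2): an outcome and two noisy copies of a latent regressor."""
    rng = random.Random(seed)
    y, x1, x2 = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                    # latent true regressor
        x1.append(x + rng.gauss(0.0, 1.0))         # first error-prone copy
        x2.append(x + rng.gauss(0.0, 1.0))         # second copy, independent error
        y.append(beta * x + rng.gauss(0.0, 1.0))   # outcome depends on the true x
    return y, x1, x2


def cov(a, b):
    """Sample covariance (divisor n; the divisor cancels in the ratios below)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def ols_slope(y, x):
    """Ordinary least squares slope of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(y, x, z):
    """Simple IV estimator with one instrument z for one regressor x."""
    return cov(z, y) / cov(z, x)


y, x1, x2 = simulate(n=50_000, beta=2.0, seed=0)
print("OLS on noisy copy:  ", round(ols_slope(y, x1), 3))      # attenuated toward zero
print("IV using second copy:", round(iv_slope(y, x1, x2), 3))  # close to the true slope
```

Under these assumptions the OLS slope is attenuated by the ratio of signal variance to total variance, while the IV ratio is not, because the noise in the instrument is independent of the noise in the regressor.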
Combining National Surveys with Composite Calibration to Improve the Precision of Estimates from the United Kingdom's Living Costs and Food Survey
IF 2.1; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2023-03-08; DOI: 10.1093/jssam/smad001
T. Merkouris, Paul A. Smith, A. Fallows
The United Kingdom’s Living Costs and Food (LCF) Survey has a relatively small sample size but produces estimates which are widely used, notably as a key input to the calculation of weights for consumer price indices. There has been a recent call for the use of additional data sources to improve the estimates from the LCF. Since some LCF variables are shared with the much larger Labour Force Survey (LFS), we investigate combining data from these surveys using composite calibration to improve the precision of estimates from the LCF. We undertake model selection to choose a suitable set of common variables for the composite calibration using the effect on the estimated variances for national and regional totals of important LCF variables. The variances of estimates for common variables are reduced to around 5 percent of their original size. Variances of national estimates are reduced (across several quarters) by around 10 percent for expenditure and 25 percent for income; these are the variables of primary interest in the LCF. Reductions in the variances of regional estimates vary more but are mostly large when using common variables at the regional level in the composite calibration. The composite calibration also makes the LCF estimates for employment status almost consistent with the outputs of the LFS, which is an important property for users of the statistics. A novel alternative method for variance estimation, using stored information produced by the composite calibration, is also presented.
Citations: 1
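The calibration idea in the abstract above can be sketched with ordinary linear calibration: design weights are adjusted so that weighted totals of shared variables hit given control totals. In composite calibration the control totals would come from the larger survey's estimates; here they are simply passed in as numbers, which is a simplifying assumption. The function name `linear_calibrate` and the toy data are hypothetical, and the sketch handles only one auxiliary variable plus the population count.

```python
def linear_calibrate(d, x, total_n, total_x):
    """Linear calibration of design weights d to a count and one x-total.

    Calibrated weights have the form w_i = d_i * (1 + lam1 + lam2 * x_i),
    where (lam1, lam2) solves the 2x2 system M @ lam = r with
    M = sum_i d_i * [[1, x_i], [x_i, x_i^2]] and r = targets - current totals.
    """
    m11 = sum(d)
    m12 = sum(di * xi for di, xi in zip(d, x))
    m22 = sum(di * xi * xi for di, xi in zip(d, x))
    r1 = total_n - m11          # shortfall in the weighted count
    r2 = total_x - m12          # shortfall in the weighted x-total
    det = m11 * m22 - m12 * m12
    if det == 0:
        raise ValueError("x has no variation; the system is singular")
    lam1 = (m22 * r1 - m12 * r2) / det
    lam2 = (-m12 * r1 + m11 * r2) / det
    return [di * (1.0 + lam1 + lam2 * xi) for di, xi in zip(d, x)]


# Toy sample: four units with equal design weights.
design_weights = [10.0, 10.0, 10.0, 10.0]
aux = [1.0, 2.0, 3.0, 4.0]
w = linear_calibrate(design_weights, aux, total_n=42.0, total_x=110.0)
print("weighted count:", sum(w))
print("weighted x-total:", sum(wi * xi for wi, xi in zip(w, aux)))
```

After calibration, the weighted count and weighted x-total reproduce the control totals, which is the mechanism that makes estimates of shared variables agree across the two surveys.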
Evaluating Data Fusion Methods to Improve Income Modeling
IF 2.1; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2023-03-02; DOI: 10.1093/jssam/smac033
Jana Emmenegger, R. Münnich, Jannik Schaller
Income is an important economic indicator to measure living standards and individual well-being. In Germany, different data sources yield ambiguous evidence for analyzing the income distribution. The Tax Statistics (TS)—an income register recording the total population of more than 40 million taxpayers in Germany for the year 2014—contains the most reliable income information covering the full income distribution. However, it offers only a limited range of socio-demographic variables essential for income analysis. We tackle this challenge by enriching the tax data with information on education and working time from the Microcensus, a representative 1 percent sample of the German population. We examine two types of data fusion methods well suited to the specific data fusion scenario of the TS and the Microcensus: missing-data methods and performant prediction models. We conduct a simulation study and provide an empirical application comparing the proposed data fusion methods, and our results indicate that Multinomial Regression and Random Forest are the most suitable methods for our data fusion scenario.
Citations: 1
Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure
Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2023-02-25; DOI: 10.1093/jssam/smad002
Paul A Parker, Scott H Holan, Ryan Janicki
Small area estimation (SAE) has become an important tool in official statistics, used to construct estimates of population quantities for domains with small sample sizes. Typical area-level models function as a type of heteroscedastic regression, where the variance for each domain is assumed to be known and plugged in following a design-based estimate. Recent work has considered hierarchical models for the variance, where the design-based estimates are used as an additional data point to model the latent true variance in each domain. These hierarchical models may incorporate covariate information but can be difficult to sample from in high-dimensional settings. Utilizing recent distribution theory, we explore a class of Bayesian hierarchical models for SAE that smooth both the design-based estimate of the mean and the variance. In addition, we develop a class of unit-level models for heteroscedastic Gaussian response data. Importantly, we incorporate both covariate information and spatial dependence, while retaining a conjugate model structure that allows for efficient sampling. We illustrate our methodology through an empirical simulation study as well as an application using data from the American Community Survey.
Citations: 0
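The "typical area-level model" mentioned in the abstract above, with each domain's sampling variance plugged in as known, leads to the familiar shrinkage form of a Fay-Herriot type estimator. The sketch below shows only that baseline, not the authors' conjugate hierarchical models. It assumes the model variance and the regression-synthetic predictions have already been obtained elsewhere; the function name `shrinkage_estimates` and the numbers are hypothetical.

```python
def shrinkage_estimates(direct, sampling_var, synthetic, model_var):
    """Area-level composite estimates with known sampling variances.

    For each domain i:
        gamma_i = model_var / (model_var + sampling_var_i)
        theta_i = gamma_i * direct_i + (1 - gamma_i) * synthetic_i
    Domains with noisy direct estimates (large sampling variance) are pulled
    more strongly toward the synthetic (regression) prediction.
    """
    estimates = []
    for y, d, m in zip(direct, sampling_var, synthetic):
        gamma = model_var / (model_var + d)
        estimates.append(gamma * y + (1.0 - gamma) * m)
    return estimates


direct = [10.0, 10.0, 10.0]       # design-based direct estimates
sampling_var = [0.0, 4.0, 36.0]   # plugged-in sampling variances (heteroscedastic)
synthetic = [6.0, 6.0, 6.0]       # regression predictions, assumed given
print(shrinkage_estimates(direct, sampling_var, synthetic, model_var=4.0))
```

The three domains share the same direct and synthetic values, so the output isolates the effect of the sampling variance on the amount of shrinkage.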
Equipping the Offline Population with Internet Access in an Online Panel: Does It Make a Difference?
IF 2.1; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2023-02-24; DOI: 10.1093/jssam/smad003
Ruben L. Bach, Carina Cornesse, Jessica Daikeler
Online panel surveys are often criticized for their inability to cover the offline population, potentially resulting in coverage error. Previous research has demonstrated that non-internet users in fact differ from online individuals on several sociodemographic characteristics. In attempts to reduce coverage error due to missing the offline population, several probability-based online panels equip offline households with an internet connection and a simple computer or tablet. However, the question remains whether the recruitment of offline individuals for an online panel leads to substantial changes in survey estimates. That is, it is unclear whether estimates derived from the survey data are affected by the differences between the groups of online and offline individuals. Against this background, we investigate how the inclusion of the previously offline population into the German Internet Panel affects various survey estimates such as voting behavior and social engagement. Overall, we find little evidence for the claim that equipping otherwise offline individuals with online access affects the estimates derived from previously online individuals only.
Citations: 0
Handling Missing Values in Surveys With Complex Study Design: A Simulation Study
IF 2.1; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2023-02-20; DOI: 10.1093/jssam/smac039
N. Kalpourtzi, James R. Carpenter, G. Touloumi
The inverse probability weighting (IPW) method is commonly used to deal with missing-at-random outcome (response) data collected by surveys with complex sampling designs. However, IPW methods generally assume that fully observed predictor variables are available for all sampled units, and it is unclear how to appropriately implement these methods when one or more independent variables are subject to missing values. Multiple imputation (MI) methods are well suited for a variety of missingness patterns but are not as easily adapted to complex sampling designs. In this case study, we consider the National Survey of Morbidity and Risk Factors (EMENO), a multistage probability sample survey. To understand the strengths and limitations of using either missing data treatment method for the EMENO, we present an extensive simulation study modeled on the EMENO health survey, with the target analysis being the estimation of population prevalence of hypertension as well as the association between hypertension and income. Both variables are subject to missingness. We test a variety of IPW and MI methods in simulation and on empirical data from the survey, assessing robustness by varying missingness mechanisms, proportions of missingness, and strengths of fitted response propensity models.
Citations: 2
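The inverse probability weighting idea in the abstract above can be sketched as a weighted mean over respondents, where each respondent's design weight is divided by an estimated response propensity. This is a minimal sketch assuming the propensities have already been estimated and that all predictors were fully observed, which is exactly the situation the abstract says may not hold. The function name `ipw_mean` and the toy data are hypothetical.

```python
def ipw_mean(y, responded, propensity, design_weight):
    """Weighted mean of y over respondents with weights d_i / p_i.

    y[i] is used only when responded[i] is True; propensity[i] is the
    (estimated) probability that unit i responds.
    """
    num = 0.0
    den = 0.0
    for yi, ri, pi, di in zip(y, responded, propensity, design_weight):
        if not ri:
            continue                  # nonrespondents contribute nothing directly
        if pi <= 0.0:
            raise ValueError("response propensities must be positive")
        w = di / pi                   # respondent stands in for 1/p_i similar units
        num += w * yi
        den += w
    return num / den


# Toy example: a binary outcome (e.g. a condition indicator) for three sampled units.
y = [1.0, 0.0, 1.0]
responded = [True, True, False]
propensity = [0.5, 0.25, 0.5]
design_weight = [1.0, 1.0, 1.0]
print(ipw_mean(y, responded, propensity, design_weight))
```

Dividing by the sum of the adjusted weights, rather than by a known population size, keeps the estimate inside the range of the observed outcomes.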
Improving Statistical Matching when Auxiliary Information is Available
IF 2.1; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2023-02-13; DOI: 10.1093/jssam/smac038
Angelo Moretti, N. Shlomo
There is growing interest within National Statistical Institutes in combining available datasets containing information on a large variety of social domains. Statistical matching approaches can be used to integrate data sources through a common set of variables where each dataset contains different units that belong to the same target population. However, a common problem is related to the assumption of conditional independence among variables observed in different data sources. In this context, an auxiliary dataset containing all the variables jointly can be used to improve the statistical matching by providing information on the correlation structure of variables observed across different datasets. We propose modifying the prediction models from the auxiliary dataset through a calibration step and show that we can improve the outcome of statistical matching in a variety of settings. We evaluate the proposed approach via simulation and an application based on the European Union Statistics for Income and Living Conditions and Living Costs and Food Survey for the United Kingdom.
Citations: 2
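To see why the conditional independence assumption mentioned in the abstract above is restrictive, consider one common variable X, a variable Y observed only in the first dataset, and a variable Z observed only in the second. Assuming the three are jointly normal (an assumption made here for illustration), conditional independence of Y and Z given X fixes their correlation at the product of the two observed correlations, whereas the data alone only restrict it to an interval. The function names below are hypothetical.

```python
import math


def cia_correlation(rho_xy, rho_xz):
    """Correlation of Y and Z implied by conditional independence given X."""
    return rho_xy * rho_xz


def admissible_range(rho_xy, rho_xz):
    """Interval of corr(Y, Z) values that keep the 3x3 correlation matrix valid.

    Follows from requiring a nonnegative determinant:
    rho_yz lies in rho_xy*rho_xz +/- sqrt((1 - rho_xy^2) * (1 - rho_xz^2)).
    """
    half_width = math.sqrt((1.0 - rho_xy ** 2) * (1.0 - rho_xz ** 2))
    centre = rho_xy * rho_xz
    return centre - half_width, centre + half_width


rho_xy, rho_xz = 0.6, 0.8
print("CIA value:", cia_correlation(rho_xy, rho_xz))
print("admissible range:", admissible_range(rho_xy, rho_xz))
```

An auxiliary dataset that observes Y and Z jointly supplies information about where in this interval the true correlation lies, which is the gap the abstract describes.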
Constructing State and National Estimates of Vaccination Rates from Immunization Information Systems
IF 2.1; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2023-02-07; DOI: 10.1093/jssam/smac042
T. Raghunathan, K. Kirtland, Ji Li, K. White, B. Murthy, Xia Lin, Latreace Harris, L. Gibbs-Scharf, E. Zell
Immunization Information Systems are confidential computerized population-based systems that collect data from vaccination providers on individual vaccinations administered along with limited patient-level characteristics. Through a data use agreement, the Centers for Disease Control and Prevention obtains the individual-level data and aggregates the number of vaccinations for geographical statistical areas defined by the US Census Bureau (counties or equivalent statistical entities) for each vaccine included in the system. Currently, 599 counties, covering 11 states, collect and report data using a uniform protocol. We combine these data with inter-decennial population counts from the Population Estimates Program in the US Census Bureau and several covariates from a variety of sources to develop model-based estimates for each of the 3,142 counties in 50 states and the District of Columbia and then aggregate to the state and national levels. We use a hierarchical Bayesian model and Markov chain Monte Carlo methods to obtain draws from the posterior predictive distribution of the vaccination rates. We use posterior predictive checks and cross-validation to assess the goodness of fit and to validate the models. We also compare the model-based estimates to direct estimates from the National Immunization Surveys.
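The abstract above mentions aggregating county-level estimates to the state and national levels but does not say how. As a rough illustration only, the sketch below assumes a population-weighted average of county rates; the function name, the tuple layout, and all numbers are hypothetical and are not taken from the article. In a Bayesian workflow such a step would plausibly be applied to each posterior draw rather than to a single point estimate.

```python
from collections import defaultdict


def aggregate_rates(counties):
    """Population-weighted aggregation of county vaccination rates.

    `counties` is a list of (state, population, rate) tuples.
    Returns (state_rates, national_rate).
    """
    vaccinated = defaultdict(float)  # estimated vaccinated persons per state
    population = defaultdict(float)  # total population per state
    for state, pop, rate in counties:
        vaccinated[state] += pop * rate
        population[state] += pop
    state_rates = {s: vaccinated[s] / population[s] for s in population}
    national_rate = sum(vaccinated.values()) / sum(population.values())
    return state_rates, national_rate


# Made-up county-level estimates: (state, population, estimated rate).
example = [
    ("A", 1000, 0.50),
    ("A", 3000, 0.70),
    ("B", 2000, 0.40),
]
state_rates, national_rate = aggregate_rates(example)
```

Weighting by population keeps large counties from being treated the same as small ones; an unweighted mean of county rates would generally give a different state figure.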
Citations: 1
An Application of Adaptive Cluster Sampling to Surveying Informal Businesses
Zone 4 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-01-27 DOI: 10.1093/jssam/smac037
Gemechu Aga, David C Francis, Filip Jolevski, Jorge Rodriguez Meza, Joshua Seth Wimpey
Informal business activity is ubiquitous around the world, but it is nearly always uncaptured by administrative data, registries, or commercial sources. For this reason, there are rarely adequate sampling frames available for survey implementers wishing to measure the activity and characteristics of the sector. This article applies a well-established sampling method for rare and/or clustered populations, Adaptive Cluster Sampling (ACS), to a novel population of informal businesses. Generally, it shows that efficiency gains through the application of ACS, when compared to Simple Random Sampling (SRS), are large, particularly at higher levels of fieldwork effort. In particular, ACS efficiency gains over SRS remain sizable at higher values of initial starting samples, but with comparatively high expansion thresholds, which can reduce the fieldwork effort.
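For readers unfamiliar with Adaptive Cluster Sampling, the general idea named in the abstract above can be sketched as follows: draw an initial simple random sample of area units, then keep adding the neighbours of any sampled unit whose count meets an expansion threshold. This is a generic textbook-style sketch, not the authors' fieldwork protocol; the grid layout, the four-neighbour rule, the function name, and the example counts are all assumptions made for illustration.

```python
import random


def adaptive_cluster_sample(counts, n_initial, threshold, seed=0):
    """Sketch of adaptive cluster sampling on a rectangular grid.

    counts:    2-D list of unit counts per cell (e.g. businesses per block).
    n_initial: size of the initial simple random sample of cells.
    threshold: a sampled cell with count >= threshold triggers sampling
               of its four edge-adjacent neighbours.
    Returns (initial_cells, final_cells) as sets of (row, col).
    """
    rows, cols = len(counts), len(counts[0])
    rng = random.Random(seed)
    all_cells = [(r, c) for r in range(rows) for c in range(cols)]
    initial = set(rng.sample(all_cells, n_initial))
    sampled = set(initial)
    frontier = list(initial)
    while frontier:
        r, c = frontier.pop()
        if counts[r][c] < threshold:
            continue  # below threshold: kept in the sample, no expansion
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (r + dr, c + dc)
            in_grid = 0 <= nb[0] < rows and 0 <= nb[1] < cols
            if in_grid and nb not in sampled:
                sampled.add(nb)
                frontier.append(nb)
    return initial, sampled


# Hypothetical grid with one cluster of informal businesses.
grid = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 0, 7, 0],
    [0, 0, 0, 0],
]
initial, final = adaptive_cluster_sample(grid, n_initial=3, threshold=1, seed=1)
```

A higher `threshold` makes expansion rarer, which limits how many extra cells must be visited; this is the kind of trade-off between expansion threshold and fieldwork effort that the abstract refers to.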
Citations: 0