Journal of Survey Statistics and Methodology: Latest Articles
Combining National Surveys with Composite Calibration to Improve the Precision of Estimates from the United Kingdom's Living Costs and Food Survey
IF 2.1 Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-03-08 DOI: 10.1093/jssam/smad001
T. Merkouris, Paul A. Smith, A. Fallows
The United Kingdom’s Living Costs and Food (LCF) Survey has a relatively small sample size but produces estimates which are widely used, notably as a key input to the calculation of weights for consumer price indices. There has been a recent call for the use of additional data sources to improve the estimates from the LCF. Since some LCF variables are shared with the much larger Labour Force Survey (LFS), we investigate combining data from these surveys using composite calibration to improve the precision of estimates from the LCF. We undertake model selection to choose a suitable set of common variables for the composite calibration using the effect on the estimated variances for national and regional totals of important LCF variables. The variances of estimates for common variables are reduced to around 5 percent of their original size. Variances of national estimates are reduced (across several quarters) by around 10 percent for expenditure and 25 percent for income; these are the variables of primary interest in the LCF. Reductions in the variances of regional estimates vary more but are mostly large when using common variables at the regional level in the composite calibration. The composite calibration also makes the LCF estimates for employment status almost consistent with the outputs of the LFS, which is an important property for users of the statistics. A novel alternative method for variance estimation, using stored information produced by the composite calibration, is also presented.
Citations: 1
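As a rough illustration of the calibration idea in the Merkouris, Smith, and Fallows abstract above, the sketch below applies plain linear calibration with a single common variable: the small survey's weights are adjusted so that its weighted total for that variable matches a total taken from a larger survey. This is not the paper's composite calibration, which works with several common variables and treats the larger survey's totals as estimates. The function name, the data, and the target total of 33 are all hypothetical.

```python
def linear_calibrate(design_weights, x, target_total):
    """Adjust design weights so the weighted total of x equals target_total.

    Single-variable linear calibration: w_i = d_i * (1 + lam * x_i), with lam
    chosen so that sum(w_i * x_i) == target_total.
    """
    weighted_x = sum(d * xi for d, xi in zip(design_weights, x))
    weighted_x2 = sum(d * xi * xi for d, xi in zip(design_weights, x))
    lam = (target_total - weighted_x) / weighted_x2
    return [d * (1.0 + lam * xi) for d, xi in zip(design_weights, x)]


# Hypothetical small-survey data: design weights, a common variable
# (1 = employed), and a survey-only variable (weekly expenditure).
d = [10.0, 10.0, 10.0, 10.0]
employed = [1, 1, 0, 1]
spend = [320.0, 410.0, 150.0, 380.0]

# Assumption: the larger survey puts the employed total at 33.
w = linear_calibrate(d, employed, target_total=33.0)
calibrated_employed = sum(wi * e for wi, e in zip(w, employed))
calibrated_spend = sum(wi * s for wi, s in zip(w, spend))
print(w, calibrated_employed, calibrated_spend)
```

After calibration the small survey reproduces the borrowed employment total, and the expenditure total changes as a side effect of the new weights. Any gain in precision for expenditure depends on how strongly it is related to the common variable.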
Evaluating Data Fusion Methods to Improve Income Modeling
IF 2.1 Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-03-02 DOI: 10.1093/jssam/smac033
Jana Emmenegger, R. Münnich, Jannik Schaller
Income is an important economic indicator to measure living standards and individual well-being. In Germany, different data sources yield ambiguous evidence for analyzing the income distribution. The Tax Statistics (TS)—an income register recording the total population of more than 40 million taxpayers in Germany for the year 2014—contains the most reliable income information covering the full income distribution. However, it offers only a limited range of socio-demographic variables essential for income analysis. We tackle this challenge by enriching the tax data with information on education and working time from the Microcensus, a representative 1 percent sample of the German population. We examine two types of data fusion methods well suited to the specific data fusion scenario of the TS and the Microcensus: missing-data methods and performant prediction models. We conduct a simulation study and provide an empirical application comparing the proposed data fusion methods, and our results indicate that Multinomial Regression and Random Forest are the most suitable methods for our data fusion scenario.
Citations: 1
Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure
Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-02-25 DOI: 10.1093/jssam/smad002
Paul A Parker, Scott H Holan, Ryan Janicki
Small area estimation (SAE) has become an important tool in official statistics, used to construct estimates of population quantities for domains with small sample sizes. Typical area-level models function as a type of heteroscedastic regression, where the variance for each domain is assumed to be known and plugged in following a design-based estimate. Recent work has considered hierarchical models for the variance, where the design-based estimates are used as an additional data point to model the latent true variance in each domain. These hierarchical models may incorporate covariate information but can be difficult to sample from in high-dimensional settings. Utilizing recent distribution theory, we explore a class of Bayesian hierarchical models for SAE that smooth both the design-based estimate of the mean and the variance. In addition, we develop a class of unit-level models for heteroscedastic Gaussian response data. Importantly, we incorporate both covariate information and spatial dependence, while retaining a conjugate model structure that allows for efficient sampling. We illustrate our methodology through an empirical simulation study as well as an application using data from the American Community Survey.
Citations: 0
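The "typical area-level model" that the Parker, Holan, and Janicki abstract above starts from can be sketched as a shrinkage estimator: each domain's direct estimate is pulled toward a regression-based (synthetic) value, more strongly when its plugged-in sampling variance is large. The code below shows only that standard known-variance form, not the authors' conjugate hierarchical models. The function name and the numbers are hypothetical, and the model variance is treated as given rather than estimated.

```python
def shrinkage_estimate(direct, sampling_var, synthetic, model_var):
    """Composite area-level estimate with a known (plugged-in) sampling variance.

    gamma is the weight on the direct estimate; it approaches 1 as the
    sampling variance shrinks and 0 as the sampling variance grows.
    """
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic


# Hypothetical domains with the same direct estimate (0.30) and the same
# regression prediction (0.20), but very different sampling variances.
noisy_domain = shrinkage_estimate(0.30, sampling_var=0.04, synthetic=0.20, model_var=0.01)
precise_domain = shrinkage_estimate(0.30, sampling_var=0.0001, synthetic=0.20, model_var=0.01)
print(noisy_domain, precise_domain)
```

The noisy domain ends up close to the regression prediction and the precise domain stays close to its direct estimate. This is the heteroscedastic behavior the abstract refers to: the amount of smoothing differs by domain because the variances differ.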
Equipping the Offline Population with Internet Access in an Online Panel: Does It Make a Difference?
IF 2.1 Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-02-24 DOI: 10.1093/jssam/smad003
Ruben L. Bach, Carina Cornesse, Jessica Daikeler
Online panel surveys are often criticized for their inability to cover the offline population, potentially resulting in coverage error. Previous research has demonstrated that non-internet users in fact differ from online individuals on several sociodemographic characteristics. In attempts to reduce coverage error due to missing the offline population, several probability-based online panels equip offline households with an internet connection and a simple computer or tablet. However, the question remains whether the recruitment of offline individuals for an online panel leads to substantial changes in survey estimates. That is, it is unclear whether estimates derived from the survey data are affected by the differences between the groups of online and offline individuals. Against this background, we investigate how the inclusion of the previously offline population into the German Internet Panel affects various survey estimates such as voting behavior and social engagement. Overall, we find little evidence for the claim that equipping otherwise offline individuals with online access affects the estimates derived from previously online individuals only.
Citations: 0
Handling Missing Values in Surveys With Complex Study Design: A Simulation Study
IF 2.1 Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-02-20 DOI: 10.1093/jssam/smac039
N. Kalpourtzi, James R. Carpenter, G. Touloumi
The inverse probability weighting (IPW) method is commonly used to deal with missing-at-random outcome (response) data collected by surveys with complex sampling designs. However, IPW methods generally assume that fully observed predictor variables are available for all sampled units, and it is unclear how to appropriately implement these methods when one or more independent variables are subject to missing values. Multiple imputation (MI) methods are well suited for a variety of missingness patterns but are not as easily adapted to complex sampling designs. In this case study, we consider the National Survey of Morbidity and Risk Factors (EMENO), a multistage probability sample survey. To understand the strengths and limitations of using either missing data treatment method for the EMENO, we present an extensive simulation study modeled on the EMENO health survey, with the target analysis being the estimation of population prevalence of hypertension as well as the association between hypertension and income. Both variables are subject to missingness. We test a variety of IPW and MI methods in simulation and on empirical data from the survey, assessing robustness by varying missingness mechanisms, proportions of missingness, and strengths of fitted response propensity models.
Citations: 2
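To make the IPW idea in the Kalpourtzi, Carpenter, and Touloumi abstract above concrete, here is a minimal sketch of an inverse-probability-weighted prevalence estimate. Each respondent's design weight is divided by a response probability, and the weighted mean is taken over respondents only. The response probabilities are assumed known here; in practice they would come from a fitted response propensity model, and the variants studied in the paper are not reproduced. The function name and the data are hypothetical.

```python
def ipw_mean(outcomes, design_weights, response_probs):
    """Weighted mean over respondents, with weights d_i / p_i.

    outcomes uses None for a missing value; those units contribute nothing
    directly, and respondents with low response probability are up-weighted.
    """
    numerator = 0.0
    denominator = 0.0
    for y, d, p in zip(outcomes, design_weights, response_probs):
        if y is None:
            continue  # nonrespondent: represented by up-weighted respondents
        weight = d / p
        numerator += weight * y
        denominator += weight
    return numerator / denominator


# Hypothetical sample: 1 = hypertensive, 0 = not, None = outcome missing.
outcomes = [1, 0, None, 1, None]
design_weights = [100.0, 100.0, 100.0, 200.0, 200.0]
response_probs = [0.5, 0.8, 0.5, 0.4, 0.4]

prevalence = ipw_mean(outcomes, design_weights, response_probs)
print(prevalence)
```

The unweighted respondent mean in this toy data is 2/3, while the weighted estimate is higher because the hypertensive respondents carry larger adjusted weights. The sketch assumes the predictors behind the response probabilities are fully observed, which is the assumption the abstract identifies as a limitation.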
Improving Statistical Matching when Auxiliary Information is Available
IF 2.1 Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-02-13 DOI: 10.1093/jssam/smac038
Angelo Moretti, N. Shlomo
There is growing interest within National Statistical Institutes in combining available datasets containing information on a large variety of social domains. Statistical matching approaches can be used to integrate data sources through a common set of variables where each dataset contains different units that belong to the same target population. However, a common problem is related to the assumption of conditional independence among variables observed in different data sources. In this context, an auxiliary dataset containing all the variables jointly can be used to improve the statistical matching by providing information on the correlation structure of variables observed across different datasets. We propose modifying the prediction models from the auxiliary dataset through a calibration step and show that we can improve the outcome of statistical matching in a variety of settings. We evaluate the proposed approach via simulation and an application based on the European Union Statistics for Income and Living Conditions and Living Costs and Food Survey for the United Kingdom.
Citations: 2
Constructing State and National Estimates of Vaccination Rates from Immunization Information Systems
IF 2.1 Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-02-07 DOI: 10.1093/jssam/smac042
T. Raghunathan, K. Kirtland, Ji Li, K. White, B. Murthy, Xia Lin, Latreace Harris, L. Gibbs-Scharf, E. Zell
Immunization Information Systems are confidential computerized population-based systems that collect data from vaccination providers on individual vaccinations administered along with limited patient-level characteristics. Through a data use agreement, the Centers for Disease Control and Prevention obtains the individual-level data and aggregates the number of vaccinations for geographical statistical areas defined by the US Census Bureau (counties or equivalent statistical entities) for each vaccine included in the system. Currently, 599 counties, covering 11 states, collect and report data using a uniform protocol. We combine these data with inter-decennial population counts from the Population Estimates Program in the US Census Bureau and several covariates from a variety of sources to develop model-based estimates for each of the 3,142 counties in 50 states and the District of Columbia and then aggregate to the state and national levels. We use a hierarchical Bayesian model and Markov Chain Monte Carlo methods to obtain draws from the posterior predictive distribution of the vaccination rates. We use posterior predictive checks and cross-validation to assess the goodness of fit and to validate the models. We also compare the model-based estimates to direct estimates from the National Immunization Surveys.
Citations: 1
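The aggregation step mentioned in the Raghunathan et al. abstract above (county-level draws rolled up to a state) can be sketched as a population-weighted average computed draw by draw, so that the state-level result is itself a set of draws. This is only one plausible reading of that step. The county names, populations, and draws are hypothetical, and the hierarchical model that would produce the draws is not shown.

```python
def aggregate_draws(county_rate_draws, county_pop):
    """Population-weighted state rate for each posterior draw.

    county_rate_draws maps county -> list of rate draws (same length for all);
    county_pop maps county -> population count.
    """
    total_pop = sum(county_pop.values())
    n_draws = len(next(iter(county_rate_draws.values())))
    return [
        sum(county_pop[c] * county_rate_draws[c][k] for c in county_pop) / total_pop
        for k in range(n_draws)
    ]


# Hypothetical two-county state with two draws per county.
county_pop = {"county_a": 1000, "county_b": 3000}
county_rate_draws = {
    "county_a": [0.6, 0.7],
    "county_b": [0.4, 0.5],
}

state_draws = aggregate_draws(county_rate_draws, county_pop)
state_mean = sum(state_draws) / len(state_draws)
print(state_draws, state_mean)
```

Aggregating within each draw, rather than averaging county summaries, keeps the uncertainty in the state-level figure tied to the joint behavior of the county draws.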
An Application of Adaptive Cluster Sampling to Surveying Informal Businesses
Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-01-27 DOI: 10.1093/jssam/smac037
Gemechu Aga, David C Francis, Filip Jolevski, Jorge Rodriguez Meza, Joshua Seth Wimpey
Informal business activity is ubiquitous around the world, but it is nearly always uncaptured by administrative data, registries, or commercial sources. For this reason, there are rarely adequate sampling frames available for survey implementers wishing to measure the activity and characteristics of the sector. This article applies a well-established sampling method for rare and/or clustered populations, Adaptive Cluster Sampling (ACS), to a novel population of informal businesses. Generally, it shows that efficiency gains through the application of ACS, when compared to Simple Random Sampling (SRS), are large, particularly at higher levels of fieldwork effort. In particular, ACS efficiency gains over SRS remain sizable at higher values of initial starting samples, but with comparatively high expansion thresholds, which can reduce the fieldwork effort.
Citations: 0
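The roles of the initial sample and the expansion threshold in the Aga et al. abstract above can be sketched on a grid of enumeration blocks. Fieldwork starts in the initially selected blocks, and whenever a visited block contains at least the threshold number of businesses, its four neighbouring blocks are added to the sample. The sketch covers only the adaptive selection step, not the estimators used to turn the sample into population estimates. The grid, the starting blocks, and the threshold of 3 are hypothetical.

```python
from collections import deque


def adaptive_cluster_sample(counts, initial_cells, threshold):
    """Return the set of grid cells visited under adaptive cluster sampling.

    counts is a 2-D list of business counts per block. A visited block with
    count >= threshold triggers visits to its up/down/left/right neighbours.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    sampled = set(initial_cells)
    queue = deque(initial_cells)
    while queue:
        r, c = queue.popleft()
        if counts[r][c] < threshold:
            continue  # visited, but too few businesses to trigger expansion
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols and (nr, nc) not in sampled:
                sampled.add((nr, nc))
                queue.append((nr, nc))
    return sampled


# Hypothetical 4x4 area with one cluster of informal businesses.
counts = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 0, 4, 0],
    [0, 0, 0, 0],
]
visited = adaptive_cluster_sample(counts, initial_cells=[(1, 1), (3, 0)], threshold=3)
businesses_found = sum(counts[r][c] for r, c in visited)
print(len(visited), businesses_found)
```

Raising the threshold shrinks the set of blocks that trigger expansion, which is the mechanism the abstract points to for limiting fieldwork effort.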
Detecting Interviewer Fraud Using Multilevel Models
IF 2.1 Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-01-02 DOI: 10.1093/jssam/smac036
Lukas Olbrich, Yuliya Kosyakova, J. Sakshaug, Silvia Schwanhäuser
Interviewer falsification, such as the complete or partial fabrication of interview data, has been shown to substantially affect the results of survey data. In this study, we apply a method to identify falsifying face-to-face interviewers based on the development of their behavior over the survey field period. We postulate four potential falsifier types: steady low-effort falsifiers, steady high-effort falsifiers, learning falsifiers, and sudden falsifiers. Using large-scale survey data from Germany with verified falsifications, we apply multilevel models with interviewer effects on the intercept, scale, and slope of the interview sequence to test whether falsifiers can be detected based on their dynamic behavior. In addition to identifying a rather high-effort falsifier previously detected by the survey organization, the model flagged two additional suspicious interviewers exhibiting learning behavior, who were subsequently classified as deviant by the survey organization. We additionally apply the analysis approach to publicly available cross-national survey data and find multiple interviewers who show behavior consistent with the postulated falsifier types.
Citations: 0
Estimating Web Survey Mode and Panel Effects in a Nationwide Survey of Alcohol Use.
IF 2.1 Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2022-11-02 eCollection Date: 2023-11-01 DOI: 10.1093/jssam/smac028
Randal ZuWallack, Matt Jans, Thomas Brassell, Kisha Bailly, James Dayton, Priscilla Martinez, Deidre Patterson, Thomas K Greenfield, Katherine J Karriker-Jaffe

Random-digit dialing (RDD) telephone surveys are challenged by declining response rates and increasing costs. Many surveys that were traditionally conducted via telephone are seeking cost-effective alternatives, such as address-based sampling (ABS) with self-administered web or mail questionnaires. At a fraction of the cost of both telephone and ABS surveys, opt-in web panels are an attractive alternative. The 2019-2020 National Alcohol Survey (NAS) employed three methods: (1) an RDD telephone survey (traditional NAS method); (2) an ABS push-to-web survey; and (3) an opt-in web panel. The study reported here evaluated differences in the three data-collection methods, which we will refer to as "mode effects," on alcohol consumption and health topics. To evaluate mode effects, multivariate regression models were developed predicting these characteristics, and the presence of a mode effect on each outcome was determined by the significance of the three-level effect (RDD-telephone, ABS-web, opt-in web panel) in each model. Those results were then used to adjust for mode effects and produce a "telephone-equivalent" estimate for the ABS and panel data sources. The study found that ABS-web and RDD were similar for most estimates but exhibited differences for sensitive questions including getting drunk and experiencing depression. The opt-in web panel exhibited more differences between it and the other two survey modes. One notable example is the reporting of drinking alcohol at least 3-4 times per week, which was 21 percent for RDD-phone, 24 percent for ABS-web, and 34 percent for opt-in web panel. The regression model adjusts for mode effects, improving comparability with past surveys conducted by telephone; however, the models result in higher variance of the estimates. This method of adjusting for mode effects has broad applications to mode and sample transitions throughout the survey research industry.
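The adjustment idea in this abstract can be sketched in a few lines: fit a regression with a three-level mode effect, then predict each web respondent's outcome with the mode set to telephone. The sketch below is a simplification under stated assumptions, not the authors' model. It uses simulated data, a linear probability model fitted by ordinary least squares, and a single hypothetical covariate (`age_c`, centered age in decades). Only the three rates (21, 24, and 34 percent) come from the abstract; every function name and the age gradient are invented for illustration.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def simulate(seed=0, n_per_mode=1000):
    """Simulated respondents: (mode, age_c, drinks 3-4+ times per week)."""
    rng = random.Random(seed)
    base = {"rdd": 0.21, "abs_web": 0.24, "panel": 0.34}  # rates from the abstract
    data = []
    for mode, rate in base.items():
        for _ in range(n_per_mode):
            age_c = rng.uniform(-2.5, 2.5)   # hypothetical covariate
            prob = rate + 0.02 * age_c       # assumed small age gradient
            data.append((mode, age_c, 1.0 if rng.random() < prob else 0.0))
    return data


def design_row(mode, age_c):
    """Intercept, two mode dummies (RDD telephone is the reference), age."""
    return [1.0, float(mode == "abs_web"), float(mode == "panel"), age_c]


def telephone_equivalent(data, mode):
    """Return (raw mean, mode-adjusted mean, coefficients) for one mode."""
    rows = [design_row(m, a) for m, a, _ in data]
    y = [yi for _, _, yi in data]
    beta = fit_ols(rows, y)
    subset = [(a, yi) for m, a, yi in data if m == mode]
    raw = sum(yi for _, yi in subset) / len(subset)
    # Predict as if each respondent had answered by telephone:
    # keep their covariates, set both mode dummies to zero.
    adjusted = sum(beta[0] + beta[3] * a for a, _ in subset) / len(subset)
    return raw, adjusted, beta


data = simulate(seed=0)
raw, adjusted, beta = telephone_equivalent(data, "panel")
print(f"raw opt-in panel estimate: {raw:.3f}")
print(f"telephone-equivalent:      {adjusted:.3f}")
```

Because the model contains an intercept and the mode dummies, the adjusted mean for a mode equals its raw mean minus that mode's estimated coefficient. The adjusted estimate also inherits the sampling error of that coefficient, which is consistent with the abstract's remark that adjustment raises the variance of the estimates.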

Citations: 0