
Survey Methodology: Latest Articles

The anchoring method: Estimation of interviewer effects in the absence of interpenetrated sample assignment.
IF 0.9 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2022-06-01
Michael R Elliott, Brady T West, Xinyu Zhang, Stephanie Coffey

Methodological studies of the effects that human interviewers have on the quality of survey data have long been limited by a critical assumption: that interviewers in a given survey are assigned random subsets of the larger overall sample (also known as interpenetrated assignment). Absent this type of study design, estimates of interviewer effects on survey measures of interest may reflect differences between interviewers in the characteristics of their assigned sample members, rather than recruitment or measurement effects specifically introduced by the interviewers. Previous attempts to approximate interpenetrated assignment have typically used regression models to condition on factors that might be related to interviewer assignment. We introduce a new approach for overcoming this lack of interpenetrated assignment when estimating interviewer effects. This approach, which we refer to as the "anchoring" method, leverages correlations between observed variables that are unlikely to be affected by interviewers ("anchors") and variables that may be prone to interviewer effects to remove components of within-interviewer correlations that lack of interpenetrated assignment may introduce. We consider both frequentist and Bayesian approaches, where the latter can make use of information about interviewer effect variances in previous waves of a study, if available. We evaluate this new methodology empirically using a simulation study, and then illustrate its application using real survey data from the Behavioral Risk Factor Surveillance System (BRFSS), where interviewer IDs are provided on public-use data files. 
While our proposed method shares some of the limitations of the traditional approach - namely the need for variables associated with the outcome of interest that are also free of measurement error - it avoids the need for conditional inference and thus has improved inferential qualities when the focus is on marginal estimates, and it shows evidence of further reducing overestimation of larger interviewer effects relative to the traditional approach.
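The problem the abstract describes can be illustrated with a small simulation. This is a minimal sketch of the problem only, not the anchoring method: it assumes a balanced design, an outcome with no true interviewer effect, and a hypothetical respondent characteristic `x` that drives the outcome. The function names and parameter values are illustrative assumptions.

```python
import random


def anova_icc(groups):
    """One-way ANOVA estimator of the intraclass correlation (equal group sizes)."""
    J = len(groups)
    m = len(groups[0])
    grand = sum(sum(g) for g in groups) / (J * m)
    means = [sum(g) / m for g in groups]
    # Between-interviewer and within-interviewer mean squares.
    msb = m * sum((mu - grand) ** 2 for mu in means) / (J - 1)
    msw = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g) / (J * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def simulate(interpenetrated, J=50, m=30, seed=1):
    """Apparent interviewer ICC when interviewers have NO true effect (assumption)."""
    rng = random.Random(seed)
    # Hypothetical respondent characteristic that drives the outcome.
    x = [rng.gauss(0, 1) for _ in range(J * m)]
    if not interpenetrated:
        # Non-random assignment: each interviewer gets respondents with similar x
        # (for example, because workloads follow geography).
        x.sort()
    y = [xi + rng.gauss(0, 1) for xi in x]
    groups = [y[j * m:(j + 1) * m] for j in range(J)]
    return anova_icc(groups)


print("interpenetrated:", simulate(True))
print("non-interpenetrated:", simulate(False))
```

Under random assignment the estimated correlation should sit near zero, while the sorted assignment produces a large apparent interviewer effect that is entirely due to sample composition.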

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9983757/pdf/nihms-1832600.pdf
Citations: 0
A note on multiply robust predictive mean matching imputation with complex survey data.
IF 0.9 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2021-06-01 Epub Date: 2021-06-24
Sixia Chen, David Haziza, Alexander Stubblefield

Predictive mean matching is a commonly used imputation procedure for addressing the problem of item nonresponse in surveys. The customary approach relies upon the specification of a single outcome regression model. In this note, we propose a novel predictive mean matching procedure that allows the user to specify multiple outcome regression models. The resulting estimator is multiply robust in the sense that it remains consistent if one of the specified outcome regression models is correctly specified. The results from a simulation study suggest that the proposed method performs well in terms of bias and efficiency.
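For orientation, the customary single-model procedure mentioned in the abstract can be sketched as follows. This is a minimal, unweighted illustration with one covariate and a nearest-donor rule; it is not the multiply robust procedure proposed in the note, and the helper names `fit_line` and `pmm_impute` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (single covariate)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def pmm_impute(x, y):
    """Single-model predictive mean matching; missing y values are None.

    Assumption: unweighted data and nearest-donor matching on the predicted mean.
    """
    observed = [i for i, yi in enumerate(y) if yi is not None]
    a, b = fit_line([x[i] for i in observed], [y[i] for i in observed])
    predicted = [a + b * xi for xi in x]
    completed = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            # Donor = respondent whose predicted mean is closest to the recipient's.
            donor = min(observed, key=lambda j: abs(predicted[j] - predicted[i]))
            completed[i] = y[donor]  # impute the donor's OBSERVED value
    return completed


x = [1.0, 2.2, 3.0, 4.0, 5.0]
y = [2.1, None, 6.2, 7.9, None]
print(pmm_impute(x, y))  # → [2.1, 6.2, 6.2, 7.9, 7.9]
```

Because every imputed value is an observed value from a donor, imputations always stay within the range of real responses, which is one reason the procedure is popular.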

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10438827/pdf/nihms-1704347.pdf
Citations: 0
Optimum allocation for a dual-frame telephone survey.
IF 0.9 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2015-12-01 Epub Date: 2015-12-17
Kirk M Wolter, Xian Tao, Robert Montgomery, Philip J Smith

Careful design of a dual-frame random digit dial (RDD) telephone survey requires selecting from among many options that have varying impacts on cost, precision, and coverage in order to obtain the best possible implementation of the study goals. One such consideration is whether to screen cell-phone households in order to interview cell-phone only (CPO) households and exclude dual-user households, or to take all interviews obtained via the cell-phone sample. We present a framework in which to consider the tradeoffs between these two options and a method to select the optimal design. We derive and discuss the optimum allocation of sample size between the two sampling frames and explore the choice of optimum p, the mixing parameter for the dual-user domain. We illustrate our methods using the National Immunization Survey, sponsored by the Centers for Disease Control and Prevention.
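The role of a mixing parameter p can be illustrated with the textbook composite estimator for an overlap domain. This sketch assumes two independent, unbiased estimates of the dual-user domain (one from each frame) with known variances, and it ignores costs and any covariance with the single-frame domains, so it is a simplification rather than the optimum derived in the paper. All function names are hypothetical.

```python
def optimal_mix(var_a, var_b):
    """Variance-minimizing weight on frame A's estimate (independence assumed)."""
    return var_b / (var_a + var_b)


def composite(est_a, est_b, p):
    """Composite dual-user domain estimate: p * (frame A) + (1 - p) * (frame B)."""
    return p * est_a + (1 - p) * est_b


def composite_variance(var_a, var_b, p):
    """Variance of the composite under the independence assumption."""
    return p ** 2 * var_a + (1 - p) ** 2 * var_b


# Illustrative numbers only: frame A's estimate is four times as variable as frame B's.
p = optimal_mix(4.0, 1.0)
print(p, composite_variance(4.0, 1.0, p))
```

With these illustrative variances the weight on the noisier frame is 0.2, and the composite variance (about 0.8) is below the variance of either single-frame estimate.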

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5839168/pdf/nihms945885.pdf
Citations: 0
Combining information from multiple complex surveys.
IF 0.9 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2014-12-01 Epub Date: 2014-12-19
Qi Dong, Michael R Elliott, Trivellore E Raghunathan

This manuscript describes the use of multiple imputation to combine information from multiple surveys of the same underlying population. We use a newly developed method to generate synthetic populations nonparametrically using a finite population Bayesian bootstrap that automatically accounts for complex sample designs. We then analyze each synthetic population with standard complete-data software for simple random samples and obtain valid inference by combining the point and variance estimates using extensions of existing combining rules for synthetic data. We illustrate the approach by combining data from the 2006 National Health Interview Survey (NHIS) and the 2006 Medical Expenditure Panel Survey (MEPS).

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5708582/pdf/nihms921254.pdf
Citations: 0
A nonparametric method to generate synthetic populations to adjust for complex sampling design features.
IF 0.9 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2014-06-01 Epub Date: 2014-06-27
Qi Dong, Michael R Elliott, Trivellore E Raghunathan

Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered, unequal-probability-of-selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which have stratified, clustered, unequal-probability-of-selection sample designs.
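The idea of undoing unequal selection probabilities can be sketched with a weighted Pólya urn, one common way to implement a weighted finite population Bayesian bootstrap. Whether this matches the authors' exact scheme is an assumption; the sketch ignores stratification and clustering, assumes every weight is at least 1, and assumes the weights sum (approximately) to the population size. The function name is hypothetical.

```python
import random


def weighted_polya_population(values, weights, rng):
    """Draw one synthetic population of size N = sum(weights) from a weighted sample.

    Assumptions (not from the source): weights >= 1, no strata or clusters.
    """
    n = len(values)
    N = round(sum(weights))
    extra = N - n                 # number of non-sampled units to generate
    counts = [0] * n              # times each sampled unit has been re-drawn
    step = extra / n
    for _ in range(extra):
        # Urn mass: initial weight minus the unit itself, plus reinforcement
        # for every earlier draw of the same unit.
        mass = [w - 1 + c * step for w, c in zip(weights, counts)]
        i = rng.choices(range(n), weights=mass)[0]
        counts[i] += 1
    population = []
    for v, c in zip(values, counts):
        population.extend([v] * (1 + c))   # the sampled unit plus its copies
    return population


rng = random.Random(0)
pop = weighted_polya_population([10, 20, 30, 40], [1, 2, 3, 4], rng)
print(len(pop), sorted(pop))
```

The resulting list can be treated as a simple random sample from a superpopulation, so unweighted complete-data methods apply to it; repeating the draw gives multiple synthetic populations.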

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5708580/pdf/nihms921248.pdf
Citations: 0
Bayesian inference for finite population quantiles from unequal probability samples.
IF 0.9 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2012-12-01 Epub Date: 2012-12-19
Qixuan Chen, Michael R Elliott, Roderick J A Little

This paper develops two Bayesian methods for inference about finite population quantiles of continuous survey variables from unequal probability sampling. The first method estimates cumulative distribution functions of the continuous survey variable by fitting a number of probit penalized spline regression models on the inclusion probabilities. The finite population quantiles are then obtained by inverting the estimated distribution function. This method is quite computationally demanding. The second method predicts non-sampled values by assuming a smoothly-varying relationship between the continuous survey variable and the probability of inclusion, by modeling both the mean function and the variance function using splines. The two Bayesian spline-model-based estimators yield a desirable balance between robustness and efficiency. Simulation studies show that both methods yield smaller root mean squared errors than the sample-weighted estimator and the ratio and difference estimators described by Rao, Kovar, and Mantel (RKM 1990), and are more robust to model misspecification than the regression through the origin model-based estimator described in Chambers and Dunstan (1986). When the sample size is small, the 95% credible intervals of the two new methods have closer to nominal confidence coverage than the sample-weighted estimator.
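The step of obtaining a quantile by inverting an estimated distribution function can be illustrated with the sample-weighted estimator that the abstract uses as a comparison. This is a minimal sketch of that baseline, not of either Bayesian spline method; the function name and the convention of returning the smallest value whose weighted CDF reaches the target are assumptions.

```python
def weighted_quantile(y, w, alpha):
    """Smallest observed y whose weighted empirical CDF is at least alpha."""
    pairs = sorted(zip(y, w))
    total = sum(w)
    cumulative = 0.0
    for yi, wi in pairs:
        cumulative += wi
        # Weighted empirical CDF evaluated at yi.
        if cumulative / total >= alpha:
            return yi
    return pairs[-1][0]


y = [5, 1, 3, 9]
w = [10, 40, 30, 20]   # sampling weights, e.g. inverse inclusion probabilities
print(weighted_quantile(y, w, 0.5))  # → 3
```

Here the weighted CDF is 0.4 at y = 1 and 0.7 at y = 3, so the weighted median is 3, whereas the unweighted sample would place it between 3 and 5.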

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5708554/pdf/nihms921237.pdf
Citations: 0
Bayesian penalized spline model-based inference for finite population proportion in unequal probability sampling.
IF 0.9 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2010-06-01 Epub Date: 2010-06-29
Qixuan Chen, Michael R Elliott, Roderick J A Little

We propose a Bayesian Penalized Spline Predictive (BPSP) estimator for a finite population proportion in an unequal probability sampling setting. This new method allows the probabilities of inclusion to be directly incorporated into the estimation of a population proportion, using a probit regression of the binary outcome on the penalized spline of the inclusion probabilities. The posterior predictive distribution of the population proportion is obtained using Gibbs sampling. The advantages of the BPSP estimator over the Hájek (HK), Generalized Regression (GR), and parametric model-based prediction estimators are demonstrated by simulation studies and a real example in tax auditing. Simulation studies show that the BPSP estimator is more efficient, and its 95% credible interval provides better confidence coverage with shorter average width than the HK and GR estimators, especially when the population proportion is close to zero or one or when the sample is small. Compared to linear model-based predictive estimators, the BPSP estimators are robust to model misspecification and influential observations in the sample.
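For reference, the Hájek (HK) estimator used as a benchmark in the abstract is the weighted mean of the binary outcome, with weights equal to inverse inclusion probabilities. The sketch below shows only that benchmark, not the BPSP estimator; the function name and the example data are illustrative assumptions.

```python
def hajek_proportion(y, inclusion_probs):
    """Hajek estimator: sum(w * y) / sum(w) with w = 1 / inclusion probability."""
    w = [1.0 / p for p in inclusion_probs]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Illustrative data: units with y = 1 were twice as likely to be sampled.
y = [1, 1, 0, 0]
pi = [0.5, 0.5, 0.25, 0.25]
print(hajek_proportion(y, pi))
```

In this example the unweighted sample proportion is 0.5, while the Hájek estimate is 1/3, because the oversampled units with y = 1 are down-weighted.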

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5708555/pdf/nihms921230.pdf
Citations: 0
Optimal sample allocation for design-consistent regression in a cancer services survey when design variables are known for aggregates.
IF 0.9 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2008-06-01
Alan M Zaslavsky, Hui Zheng, John Adams

We consider optimal sampling rates in element-sampling designs when the anticipated analysis is survey-weighted linear regression and the estimands of interest are linear combinations of regression coefficients from one or more models. Methods are first developed assuming that exact design information is available in the sampling frame and then generalized to situations in which some design variables are available only as aggregates for groups of potential subjects, or from inaccurate or old data. We also consider design for estimation of combinations of coefficients from more than one model. A further generalization allows for flexible combinations of coefficients chosen to improve estimation of one effect while controlling for another. Potential applications include estimation of means for several sets of overlapping domains, or improving estimates for subpopulations such as minority races by disproportionate sampling of geographic areas. In the motivating problem of designing a survey on care received by cancer patients (the CanCORS study), potential design information included block-level census data on race/ethnicity and poverty as well as individual-level data. In one study site, an unequal-probability sampling design using the subjects' residential addresses and census data would have reduced the variance of the estimator of an income effect by 25%, or by 38% if the subjects' races were also known. With flexible weighting of the income contrasts by race, the variance of the estimator would be reduced by 26% using residential addresses alone and by 52% using addresses and races. Our methods would be useful in studies in which geographic oversampling by race-ethnicity or socioeconomic characteristics is considered, or in any study in which characteristics available in sampling frames are measured with error.

{"title":"Optimal sample allocation for design-consistent regression in a cancer services survey when design variables are known for aggregates.","authors":"Alan M Zaslavsky,&nbsp;Hui Zheng,&nbsp;John Adams","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We consider optimal sampling rates in element-sampling designs when the anticipated analysis is survey-weighted linear regression and the estimands of interest are linear combinations of regression coefficients from one or more models. Methods are first developed assuming that exact design information is available in the sampling frame and then generalized to situations in which some design variables are available only as aggregates for groups of potential subjects, or from inaccurate or old data. We also consider design for estimation of combinations of coefficients from more than one model. A further generalization allows for flexible combinations of coefficients chosen to improve estimation of one effect while controlling for another. Potential applications include estimation of means for several sets of overlapping domains, or improving estimates for subpopulations such as minority races by disproportionate sampling of geographic areas. In the motivating problem of designing a survey on care received by cancer patients (the CanCORS study), potential design information included block-level census data on race/ethnicity and poverty as well as individual-level data. In one study site, an unequal-probability sampling design using the subjectss residential addresses and census data would have reduced the variance of the estimator of an income effect by 25%, or by 38% if the subjects' races were also known. With flexible weighting of the income contrasts by race, the variance of the estimator would be reduced by 26% using residential addresses alone and by 52% using addresses and races. 
Our methods would be useful in studies in which geographic oversampling by race-ethnicity or socioeconomic characteristics is considered, or in any study in which characteristics available in sampling frames are measured with error.</p>","PeriodicalId":51191,"journal":{"name":"Survey Methodology","volume":null,"pages":null},"PeriodicalIF":0.9,"publicationDate":"2008-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2725367/pdf/nihms-105215.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28339824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Estimation of the Distribution of Hourly Pay from Household Survey Data: The Use of Missing Data Methods to Handle Measurement Error
IF 0.9 Zone 4 Mathematics Q4 Mathematics Pub Date: 2003-05-22 DOI: 10.1920/WP.CEM.2003.1203
G. Beissel-Durrant, C. Skinner
Measurement errors in survey data on hourly pay may lead to serious upward bias in low pay estimates. We consider how to correct for this bias when accurately measured auxiliary data are available for a subsample. An application to the UK Labour Force Survey is described. The use of fractional imputation, nearest neighbour imputation, predictive mean matching and propensity score weighting is considered. Properties of point estimators are compared both theoretically and by simulation. A fractional predictive mean matching imputation approach is advocated. It performs similarly to propensity score weighting, but displays slight advantages of robustness and efficiency.
Citations: 2
Variance Estimation After Imputation
IF 0.9 Zone 4 Mathematics Q4 Mathematics Pub Date: 2000-01-01 DOI: 10.31274/RTD-180813-13957
Jae Kwang Kim
Imputation is commonly used to compensate for item nonresponse. Variance estimation after imputation has generated considerable discussion and several variance estimators have been proposed. We propose a variance estimator based on a pseudo data set used only for variance estimation. Standard complete data variance estimators applied to the pseudo data set lead to consistent estimators for linear estimators under various imputation methods, including without-replacement hot deck imputation and with-replacement hot deck imputation. The asymptotic equivalence of the proposed method and the adjusted jackknife method of Rao and Sitter (1995) is illustrated. The proposed method is directly applicable to variance estimation for two-phase sampling.
Citations: 15