Effects of Address Coverage Enhancement on Estimates from Address-Based Sampling Studies
Michael Jones, J Michael Brick, Wendy Van De Kerckhove
For over a decade, address-based sampling (ABS) frames have often been used to draw samples for multistage area sample surveys in lieu of traditionally listed (or enumerated) address frames. However, it is well known that ABS frames used for face-to-face surveys suffer from undercoverage because, for example, some households receive mail via a PO Box rather than at the household's street address. Undercoverage of ABS frames has typically been more prominent in rural areas but can also occur in urban areas with recent housing construction. Procedures have been developed to supplement ABS frames to address this undercoverage. In this article, we investigate a procedure called Address Coverage Enhancement (ACE), which supplements the ABS frame with addresses not found on the frame, and examine the effects that the addresses added to the sample through ACE have on estimates. Weighted estimates from two studies, the Population Assessment of Tobacco and Health Study and the 2017 US Program for the International Assessment of Adult Competencies, are calculated with and without the supplemental addresses. Estimates are then calculated to assess whether poststratifying analysis weights to control for urbanicity at the person level brings estimates closer to those from the supplemented frame. Our findings show that the noncoverage bias was likely minimal across both studies for a range of estimates. The main reason is that the Computerized Delivery Sequence file coverage rate is high, and when the coverage rate is high, only very large differences between the covered and noncovered populations result in meaningful bias.
{"title":"Effects of Address Coverage Enhancement on Estimates from Address-Based Sampling Studies.","authors":"Michael Jones, J Michael Brick, Wendy Van De Kerckhove","doi":"10.1093/jssam/smab032","DOIUrl":"https://doi.org/10.1093/jssam/smab032","url":null,"abstract":"<p><p>For over a decade, address-based sampling (ABS) frames have often been used to draw samples for multistage area sample surveys in lieu of traditionally listed (or enumerated) address frames. However, it is well known that the use of ABS frames for face-to-face surveys suffer from undercoverage due to, for example, households that receive mail via a PO Box rather than being delivered to the household's street address. Undercoverage of ABS frames has typically been more prominent in rural areas but can also occur in urban areas where recent construction of households has taken place. Procedures have been developed to supplement ABS frames to address this undercoverage. In this article, we investigate a procedure called Address Coverage Enhancement (ACE) that supplements the ABS frame with addresses not found on the frame, and the resulting effects the addresses added to the sample through ACE have on estimates. Weighted estimates from two studies, the Population Assessment of Tobacco and Health Study and the 2017 US Program for the International Assessment of Adult Competencies, are calculated with and without supplemental addresses. Estimates are then calculated to assess if poststratifying analysis weights to control for urbanicity at the person level brings estimates closer to estimates from the supplemented frame. Our findings show that the noncoverage bias was likely minimal across both studies for a range of estimates. The main reason is because the Computerized Delivery Sequence file coverage rate is high, and when the coverage rate is high, only very large differences between the covered and not covered will result in meaningful bias.</p>","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10080217/pdf/smab032.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9274583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deriving Priors for Bayesian Prediction of Daily Response Propensity in Responsive Survey Design: Historical Data Analysis Versus Literature Review
Brady T West, James Wagner, Stephanie Coffey, Michael R Elliott
Responsive survey design (RSD) aims to increase the efficiency of survey data collection via live monitoring of paradata and the introduction of protocol changes when survey errors and increased costs seem imminent. Daily predictions of response propensity for all active sampled cases are among the most important quantities for live monitoring of data collection outcomes, making sound predictions of these propensities essential for the success of RSD. Because it relies on real-time updates of prior beliefs about key design quantities, such as predicted response propensities, RSD stands to benefit from Bayesian approaches. However, empirical evidence of the merits of these approaches is lacking in the literature, and informative prior distributions must be derived for these approaches to be effective. In this paper, we evaluate two approaches to deriving prior distributions for the coefficients of daily response propensity models, assessing their ability to improve predictions of daily response propensity in a real data collection employing RSD. The first approach involves analyses of historical data from the same survey, and the second involves a literature review. We find that Bayesian methods based on these two approaches result in higher-quality predictions of response propensity than standard approaches that ignore prior information. This is especially true during the early-to-middle periods of data collection, when survey managers using RSD often consider interventions.
{"title":"Deriving Priors for Bayesian Prediction of Daily Response Propensity in Responsive Survey Design: Historical Data Analysis Versus Literature Review.","authors":"Brady T West, James Wagner, Stephanie Coffey, Michael R Elliott","doi":"10.1093/jssam/smab036","DOIUrl":"https://doi.org/10.1093/jssam/smab036","url":null,"abstract":"<p><p>Responsive survey design (RSD) aims to increase the efficiency of survey data collection via live monitoring of paradata and the introduction of protocol changes when survey errors and increased costs seem imminent. Daily predictions of response propensity for all active sampled cases are among the most important quantities for live monitoring of data collection outcomes, making sound predictions of these propensities essential for the success of RSD. Because it relies on real-time updates of prior beliefs about key design quantities, such as predicted response propensities, RSD stands to benefit from Bayesian approaches. However, empirical evidence of the merits of these approaches is lacking in the literature, and the derivation of informative prior distributions is required for these approaches to be effective. In this paper, we evaluate the ability of two approaches to deriving prior distributions for the coefficients defining daily response propensity models to improve predictions of daily response propensity in a real data collection employing RSD. The first approach involves analyses of historical data from the same survey, and the second approach involves literature review. We find that Bayesian methods based on these two approaches result in higher-quality predictions of response propensity than more standard approaches ignoring prior information. This is especially true during the early-to-middle periods of data collection, when survey managers using RSD often consider interventions.</p>","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10080219/pdf/smab036.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9652642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reducing Burden in a Web Survey through Dependent Interviewing
Curtiss Engstrom, J. Sinibaldi
Longitudinal surveys provide valuable data for tracking change in a cohort of individuals over time. Respondents are often asked to provide similar, if not the same, data at multiple time points. One could argue that this unnecessarily increases respondent burden, especially for information that does not change frequently. One way to reduce burden while still capturing up-to-date information may be to implement dependent interviewing (DI), where the respondent is provided information from the last data collection to aid in answering the current survey. If the information is still correct, then no change is needed, but if incorrect, the respondent has the option to change the response. To test this, we implemented two different versions of DI in a self-administered web survey and compared these against a traditional version of the web survey. We examined respondent burden by analyzing timing data and respondent enjoyment by analyzing debriefing questions. To assess the success of the implementation, we looked at timing data and undesirable behavior (missing data and backtracking). Finally, to evaluate measurement error, we looked at the number of meaningful changes. We found that DI is faster, more enjoyable, and easily executed by the respondent (more so in one of our experimental formats), and that it did not introduce significant measurement error. In addition, DI provided consistency in the data, minimizing the noise introduced by nonmeaningful changes. The findings have significant implications for implementing DI in self-administered modes without an interviewer present.
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad006. Published 2023-03-15.
Implicates as Instrumental Variables: An Approach for Estimation and Inference with Probabilistically Matched Data
Dhiren Patki, M. Shapiro
Linkage errors in probabilistically matched data sets can cause biases in the estimation of regression coefficients. This article proposes an approach to obtain consistent estimates and valid inference that relies on instrumental variables. The novelty of the method is to show that instrumental variables arise naturally in the course of probabilistic record linkage, thereby allowing for off-the-shelf implementation. Relative to existing approaches, the instrumental variable approach does not require integration of the record linkage and regression analysis steps, the estimation of complex models of linkage error, or computationally expensive methods to estimate standard errors. The instrumental variables approach performs well in Monte Carlo simulations of an environment highlighting a many-to-one linkage problem.
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad005. Published 2023-03-10.
Combining National Surveys with Composite Calibration to Improve the Precision of Estimates from the United Kingdom's Living Costs and Food Survey
T. Merkouris, Paul A. Smith, A. Fallows
The United Kingdom's Living Costs and Food (LCF) Survey has a relatively small sample size but produces estimates which are widely used, notably as a key input to the calculation of weights for consumer price indices. There has been a recent call for the use of additional data sources to improve the estimates from the LCF. Since some LCF variables are shared with the much larger Labour Force Survey (LFS), we investigate combining data from these surveys using composite calibration to improve the precision of estimates from the LCF. We undertake model selection to choose a suitable set of common variables for the composite calibration using the effect on the estimated variances for national and regional totals of important LCF variables. The variances of estimates for common variables are reduced to around 5 percent of their original size. Variances of national estimates are reduced (across several quarters) by around 10 percent for expenditure and 25 percent for income; these are the variables of primary interest in the LCF. Reductions in the variances of regional estimates vary more but are mostly large when using common variables at the regional level in the composite calibration. The composite calibration also makes the LCF estimates for employment status almost consistent with the outputs of the LFS, which is an important property for users of the statistics. A novel alternative method for variance estimation, using stored information produced by the composite calibration, is also presented.
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad001. Published 2023-03-08.
Evaluating Data Fusion Methods to Improve Income Modeling
Jana Emmenegger, R. Münnich, Jannik Schaller
Income is an important economic indicator to measure living standards and individual well-being. In Germany, different data sources yield ambiguous evidence for analyzing the income distribution. The Tax Statistics (TS)—an income register recording the total population of more than 40 million taxpayers in Germany for the year 2014—contains the most reliable income information covering the full income distribution. However, it offers only a limited range of socio-demographic variables essential for income analysis. We tackle this challenge by enriching the tax data with information on education and working time from the Microcensus, a representative 1 percent sample of the German population. We examine two types of data fusion methods well suited to the specific data fusion scenario of the TS and the Microcensus: missing-data methods and performant prediction models. We conduct a simulation study and provide an empirical application comparing the proposed data fusion methods, and our results indicate that Multinomial Regression and Random Forest are the most suitable methods for our data fusion scenario.
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smac033. Published 2023-03-02.
Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure
Paul A Parker, Scott H Holan, Ryan Janicki
Small area estimation (SAE) has become an important tool in official statistics, used to construct estimates of population quantities for domains with small sample sizes. Typical area-level models function as a type of heteroscedastic regression, where the variance for each domain is assumed to be known and plugged in following a design-based estimate. Recent work has considered hierarchical models for the variance, where the design-based estimates are used as an additional data point to model the latent true variance in each domain. These hierarchical models may incorporate covariate information but can be difficult to sample from in high-dimensional settings. Utilizing recent distribution theory, we explore a class of Bayesian hierarchical models for SAE that smooth both the design-based estimate of the mean and the variance. In addition, we develop a class of unit-level models for heteroscedastic Gaussian response data. Importantly, we incorporate both covariate information as well as spatial dependence, while retaining a conjugate model structure that allows for efficient sampling. We illustrate our methodology through an empirical simulation study as well as an application using data from the American Community Survey.
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad002. Published 2023-02-25.
Equipping the Offline Population with Internet Access in an Online Panel: Does It Make a Difference?
Ruben L. Bach, Carina Cornesse, Jessica Daikeler
Online panel surveys are often criticized for their inability to cover the offline population, potentially resulting in coverage error. Previous research has demonstrated that non-internet users in fact differ from online individuals on several sociodemographic characteristics. In attempts to reduce coverage error due to missing the offline population, several probability-based online panels equip offline households with an internet connection and a simple computer or tablet. However, the question remains whether the recruitment of offline individuals for an online panel leads to substantial changes in survey estimates. That is, it is unclear whether estimates derived from the survey data are affected by the differences between the groups of online and offline individuals. Against this background, we investigate how the inclusion of the previously offline population into the German Internet Panel affects various survey estimates such as voting behavior and social engagement. Overall, we find little evidence for the claim that equipping otherwise offline individuals with online access affects the estimates derived from previously online individuals only.
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad003. Published 2023-02-24.
Visible Cash, a Second Incentive, and Priority Mail? An Experimental Evaluation of Mailing Strategies for a Screening Questionnaire in a National Push-to-Web/Mail Survey
Shiyu Zhang, Brady T West, James Wagner, Mick P Couper, Rebecca Gatward, William G Axinn
In push-to-web surveys that use postal mail to contact sampled cases, participation is contingent on the mail being opened and the survey invitations being delivered. The design of the mailings is crucial to the success of the survey. We address the question of how to design invitation mailings that can grab potential respondents' attention and sway them to be interested in the survey in a short window of time. In the household screening stage of a national survey, the American Family Health Study, we experimentally tested three mailing design techniques for recruiting respondents: (1) a visible cash incentive in the initial mailing, (2) a second incentive for initial nonrespondents, and (3) use of Priority Mail in the nonresponse follow-up mailing. We evaluated the three techniques' overall effects on response rates as well as how they differentially attracted respondents with different characteristics. We found that all three techniques were useful in increasing the screening response rates, but there was little evidence that they had differential effects on sample subgroups that could help to reduce nonresponse biases.
{"title":"Visible Cash, a Second Incentive, and Priority Mail? An Experimental Evaluation of Mailing Strategies for a Screening Questionnaire in a National Push-to-Web/Mail Survey.","authors":"Shiyu Zhang, Brady T West, James Wagner, Mick P Couper, Rebecca Gatward, William G Axinn","doi":"10.1093/jssam/smac041","DOIUrl":"10.1093/jssam/smac041","url":null,"abstract":"<p><p>In push-to-web surveys that use postal mail to contact sampled cases, participation is contingent on the mail being opened and the survey invitations being delivered. The design of the mailings is crucial to the success of the survey. We address the question of how to design invitation mailings that can grab potential respondents' attention and sway them to be interested in the survey in a short window of time. In the household screening stage of a national survey, the American Family Health Study, we experimentally tested three mailing design techniques for recruiting respondents: (1) a visible cash incentive in the initial mailing, (2) a second incentive for initial nonrespondents, and (3) use of Priority Mail in the nonresponse follow-up mailing. We evaluated the three techniques' overall effects on response rates as well as how they differentially attracted respondents with different characteristics. We found that all three techniques were useful in increasing the screening response rates, but there was little evidence that they had differential effects on sample subgroups that could help to reduce nonresponse biases.</p>","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10646700/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43534591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Handling Missing Values in Surveys With Complex Study Design: A Simulation Study
N. Kalpourtzi, James R. Carpenter, G. Touloumi
The inverse probability weighting (IPW) method is commonly used to deal with missing-at-random outcome (response) data collected by surveys with complex sampling designs. However, IPW methods generally assume that fully observed predictor variables are available for all sampled units, and it is unclear how to appropriately implement these methods when one or more independent variables are subject to missing values. Multiple imputation (MI) methods are well suited for a variety of missingness patterns but are not as easily adapted to complex sampling designs. In this case study, we consider the National Survey of Morbidity and Risk Factors (EMENO), a multistage probability sample survey. To understand the strengths and limitations of using either missing data treatment method for the EMENO, we present an extensive simulation study modeled on the EMENO health survey, with the target analysis being the estimation of population prevalence of hypertension as well as the association between hypertension and income. Both variables are subject to missingness. We test a variety of IPW and MI methods in simulation and on empirical data from the survey, assessing robustness by varying missingness mechanisms, proportions of missingness, and strengths of fitted response propensity models.
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smac039. Published 2023-02-20.