Small Area Poverty Estimation under Heteroskedasticity
Sumonkanti Das, Ray Chambers
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad045 (published 2024-01-10)

Multilevel models with nested errors are widely used in poverty estimation. An important application in this context is estimating the distribution of poverty as defined by the distribution of income within a set of domains that cover the population of interest. Since unit-level values of income are usually heteroskedastic, the standard homoskedasticity assumptions implicit in popular multilevel models may not be appropriate and can lead to bias, particularly when used to estimate domain-specific income distributions. This article addresses this problem when the income values in the population of interest can be characterized by a two-level mixed linear model with independent and identically distributed domain effects and with independent but not identically distributed individual effects. Estimation of poverty indicators that are functionals of domain-level income distributions is also addressed, and a nonparametric bootstrap procedure is used to estimate mean squared errors and confidence intervals. The proposed methodology is compared with the well-known World Bank poverty mapping methodology for this situation, using model-based simulation experiments as well as an empirical study based on Bangladesh poverty data.
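The working model described in this abstract can be written down compactly. A sketch consistent with the description above (the covariate vector and the form of the unit-level variances are generic placeholders, since the abstract does not specify them):

```latex
% Two-level mixed linear model with heteroskedastic unit-level errors
% (a sketch; x_ij and the variance function sigma_ij^2 are placeholders)
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + e_{ij},
\qquad u_i \overset{\text{iid}}{\sim} N(0, \sigma_u^2),
\qquad e_{ij} \overset{\text{ind}}{\sim} N(0, \sigma_{ij}^2),
```

where i indexes domains, j indexes units within domain i, and the unit-level effects e_ij are independent but, through the unit-specific variances, not identically distributed.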
Investigating Respondent Attention to Experimental Text Lengths
Tobias Rettig, A. Blom
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad044 (published 2024-01-04)

Whether respondents pay adequate attention to a questionnaire has long been of concern to survey researchers. In this study, we measure respondents' attention with an instruction manipulation check. We investigate which respondents read question texts of experimentally varied lengths and which become inattentive in a probability-based online panel of the German population. We find that respondent attention is closely linked to text length. Individual response speed is strongly correlated with respondent attention, but a fixed cutoff time is unsuitable as a standalone attention indicator. Differing levels of attention are also associated with respondents' age, gender, education, panel experience, and the device used to complete the survey. Removal of inattentive respondents is thus likely to result in a biased remaining sample. Instead, questions should be curtailed to encourage respondents of different backgrounds and abilities to read them attentively and provide optimized answers.
A Catch-22—the Test–Retest Method of Reliability Estimation
Paula A. Tufiș, D. Alwin, Daniel N Ramírez
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad043 (published 2023-12-20)

This article addresses the problems with the traditional reinterview approach to estimating the reliability of survey measures. Using data from three reinterview (or panel) studies conducted by the General Social Survey, we compare the two-wave correlational approach embodied by the traditional reinterview strategy with estimates of reliability that take the stability of traits into account, based on a three-wave model. Our results indicate that the problems identified with the two-wave correlational approach reflect a kind of "Catch-22," in the sense that the only solution to the problem is denied by the approach itself. Specifically, we show that the correctly specified two-wave model, which includes the potential for true change in the latent variable, is underidentified; thus, unless one is willing to make some potentially risky assumptions, reliability parameters are not estimable. This article compares the two-wave correlational approach to an alternative model for estimating reliability: Heise's estimates based on the three-wave simplex model. Using three waves of data from the GSS panels, with two-year intervals between waves, we examine the conditions under which the correlation between waves 1 and 2, which does not take stability into account, approximates the reliability estimate obtained from three-wave simplex models that do. The results lead to the conclusion that the differences between estimates depend on the stability and/or fixed nature of the underlying processes. Few if any differences appear when traits are fixed or highly stable, but when the underlying traits change the differences can be quite large. We therefore argue for the superiority of reinterview designs that involve more than two waves for the estimation of reliability parameters.
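The Heise three-wave simplex estimator discussed above has a simple closed form in the three observed wave-to-wave correlations. A minimal sketch (the function names are mine; the formulas are Heise's standard ones):

```python
def heise_reliability(r12: float, r23: float, r13: float) -> float:
    """Heise (1969) reliability estimate from a three-wave simplex model:
    reliability = (r12 * r23) / r13, where r_jk is the observed
    correlation between waves j and k."""
    return (r12 * r23) / r13

def heise_stability(r12: float, r23: float, r13: float) -> tuple[float, float]:
    """Stability of the latent trait between waves 1-2 and 2-3,
    corrected for unreliability: s12 = r13 / r23, s23 = r13 / r12."""
    return r13 / r23, r13 / r12
```

For a fixed trait, r12 = r23 = r13, and the two-wave correlation coincides with the Heise estimate. Under true change, the two-wave correlation equals reliability times stability and so understates reliability, while the three-wave ratio still recovers it; this is exactly the divergence pattern the article documents.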
Poverty Mapping Under Area-Level Random Regression Coefficient Poisson Models
Naomi Diz-Rosales, M. Lombardía, Domingo Morales
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad036 (published 2023-11-29)

Under an area-level random regression coefficient Poisson model, this article derives small area predictors of counts and proportions and introduces bootstrap estimators of the mean squared errors (MSEs). The maximum likelihood estimators of the model parameters and the mode predictors of the random effects are calculated by a Laplace approximation algorithm. Simulation experiments are implemented to investigate the behavior of the fitting algorithm, the predictors, and the MSE estimators with and without bias correction. The new statistical methodology is applied to data from the Spanish Living Conditions Survey. The target is to estimate the proportions of women and men under the poverty line by province.
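One plausible form for the area-level model named in the title, sketched here only to fix ideas (the article's exact specification of offsets, covariates, and random-effect covariance may differ):

```latex
% Area-level random regression coefficient Poisson model (illustrative form)
y_d \mid \mathbf{u}_d \sim \mathrm{Poisson}(\mu_d),
\qquad \log \mu_d = \log \nu_d + \mathbf{x}_d^{\top}(\boldsymbol{\beta} + \mathbf{u}_d),
\qquad \mathbf{u}_d \sim N_p\bigl(\mathbf{0}, \operatorname{diag}(\sigma_1^2,\dots,\sigma_p^2)\bigr),
```

where y_d is the count in area d, nu_d the corresponding sample size (so the exponential term acts as the area proportion), and the random slopes u_d make the regression coefficients area-specific.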
Peekaboo! The Effect of Different Visible Cash Display and Amount Options During Mail Contact When Recruiting to a Probability-Based Panel
Ipek Bilgen, David Dutwin, Roopam Singh, Erlina Hendarwan
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad039 (published 2023-11-09)

Recent studies have consistently shown that making cash visible through a windowed envelope during mail contact increases survey response rates. The visible cash aims to pique interest and encourage sampled households to open the envelope. This article extends prior research by examining the effect of additional interventions implemented during mail recruitment to a survey panel on recruitment rates and costs. Specifically, we implemented randomized experiments varying the size (small, large) and location (none, front, back) of the window displaying the cash, which part of the cash is shown through the window (numeric amount, face image), and the prepaid incentive amount (two $1 bills, one $2 bill, one $5 bill). We used the recruitment effort for NORC's AmeriSpeak Panel as the data source for this study. The probability-based AmeriSpeak Panel uses an address-based sample and multiple modes of respondent contact during recruitment, including mail, phone, and in-person outreach. Our results were consistent with prior research and showed significant improvement in recruitment rates when cash was displayed through a window during mail contact. We also found that placing the window on the front of the envelope, showing $5 rather than $2 or $1, and showing the bill's numeric amount rather than its face image further improved recruitment rates. Our cost analyses showed that the difference in printing costs between window and no-window envelopes is small. There is no printing-cost difference between front-window and back-window envelopes, as both require custom manufacturing, and none between small and large windows. Lastly, we found no evidence of mail theft based on our review of the United States Postal Service's "track and trace" reports, seed mailings sent to staff, and undeliverable mailing rates.
Correction to: Correcting Selection Bias in Big Data by Pseudo-Weighting
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad042 (published 2023-11-09)
The Effects of a Targeted "Early Bird" Incentive Strategy on Response Rates, Fieldwork Effort, and Costs in a National Panel Study
Katherine A McGonagle, Narayan Sastry, Vicki A Freedman
Journal of Survey Statistics and Methodology 11(5): 1032-1053, DOI: 10.1093/jssam/smab042 (epub 2022-02-01; published 2023-11-01). Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10702785/pdf/

Adaptive survey designs are increasingly used by survey practitioners to counteract ongoing declines in household survey response rates and manage rising fieldwork costs. This paper reports findings from an evaluation of an early-bird incentive (EBI) experiment targeting high-effort respondents who participated in the 2019 wave of the US Panel Study of Income Dynamics. We identified a subgroup of high-effort respondents at risk of nonresponse based on their prior-wave fieldwork effort and randomized them to a treatment offering an extra time-delimited monetary incentive for completing their interview within the first month of data collection (treatment group; N = 800) or the standard study incentive (control group; N = 400). In recent waves, we have found the costs of the protracted fieldwork needed to complete interviews with high-effort cases, in the form of interviewer contact attempts plus an increased incentive near the close of data collection, to be extremely high. By incentivizing early participation and reducing the number of interviewer contact attempts and fieldwork days needed to complete the interview, our goal was to manage both nonresponse and survey costs. We found that the EBI treatment increased response rates and reduced fieldwork effort and costs compared to the control group. We review several key findings and limitations, discuss their implications, and identify next steps for future research.
Responsive and Adaptive Designs in Repeated Cross-National Surveys: A Simulation Study
Hafsteinn Einarsson, Alexandru Cernat, Natalie Shlomo
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad038 (published 2023-10-27)

Cross-national surveys run the risk of differential survey errors, where the quality of the data collected varies from country to country. Responsive and adaptive survey designs (RASDs), which leverage auxiliary variables to inform fieldwork efforts, have been proposed as a way to reduce survey errors but have rarely been considered in the context of cross-national surveys. Using data from the European Social Survey, we simulate fieldwork in a repeated cross-national survey under RASDs in which fieldwork effort is ended early for selected units in the final stage of data collection. Demographic variables, paradata (interviewer observations), and contact data are used to inform fieldwork efforts. Eight combinations of response propensity models and selection mechanisms are evaluated in terms of sample composition (as measured by the coefficient of variation of response propensities), response rates, number of contact attempts saved, and effects on estimates of target variables in the survey. We find that sample balance can be improved in many country-round combinations. Response rates can be increased marginally, and targeting high-propensity respondents could lead to significant cost savings from making fewer contact attempts. Estimates of target variables are not changed by the case prioritizations used in the simulations, indicating that they do not affect nonresponse bias. We conclude that RASDs should be considered in cross-national surveys, but more work is needed to identify suitable covariates to inform fieldwork efforts.
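The sample-balance criterion used in these simulations, the coefficient of variation of response propensities, is straightforward to compute once propensities have been estimated. A minimal sketch (the inputs are illustrative, not tied to the ESS data):

```python
import statistics

def propensity_cv(propensities: list[float]) -> float:
    """Coefficient of variation of estimated response propensities
    (SD divided by mean). Lower values indicate a respondent pool whose
    response propensities are more alike, i.e. a more balanced sample."""
    mean = statistics.fmean(propensities)
    sd = statistics.pstdev(propensities)  # population SD; a common convention
    return sd / mean
```

A perfectly balanced respondent set drives the CV to zero; case prioritization in an RASD aims to shrink it by redirecting effort away from units whose propensities are already high.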
A Mixture Model Approach to Assessing Measurement Error in Surveys Using Reinterviews
Simon Hoellerbauer
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad037 (published 2023-10-03)

Researchers are often unsure about the quality of the data collected by third-party actors, such as survey firms. This may be because of the inability to measure data quality effectively at scale and the difficulty of communicating which observations may be the source of measurement error. Researchers rely on survey firms to provide them with estimates of data quality and to identify observations that are problematic, potentially because they have been falsified or poorly collected. To address these issues, I propose the QualMix model, a mixture modeling approach to deriving estimates of survey data quality in situations in which two sets of responses exist for all or certain subsets of respondents. I apply this model to the context of survey reinterviews, a common form of data quality assessment used to detect falsification and data collection problems during enumeration. Through simulation based on real-world data, I demonstrate that the model successfully identifies incorrect observations and recovers latent enumerator and survey data quality. I further demonstrate the model's utility by applying it to reinterview data from a large survey fielded in Malawi, using it to identify significant variation in data quality across observations generated by different enumerators.
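As an illustration of the general idea (not the QualMix model itself, which models agreement between the two sets of responses more richly), a two-component binomial mixture fitted by EM can separate high-agreement from low-agreement reinterview cases. All names, starting values, and the data layout here are illustrative assumptions:

```python
from math import comb

def binom_mixture_em(counts, n_items, n_iter=300):
    """EM for a two-component binomial mixture over per-case counts of
    matching answers between interview and reinterview (out of n_items
    compared items). One component represents well-collected cases with
    a high agreement probability, the other problem cases with a low one.
    Returns (share of well-collected cases, p_agree_high, p_agree_low).
    Starting values are arbitrary but keep the component labels ordered."""
    w, p_hi, p_lo = 0.5, 0.8, 0.3
    for _ in range(n_iter):
        # E-step: posterior probability that each case is well collected
        resp = []
        for k in counts:
            l_hi = w * comb(n_items, k) * p_hi**k * (1 - p_hi)**(n_items - k)
            l_lo = (1 - w) * comb(n_items, k) * p_lo**k * (1 - p_lo)**(n_items - k)
            resp.append(l_hi / (l_hi + l_lo))
        # M-step: update mixture weight and component agreement rates
        s = sum(resp)
        w = s / len(counts)
        p_hi = sum(r * k for r, k in zip(resp, counts)) / (n_items * s)
        p_lo = sum((1 - r) * k for r, k in zip(resp, counts)) / (n_items * (len(counts) - s))
    return w, p_hi, p_lo
```

With well-separated clusters (say, most cases agreeing on 9 of 10 items and a minority on 3 of 10), the posterior responsibilities flag the low-agreement cases, which can then be aggregated by enumerator to locate quality problems.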
Using Auxiliary Information in Probability Survey Data to Improve Pseudo-Weighting in Nonprobability Samples: A Copula Model Approach
Tingyu Zhu, Laura J Gamble, Matthew Klapman, Lan Xue, Virginia M Lesser
Journal of Survey Statistics and Methodology, DOI: 10.1093/jssam/smad032 (published 2023-09-12)

While probability sampling has been considered the gold standard of survey methods, nonprobability sampling is increasingly popular due to its convenience and low cost. However, nonprobability samples can lead to biased estimates due to the unknown nature of the underlying selection mechanism. In this article, we propose parametric and semiparametric approaches to integrate probability and nonprobability samples using common ancillary variables observed in both samples. In the parametric approach, the joint distribution of ancillary variables is assumed to follow the latent Gaussian copula model, which is flexible enough to accommodate both categorical and continuous variables. In contrast, the semiparametric approach requires no assumptions about the distribution of ancillary variables. In addition, logistic regression is used to model the mechanism by which population units enter the nonprobability sample. The unknown parameters in the copula model are estimated through the pseudo maximum likelihood approach. The logistic regression model is estimated by maximizing the sample likelihood constructed from the nonprobability sample. The proposed method is evaluated in the context of estimating the population mean. Our simulation results show that the proposed method is able to correct the selection bias in the nonprobability sample by consistently estimating the underlying inclusion mechanism. By incorporating additional information in the nonprobability sample, the combined method can estimate the population mean more efficiently than using the probability sample alone. A real-data application is provided to illustrate the practical use of the proposed method.
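Once inclusion propensities for the nonprobability units have been estimated (however that is done), the population mean is typically estimated with a normalized inverse-propensity weighted mean of Hájek type. A minimal sketch with illustrative inputs; the article's estimator may differ in detail:

```python
def pseudo_weighted_mean(y, propensities):
    """Normalized inverse-propensity (Hajek-type) estimate of the
    population mean from a nonprobability sample: each unit is weighted
    by the reciprocal of its estimated inclusion propensity."""
    weights = [1.0 / p for p in propensities]
    return sum(w * v for w, v in zip(weights, y)) / sum(weights)
```

Units that were unlikely to enter the nonprobability sample receive larger weights, which is what corrects the selection bias when the propensity model is right; a misspecified propensity model leaves residual bias.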