Survey Consent to Administrative Data Linkage: Five Experiments on Wording and Format
A. Jäckle, Jonathan Burton, M. Couper, Thomas F. Crossley, Sandra Walzenbach
Journal of Survey Statistics and Methodology, 2023-06-27, https://doi.org/10.1093/jssam/smad019

To maximize the value of the data while minimizing respondent burden, survey data are increasingly linked to administrative records. Record linkage often requires the informed consent of survey respondents, and failure to obtain consent reduces sample size and may lead to selection bias. Relatively little is known about how best to word and format consent requests in surveys. We conducted a series of experiments in a probability household panel and an online access panel to understand how various features of the design of the consent request can affect informed consent. We experimentally varied (i) the readability of the consent request, (ii) the placement of the consent request in the survey, (iii) consent as default versus the standard opt-in consent question, (iv) the offer of additional information, and (v) a priming treatment focusing on trust in the data holder. For each experiment, we examine the effects of the treatments on consent rates, objective understanding of the consent request (measured with knowledge test questions), subjective understanding (how well respondents felt they understood the request), confidence in the decision, response times, and whether respondents read any of the additional information materials. We find that the default wording and the offer of additional information do not increase consent rates. Improving the readability of the consent question increases objective understanding but does not increase the consent rate. However, asking for consent early in the survey and priming respondents to consider their trust in the administrative data holder both increase consent rates without negatively affecting understanding of the request.

Pseudo-Bayesian Small-Area Estimation
G. Datta, Juhyung Lee, Jiacheng Li
Journal of Survey Statistics and Methodology, 2023-06-19, https://doi.org/10.1093/jssam/smad012

In sample surveys, a subpopulation is referred to as a "small area" or "small domain" if its sample is too small to yield, on its own, an adequately accurate estimate of a characteristic. Because the sample from such a subpopulation is too small to estimate its mean accurately, one borrows strength from similar subpopulations through an appropriate model based on relevant covariates. The empirical best linear unbiased prediction (EBLUP) method has been the dominant frequentist model-based approach in small-area estimation; it relies on estimating the model parameters from the marginal distribution of the data. As an alternative, the observed best prediction (OBP) method estimates the parameters by minimizing an objective function implied by the total mean squared prediction error. We use this objective function in the Fay–Herriot model to construct a pseudo-posterior distribution for the model parameters under nearly noninformative priors. Data analysis and simulation show that the pseudo-Bayesian estimators (PBEs) compete favorably with the OBPs and EBLUPs. The PBE estimates are robust to mean misspecification and have good frequentist properties. Being Bayesian by construction, they automatically avoid negative estimates of standard errors, enjoy a dual justification, and provide an attractive alternative for practitioners.

Maximum Entropy Design by a Markov Chain Process
Yves Tillé, Bardia Panahbehagh
Journal of Survey Statistics and Methodology, 2023-06-14, https://doi.org/10.1093/jssam/smad010

In this article, we study an implementation of the maximum entropy (ME) design using a Markov chain. This design, also called the conditional Poisson sampling design, is difficult to implement. We first present a new method for calculating the weights associated with conditional Poisson sampling. We then study a very simple method of random exchanges of units, which allows switching from one sample to another. This exchange system defines an irreducible and aperiodic Markov chain whose stationary distribution is the ME design. The design can thus be implemented without enumerating all possible samples: by repeating the exchange process a large number of times, one selects a sample that respects the design. The process is simple to implement, and we investigate its convergence rate both theoretically and by simulation, with promising results.

A Comprehensive Overview of Unit-Level Modeling of Survey Data for Small Area Estimation Under Informative Sampling
Paul A Parker, Ryan Janicki, Scott H Holan
Journal of Survey Statistics and Methodology, 2023-06-14, https://doi.org/10.1093/jssam/smad020

Model-based small area estimation is frequently used in conjunction with survey data to establish estimates for under-sampled or unsampled geographies. These models can be specified at either the area level or the unit level, but unit-level models often offer potential advantages, such as more precise estimates and easy spatial aggregation. Nevertheless, relative to area-level models, the literature on unit-level models is less extensive. In modeling small areas at the unit level, challenges often arise as a consequence of the informative sampling mechanism used to collect the survey data. This article provides a comprehensive methodological review of unit-level models under informative sampling, with an emphasis on Bayesian approaches.

Comparison of Unit-Level Small Area Estimation Modeling Approaches for Survey Data Under Informative Sampling
Paul A Parker, Ryan Janicki, Scott H Holan
Journal of Survey Statistics and Methodology, 2023-06-14, https://doi.org/10.1093/jssam/smad022

Unit-level modeling strategies offer many advantages relative to the area-level models most often used in small area estimation. For example, unit-level models aggregate naturally, allowing for estimates at any desired resolution, and in many cases they also offer greater precision. We compare a variety of the methods available in the literature on unit-level modeling for small area estimation. Specifically, to provide insight into the differences between methods, we conduct a simulation study comparing several of the general approaches. The methods used in the simulation are further illustrated through an application to the American Community Survey.

Leveraging Predictive Modelling from Multiple Sources of Big Data to Improve Sample Efficiency and Reduce Survey Nonresponse Error
David Dutwin, Patrick Coyle, I. Bilgen, N. English
Journal of Survey Statistics and Methodology, 2023-06-10, https://doi.org/10.1093/jssam/smad016

Big data has been fruitfully leveraged as a supplement to survey data, sometimes as its replacement, and, in the best of worlds, as a "force multiplier" that improves survey analytics and insight. We detail a use case, the big data classifier (BDC), as a replacement for the more traditional methods of targeting households with specific household and personal attributes in survey sampling. Much like geographic targeting and the use of commercial vendor flags, BDCs predict the likelihood that any given household is, for example, one that contains a child or someone who is Hispanic. We build 15 BDCs from the combined data of a large, nationally representative probability-based panel and a range of big data from public and private sources, and then assess how successfully these BDCs predict their target attributes across three large survey datasets. For each BDC and each data application, we compare the effectiveness of the BDCs against the historical sample-targeting techniques of geographic clustering and vendor flags. Overall, BDCs offer a modest improvement in the ability to target subpopulations. We find classes of predictions that are consistently more effective, and others where the BDCs are on par with vendor flagging, though always superior to geographic clustering. We present some of the relative strengths and weaknesses of BDCs as a new method for identifying and subsequently sampling low-incidence and other populations.

A Primer on the Data Cleaning Pipeline
Rebecca C Steorts
Journal of Survey Statistics and Methodology, 2023-05-31, https://doi.org/10.1093/jssam/smad017

The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, has grown rapidly over the past decade. With this expansion, the statistical and methodological questions around data integration, that is, merging multiple data sources, have also grown. Specifically, the science of the "data cleaning pipeline" comprises four stages that allow an analyst to perform downstream tasks, predictive analyses, or statistical analyses on "cleaned data." This article provides a review of this emerging field, introducing technical terminology and commonly used methods.

{"title":"Correction to: Improving Statistical Matching when Auxiliary Information is Available","authors":"","doi":"10.1093/jssam/smad023","DOIUrl":"https://doi.org/10.1093/jssam/smad023","url":null,"abstract":"","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135540950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Is there a Day of the Week Effect on Panel Response Rate to an Online Questionnaire Email Invitation?
Chloe Howard, Lara M. Greaves, D. Osborne, C. Sibley
Journal of Survey Statistics and Methodology, 2023-05-26, https://doi.org/10.1093/jssam/smad014

Does the day of the week on which existing participants receive an email invitation to complete a follow-up questionnaire for an annual online survey affect the response rate? We answer this question using a preregistered experiment conducted as part of an ongoing national probability panel study in New Zealand. Across 14 consecutive days, existing panel participants were randomly allocated a day of the week on which to receive an email inviting them to complete the next wave of the questionnaire online (N = 26,126). Valid responses included questionnaires completed within 31 days of the initial invitation. Results revealed that the day the invitation was sent did not affect the likelihood of responding. These results are reassuring for researchers conducting ongoing panel studies and suggest that, once participants have joined a panel, the day of the week on which they are contacted does not affect their likelihood of responding to subsequent waves.

Interviewer Involvement in Respondent Selection Moderates the Relationship between Response Rates and Sample Bias in Cross-National Survey Projects in Europe
M. Kołczyńska, P. Jabkowski, S. Eckman
Journal of Survey Statistics and Methodology, 2023-05-18, https://doi.org/10.1093/jssam/smad013

Survey researchers and practitioners often assume that higher response rates are associated with higher-quality survey data. However, the evidence for this claim in face-to-face surveys is mixed. To explain these mixed results, recent studies have proposed that interviewers' involvement in respondent selection moderates the effect of response rates on data quality. Previous analyses of data from the European Social Survey found that response rates are positively associated with data quality when interviewer involvement in respondent selection is minimal, but negatively associated when interviewers are more involved in respondent selection through household frame creation or within-household selection of target persons. These studies hypothesized that some interviewers deviate from prescribed selection procedures to select individuals with higher response propensities, increasing response rates while reducing data quality. We replicate these results with an extended dataset that includes more recent European Social Survey rounds and three other European survey projects: the European Quality of Life Survey, the European Values Study, and the International Social Survey Programme. Based on our results, we recommend that surveys incorporate checks on respondent-selection practices into their fieldwork control procedures.