
Journal of Survey Statistics and Methodology: Latest Articles

Survey Consent to Administrative Data Linkage: Five Experiments on Wording and Format
IF 2.1 Zone 4 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-06-27 DOI: 10.1093/jssam/smad019
A. Jäckle, Jonathan Burton, M. Couper, Thomas F. Crossley, Sandra Walzenbach
To maximize the value of the data while minimizing respondent burden, survey data are increasingly linked to administrative records. Record linkage often requires the informed consent of survey respondents and failure to obtain consent reduces sample size and may lead to selection bias. Relatively little is known about how best to word and format consent requests in surveys. We conducted a series of experiments in a probability household panel and an online access panel to understand how various features of the design of the consent request can affect informed consent. We experimentally varied: (i) the readability of the consent request, (ii) placement of the consent request in the survey, (iii) consent as default versus the standard opt-in consent question, (iv) offering additional information, and (v) a priming treatment focusing on trust in the data holder. For each experiment, we examine the effects of the treatments on consent rates, objective understanding of the consent request (measured with knowledge test questions), subjective understanding (how well the respondent felt they understood the request), confidence in their decision, response times, and whether they read any of the additional information materials. We find that the default wording and offering additional information do not increase consent rates. Improving the readability of the consent question increases objective understanding but does not increase the consent rate. However, asking for consent early in the survey and priming respondents to consider their trust in the administrative data holder both increase consent rates without negatively affecting understanding of the request.
Citations: 0
Pseudo-Bayesian Small-Area Estimation
IF 2.1 Zone 4 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-06-19 DOI: 10.1093/jssam/smad012
G. Datta, Juhyung Lee, Jiacheng Li
In sample surveys, a subpopulation is referred to as a “small area” or “small domain” if it does not have a large enough sample that alone will yield an adequately accurate estimate of a characteristic. In small-area estimation, the sample size from various subpopulations is often too small to accurately estimate its mean, and so one borrows strength from similar subpopulations through an appropriate model based on relevant covariates. The empirical best linear unbiased prediction (EBLUP) method has been the dominant frequentist model-based approach in small-area estimation. This method relies on estimation of model parameters based on the marginal distribution of the data. As an alternative to this method, the observed best prediction (OBP) method estimates the parameters by minimizing an objective function that is implied by the total mean squared prediction error. We use this objective function in the Fay–Herriot model to construct a pseudo-posterior distribution for the model parameters under nearly noninformative priors for them. Data analysis and simulation show that the pseudo-Bayesian estimators (PBEs) compete favorably with the OBPs and EBLUPs. The PBE estimates are robust to mean misspecification and have good frequentist properties. Being Bayesian by construction, they automatically avoid negative estimates of standard errors, enjoy a dual justification, and provide an attractive alternative to practitioners.
Citations: 0
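The "borrowing strength" idea in the Fay–Herriot model mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a single covariate plus intercept and treats the model variance `a` as known, which is a simplification: EBLUP, OBP, and the pseudo-Bayesian approach of the article all differ in how they handle the unknown parameters, and none of them is reproduced here. The function names and the toy inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def weighted_line_fit(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> Tuple[float, float]:
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def fay_herriot_blup(
    y: Sequence[float], x: Sequence[float], d: Sequence[float], a: float
) -> List[float]:
    """Shrinkage predictor under the Fay-Herriot model with known variances.

    y: direct survey estimates per area
    x: one area-level covariate
    d: known sampling variances D_i of the direct estimates
    a: model (between-area) variance A, assumed known in this sketch
    """
    if any(a + di <= 0 for di in d):
        raise ValueError("a + d_i must be positive for every area")
    # Regression coefficients are fitted with weights 1 / (A + D_i).
    weights = [1.0 / (a + di) for di in d]
    b0, b1 = weighted_line_fit(x, y, weights)
    estimates = []
    for yi, xi, di in zip(y, x, d):
        gamma = a / (a + di)  # weight on the direct estimate
        estimates.append(gamma * yi + (1.0 - gamma) * (b0 + b1 * xi))
    return estimates
```

Areas with a large sampling variance `d[i]` get a small `gamma` and are pulled toward the regression prediction, while precisely estimated areas stay close to their direct estimate.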
Maximum Entropy Design by a Markov Chain Process
Zone 4 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-06-14 DOI: 10.1093/jssam/smad010
Yves Tillé, Bardia Panahbehagh
In this article, we study an implementation of maximum entropy (ME) design utilizing a Markov chain. This design, which is also called the conditional Poisson sampling design, is difficult to implement. We first present a new method for calculating the weights associated with conditional Poisson sampling. Then, we study a very simple method of random exchanges of units, which allows switching from one sample to another. This exchange system defines an irreducible and aperiodic Markov chain whose ME design is the stationary distribution. The design can be implemented without enumerating all possible samples. By repeating the exchange process a large number of times, it is possible to select a sample that respects the design. The process is simple to implement, and its convergence rate has been investigated theoretically and by simulation, which led to promising results.
Citations: 0
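The random-exchange idea described in the maximum entropy design abstract above can be sketched as a Metropolis-type chain. This is an illustrative assumption, not the authors' exact exchange rule: it takes the fixed-size design to be proportional to the product of positive unit weights over the sample, proposes swapping one sampled unit for one unsampled unit, and accepts with probability min(1, w_in / w_out). The function names and the `weights` input are hypothetical, and the weights are assumed to be given rather than computed from target inclusion probabilities.

```python
import random
from typing import List, Sequence, Set


def exchange_step(sample: Set[int], weights: Sequence[float], rng: random.Random) -> bool:
    """Propose swapping one sampled unit for one unsampled unit.

    Returns True if the swap was accepted. The proposal is symmetric, so the
    Metropolis acceptance ratio reduces to weights[incoming] / weights[outgoing].
    """
    outgoing = rng.choice(sorted(sample))
    candidates = [k for k in range(len(weights)) if k not in sample]
    incoming = rng.choice(candidates)
    accept_prob = min(1.0, weights[incoming] / weights[outgoing])
    if rng.random() < accept_prob:
        sample.remove(outgoing)
        sample.add(incoming)
        return True
    return False


def draw_me_sample(weights: Sequence[float], n: int, steps: int, seed: int = 0) -> List[int]:
    """Run the exchange chain for `steps` iterations and return the final sample."""
    population = len(weights)
    if not 0 < n < population:
        raise ValueError("n must satisfy 0 < n < len(weights)")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    rng = random.Random(seed)
    # Start from a simple random sample of size n, then exchange repeatedly.
    sample = set(rng.sample(range(population), n))
    for _ in range(steps):
        exchange_step(sample, weights, rng)
    return sorted(sample)
```

No enumeration of all possible samples is needed; the number of steps required for the chain to get close to its stationary distribution is the convergence question the abstract refers to, and this sketch does not address it.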
A Comprehensive Overview of Unit-Level Modeling of Survey Data for Small Area Estimation Under Informative Sampling
Zone 4 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-06-14 DOI: 10.1093/jssam/smad020
Paul A Parker, Ryan Janicki, Scott H Holan
Model-based small area estimation is frequently used in conjunction with survey data to establish estimates for under-sampled or unsampled geographies. These models can be specified at either the area-level, or the unit-level, but unit-level models often offer potential advantages such as more precise estimates and easy spatial aggregation. Nevertheless, relative to area-level models, literature on unit-level models is less prevalent. In modeling small areas at the unit level, challenges often arise as a consequence of the informative sampling mechanism used to collect the survey data. This article provides a comprehensive methodological review for unit-level models under informative sampling, with an emphasis on Bayesian approaches.
Citations: 2
Comparison of Unit-Level Small Area Estimation Modeling Approaches for Survey Data Under Informative Sampling
Zone 4 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-06-14 DOI: 10.1093/jssam/smad022
Paul A Parker, Ryan Janicki, Scott H Holan
Unit-level modeling strategies offer many advantages relative to the area-level models that are most often used in the context of small area estimation. For example, unit-level models aggregate naturally, allowing for estimates at any desired resolution, and also offer greater precision in many cases. We compare a variety of the methods available in the literature related to unit-level modeling for small area estimation. Specifically, to provide insight into the differences between methods, we conduct a simulation study that compares several of the general approaches. In addition, the methods used for simulation are further illustrated through an application to the American Community Survey.
Citations: 1
Leveraging Predictive Modelling from Multiple Sources of Big Data to Improve Sample Efficiency and Reduce Survey Nonresponse Error
IF 2.1 Zone 4 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-06-10 DOI: 10.1093/jssam/smad016
David Dutwin, Patrick Coyle, I. Bilgen, N. English
Big data has been fruitfully leveraged as a supplement for survey data—and sometimes as its replacement—and in the best of worlds, as a “force multiplier” to improve survey analytics and insight. We detail a use case, the big data classifier (BDC), as a replacement to the more traditional methods of targeting households in survey sampling for given specific household and personal attributes. Much like geographic targeting and the use of commercial vendor flags, we detail the ability of BDCs to predict the likelihood that any given household is, for example, one that contains a child or someone who is Hispanic. We specifically build 15 BDCs with the combined data from a large nationally representative probability-based panel and a range of big data from public and private sources, and then assess the effectiveness of these BDCs to successfully predict their range of predicted attributes across three large survey datasets. For each BDC and each data application, we compare the relative effectiveness of the BDCs against historical sample targeting techniques of geographic clustering and vendor flags. Overall, BDCs offer a modest improvement in their ability to target subpopulations. We find classes of predictions that are consistently more effective, and others where the BDCs are on par with vendor flagging, though always superior to geographic clustering. We present some of the relative strengths and weaknesses of BDCs as a new method to identify and subsequently sample low incidence and other populations.
Citations: 0
A Primer on the Data Cleaning Pipeline
Zone 4 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-05-31 DOI: 10.1093/jssam/smad017
Rebecca C Steorts
The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, among others, has grown rapidly over the past decade. With this expansion, the statistical and methodological questions around data integration, or rather merging multiple data sources, have also grown. Specifically, the science of the “data cleaning pipeline” contains four stages that allow an analyst to perform downstream tasks, predictive analyses, or statistical analyses on “cleaned data.” This article provides a review of this emerging field, introducing technical terminology and commonly used methods.
Citations: 1
Correction to: Improving Statistical Matching when Auxiliary Information is Available
Zone 4 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-05-30 DOI: 10.1093/jssam/smad023
Citations: 0
Is there a Day of the Week Effect on Panel Response Rate to an Online Questionnaire Email Invitation?
IF 2.1 Zone 4 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-05-26 DOI: 10.1093/jssam/smad014
Chloe Howard, Lara M. Greaves, D. Osborne, C. Sibley
Does the day of the week an email is sent inviting existing participants to complete a follow-up questionnaire for an annual online survey impact response rate? We answer this question using a preregistered experiment conducted as part of an ongoing national probability panel study in New Zealand. Across 14 consecutive days, existing participants in a panel study were randomly allocated a day of the week to receive an email inviting them to complete the next wave of the questionnaire online (N = 26,126). Valid responses included questionnaires completed within 31 days of receiving the initial invitation. Results revealed that the day the invitation was sent did not affect the likelihood of responding. These results are reassuring for researchers conducting ongoing panel studies and suggest that, once participants have joined a panel, the day of the week they are contacted does not impact their likelihood of responding to subsequent waves.
Citations: 0
Interviewer Involvement in Respondent Selection Moderates the Relationship between Response Rates and Sample Bias in Cross-National Survey Projects in Europe
IF 2.1 Zone 4 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-05-18 DOI: 10.1093/jssam/smad013
M. Kołczyńska, P. Jabkowski, S. Eckman
Survey researchers and practitioners often assume that higher response rates are associated with a higher quality of survey data. However, the evidence for this claim in face-to-face surveys is mixed. To explain these mixed results, recent studies have proposed that interviewers’ involvement in respondent selection moderates the effect of response rates on data quality. Previous analyses based on data from the European Social Survey found that response rates are positively associated with data quality when interviewer involvement in respondent selection is minimal. However, the association between response rates and data quality is negative when interviewers are more involved in respondent selection through household frame creation or within-household selection of target persons. These studies have hypothesized that some interviewers deviate from prescribed selection procedures to select individuals with higher response propensities, which increase response rates while reducing data quality. We replicate these results with an extended dataset, including more recent European Social Survey rounds and three other European survey projects: the European Quality of Life Survey, European Values Study, and International Social Survey Programme. Based on our results, we recommend that surveys include procedures to verify respondent-selection practices into their fieldwork control procedures.