
Journal of Survey Statistics and Methodology: Latest Articles

A Primer on the Data Cleaning Pipeline
Zone 4 (Mathematics) | Q1 Social Sciences | Pub Date: 2023-05-31 | DOI: 10.1093/jssam/smad017
Rebecca C Steorts
The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, has grown rapidly over the past decade. With this expansion, the statistical and methodological questions around data integration, that is, merging multiple data sources, have also grown. Specifically, the science of the “data cleaning pipeline” comprises four stages that allow an analyst to perform downstream tasks, predictive analyses, or statistical analyses on “cleaned data.” This article reviews this emerging field, introducing technical terminology and commonly used methods.
Citations: 1
Correction to: Improving Statistical Matching when Auxiliary Information is Available
Zone 4 (Mathematics) | Q1 Social Sciences | Pub Date: 2023-05-30 | DOI: 10.1093/jssam/smad023
Citations: 0
Is there a Day of the Week Effect on Panel Response Rate to an Online Questionnaire Email Invitation?
IF 2.1 | Zone 4 (Mathematics) | Q1 Social Sciences | Pub Date: 2023-05-26 | DOI: 10.1093/jssam/smad014
Chloe Howard, Lara M. Greaves, D. Osborne, C. Sibley
Does the day of the week on which an email invitation is sent affect the response rate when existing participants are asked to complete a follow-up questionnaire for an annual online survey? We answer this question using a preregistered experiment conducted as part of an ongoing national probability panel study in New Zealand. Across 14 consecutive days, existing participants in a panel study were randomly allocated a day of the week to receive an email inviting them to complete the next wave of the questionnaire online (N = 26,126). Valid responses included questionnaires completed within 31 days of receiving the initial invitation. Results revealed that the day the invitation was sent did not affect the likelihood of responding. These results are reassuring for researchers conducting ongoing panel studies and suggest that, once participants have joined a panel, the day of the week they are contacted does not affect their likelihood of responding to subsequent waves.
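One generic way to check a claim of this kind is a Pearson chi-square test of homogeneity on response counts grouped by send day. The sketch below is an illustration under that assumption, not the analysis the authors report; the function name `chi_square_homogeneity` and its arguments are hypothetical.

```python
from typing import Sequence, Tuple


def chi_square_homogeneity(
    responded: Sequence[int], invited: Sequence[int]
) -> Tuple[float, int]:
    """Pearson chi-square statistic for equal response rates across groups.

    responded[i] of the invited[i] people in group i (for example, one send
    day) completed the questionnaire. Returns (statistic, degrees_of_freedom).
    """
    if len(responded) != len(invited) or len(invited) < 2:
        raise ValueError("need matching counts for at least two groups")
    if any(n <= 0 for n in invited):
        raise ValueError("every group needs at least one invited person")
    total_invited = sum(invited)
    total_responded = sum(responded)
    if not 0 < total_responded < total_invited:
        raise ValueError("undefined when everyone or no one responds")

    # Response rate under the null hypothesis that the day does not matter.
    pooled_rate = total_responded / total_invited

    statistic = 0.0
    for r, n in zip(responded, invited):
        expected_yes = n * pooled_rate
        expected_no = n * (1.0 - pooled_rate)
        # Contributions from the "responded" and "did not respond" cells.
        statistic += (r - expected_yes) ** 2 / expected_yes
        statistic += ((n - r) - expected_no) ** 2 / expected_no
    return statistic, len(invited) - 1
```

With seven send days the statistic has 6 degrees of freedom, so values above roughly 12.59 would indicate a difference at the 5 percent level; a value well below that is consistent with no day-of-week effect.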
Citations: 0
Interviewer Involvement in Respondent Selection Moderates the Relationship between Response Rates and Sample Bias in Cross-National Survey Projects in Europe
IF 2.1 | Zone 4 (Mathematics) | Q1 Social Sciences | Pub Date: 2023-05-18 | DOI: 10.1093/jssam/smad013
M. Kołczyńska, P. Jabkowski, S. Eckman
Survey researchers and practitioners often assume that higher response rates are associated with a higher quality of survey data. However, the evidence for this claim in face-to-face surveys is mixed. To explain these mixed results, recent studies have proposed that interviewers’ involvement in respondent selection moderates the effect of response rates on data quality. Previous analyses based on data from the European Social Survey found that response rates are positively associated with data quality when interviewer involvement in respondent selection is minimal. However, the association between response rates and data quality is negative when interviewers are more involved in respondent selection through household frame creation or within-household selection of target persons. These studies have hypothesized that some interviewers deviate from prescribed selection procedures to select individuals with higher response propensities, which increases response rates while reducing data quality. We replicate these results with an extended dataset, including more recent European Social Survey rounds and three other European survey projects: the European Quality of Life Survey, European Values Study, and International Social Survey Programme. Based on our results, we recommend that surveys incorporate procedures to verify respondent-selection practices into their fieldwork control procedures.
Citations: 0
Estimation of Covid-19 Prevalence Dynamics from Pooled Data
IF 2.1 | Zone 4 (Mathematics) | Q1 Social Sciences | Pub Date: 2023-05-12 | DOI: 10.1093/jssam/smad011
Braden Scherting, A. Peel, R. Plowright, A. Hoegh
Estimating the prevalence of a disease, such as COVID-19, is necessary for evaluating and mitigating risks of its transmission. Estimates that consider how prevalence changes with time provide more information about these risks but are difficult to obtain due to the necessary survey intensity and commensurate testing costs. Motivated by a dataset on COVID-19 from the University of Notre Dame, we propose pooling and jointly testing multiple samples to reduce testing costs. A nonparametric, hierarchical Bayesian model is used to infer population prevalence from the pooled test results without needing to retest individuals from pools that test positive. Simulation studies and an analysis of COVID-19 infections at Notre Dame show that this approach reduces uncertainty compared to individual testing at the same budget and produces estimates similar to those from individual testing at a much higher budget.
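To see why pooled results still carry information about individual prevalence without retesting, a much simpler classical estimator can serve as a sketch. It assumes a perfect test, independent samples, a single constant prevalence, and equal pool sizes; it is not the nonparametric hierarchical Bayesian model described in the abstract, and all function names are hypothetical.

```python
import random


def pool_positive_prob(p: float, k: int) -> float:
    """Probability that a pool of k independent samples tests positive.

    A pool is negative only if all k samples are negative.
    """
    return 1.0 - (1.0 - p) ** k


def prevalence_mle(n_positive: float, n_pools: float, k: int) -> float:
    """Maximum likelihood estimate of individual-level prevalence.

    Inverts pool_positive_prob at the observed share of positive pools.
    """
    if n_pools <= 0 or k <= 0 or not 0 <= n_positive <= n_pools:
        raise ValueError("need 0 <= n_positive <= n_pools and k > 0")
    return 1.0 - (1.0 - n_positive / n_pools) ** (1.0 / k)


def simulate_positive_pools(
    p: float, k: int, n_pools: int, rng: random.Random
) -> int:
    """Simulate n_pools pools of size k and count those that test positive."""
    return sum(
        any(rng.random() < p for _ in range(k)) for _ in range(n_pools)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    positives = simulate_positive_pools(p=0.02, k=5, n_pools=2000, rng=rng)
    # 2000 pooled tests stand in for 10000 individual tests.
    print(prevalence_mle(positives, 2000, 5))
```

In this toy setting 2,000 pooled tests cover 10,000 individuals, which is the cost saving the abstract refers to; tracking prevalence over time requires a richer model than this single-number estimate.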
Citations: 2
Experimenting with QR Codes and Envelope Size in Push-to-Web Surveys
IF 2.1 | Zone 4 (Mathematics) | Q1 Social Sciences | Pub Date: 2023-04-26 | DOI: 10.1093/jssam/smad008
Kyle Endres, E. Heiden, Ki H. Park, M. Losch, K. Harland, Anne L Abbott
Survey researchers are continually evaluating approaches to increase response rates, especially those that can be implemented at little or no cost. In this study, we experimentally evaluated whether including Quick Response (QR) codes in mailed recruitment materials for self-administered web surveys increased web survey participation. We also assessed whether mailing these materials in a non-standard envelope size (6 × 9 inch) yielded a higher response rate than invitations mailed in a standard #10 envelope (4.125 × 9.5 inch). These experiments were embedded in a sequential mixed-mode (dual-frame phone and web) statewide survey. In our study, including a QR code (in addition to a URL) significantly increased the response rate compared to invitations that included only a URL. As expected, a consequence of including the QR code was an elevated number of completions on smartphones or tablets among households randomly assigned to the QR code condition. The use of a larger (6 × 9 inch) envelope did not affect the overall response rate but did significantly boost the response rate for the landline sample (envelopes addressed to “STATE resident”) while having little effect for the wireless sample (envelopes addressed by name). This study suggests that incorporating both QR codes and larger (6 × 9 inch) envelopes in mail recruitment materials for web surveys is a cost-effective approach to increase web participation.
Citations: 0
Preferred Reporting Items for Complex Sample Survey Analysis (PRICSSA)
IF 2.1 | Zone 4 (Mathematics) | Q1 Social Sciences | Pub Date: 2023-04-26 | DOI: 10.1093/jssam/smac040
A. Seidenberg, R. Moser, B. West
Methodological issues pertaining to transparency and analytic error have been widely documented for publications featuring analysis of complex sample survey data. The availability of numerous public use datasets to researchers without adequate training in using these data likely contributes to these problems. In an effort to introduce standards for reporting analyses of survey data and promote replication, we propose the Preferred Reporting Items for Complex Sample Survey Analysis (PRICSSA), an itemized checklist to guide researchers publishing analyses using complex sample survey data. PRICSSA is modeled after other checklists (e.g., PRISMA, CONSORT) that have been widely adopted for other research designs. The PRICSSA items include a variety of survey characteristics, such as data collection dates, mode(s), response rate, and sample selection process. Essential analytic information is also included: sample sizes for all estimates, missing data rates and imputation methods (if applicable), disclosure of whether any data were deleted, the survey weight and sample design variables used along with the method of variance estimation, and design-adjusted standard errors/confidence intervals for all estimates. PRICSSA also recommends that authors make all corresponding software code available. Widespread adoption of PRICSSA will help improve the quality of secondary analyses of complex sample survey data through transparency and promote scientific rigor and reproducibility.
Citations: 2
Recent Advances in Data Integration
IF 2.1 | Zone 4 (Mathematics) | Q1 Social Sciences | Pub Date: 2023-04-21 | DOI: 10.1093/jssam/smad009
J. Sakshaug, R. Steorts
The availability of both survey and non-survey data sources, such as administrative data, social media data, and digital trace data, has grown rapidly over the past decade. With this expansion in data, the statistical, methodological, computational, and ethical challenges around integrating multiple data sources have also grown. This special issue addresses these challenges by highlighting recent innovations and applications in data integration and related topics.
Citations: 0
Panel Conditioning in A Probability-based Longitudinal study: A Comparison of Respondents with Different Levels of Survey Experience
Zone 4 (Mathematics) | Q1 Social Sciences | Pub Date: 2023-04-13 | DOI: 10.1093/jssam/smad004
Fabienne Kraemer, Henning Silber, Bella Struminskaya, Matthias Sand, Michael Bosnjak, Joanna Koßmann, Bernd Weiß
Learning effects due to repeated interviewing, also known as panel conditioning, are a major threat to response quality in later waves of a panel study. To date, research has not provided a clear picture regarding the circumstances, mechanisms, and dimensions of potential panel conditioning effects. In particular, the effects of conditioning frequency, that is, different levels of experience within a panel, on response quality are underexplored. Against this background, we investigated the effects of panel conditioning by using data from the GESIS Panel, a German mixed-mode probability-based panel study. Using two refreshment samples, we compared three panel cohorts with differing levels of experience on several response quality indicators related to the mechanisms of reflection, satisficing, and social desirability. Overall, we find evidence for both negative (i.e., disadvantageous for response quality) and positive (i.e., advantageous for response quality) panel conditioning. Highly experienced respondents were more likely to satisfice by speeding through the questionnaire. They also had a higher probability of refusing to answer sensitive questions than less experienced panel members. However, more experienced respondents were also more likely to optimize the response process by needing less time than panelists with lower experience levels (when controlling for speeding). In contrast, we did not find significant differences with respect to the number of “don’t know” responses, nondifferentiation, the selection of first response categories and mid-responses, and the number of nontriggered filter questions. Of the observed differences, speeding showed the highest magnitude, with an average increase of 6.0 percentage points for highly experienced panel members compared to less experienced panelists.
Citations: 0
Optimizing Data Collection Interventions to Balance Cost and Quality in a Sequential Multimode Survey
Zone 4 (Mathematics) | Q1 Social Sciences | Pub Date: 2023-04-08 | DOI: 10.1093/jssam/smad007
Stephanie M Coffey, Michael R Elliott
High-quality survey data collection is becoming more expensive to conduct because of decreasing response rates and rising data collection costs. Responsive and adaptive designs have emerged as a framework for targeting and reallocating resources during the data collection period to improve survey data collection efficiency. Here, we report on the implementation and evaluation of a responsive design experiment in the National Survey of College Graduates that optimizes the cost-quality tradeoff by minimizing a function of data collection costs and the root mean squared error (RMSE) of a key survey measure, self-reported salary. We used a Bayesian framework to incorporate prior information and generate predictions of estimated response propensity, self-reported salary, and data collection costs for use in our optimization rule. At three points during the data collection process, we implemented the optimization rule and identified cases for which reduced effort would have minimal effect on the RMSE of mean self-reported salary while allowing us to reduce data collection costs. We find that this optimization process allowed us to reduce data collection costs by nearly 10 percent, without a statistically or practically significant increase in the RMSE of mean salary or a decrease in the unweighted response rate. This experiment demonstrates the potential for these types of designs to more effectively target data collection resources to reach survey quality goals.
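The cost-quality tradeoff can be made concrete with a small sketch. The abstract does not give the form of the function being minimized, so the weighted sum below is an assumption, as are the `Plan` class, the `weight` parameter, and every number in the usage example; only the identity RMSE = sqrt(bias² + variance) is standard.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Plan:
    """A candidate data collection plan with projected outcomes (hypothetical)."""
    name: str
    cost: float      # projected data collection cost, arbitrary units
    bias: float      # projected bias of the estimated mean
    variance: float  # projected variance of the estimated mean


def rmse(bias: float, variance: float) -> float:
    """Root mean squared error: sqrt(bias^2 + variance)."""
    return math.sqrt(bias ** 2 + variance)


def objective(plan: Plan, weight: float) -> float:
    """Assumed objective: cost plus `weight` units of cost per unit of RMSE."""
    return plan.cost + weight * rmse(plan.bias, plan.variance)


def best_plan(plans: Iterable[Plan], weight: float) -> Plan:
    """Return the plan with the smallest objective value."""
    return min(plans, key=lambda plan: objective(plan, weight))


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    plans = [
        Plan("full effort", cost=100.0, bias=0.0, variance=16.0),
        Plan("reduced effort", cost=90.0, bias=3.0, variance=16.0),
    ]
    print(best_plan(plans, weight=5.0).name)   # cheaper plan wins
    print(best_plan(plans, weight=20.0).name)  # quality dominates
```

The `weight` expresses how much cost one is willing to pay per unit of RMSE; a rule of this kind reduces effort only for cases where the projected RMSE penalty is smaller than the cost saved.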
Citations: 2