Journal of Survey Statistics and Methodology: Latest Articles

Comparative Effectiveness of Propensity Score Estimation Methods for Inverse Probability of Treatment Weighting Analysis with Complex Survey Data: A Simulation Study.
IF 1.6; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2025-04-12; DOI: 10.1093/jssam/smaf003
Lihua Li, Chen Yang, Liangyuan Hu, Wei Zhang, Melissa Aldridge, Bian Liu, Madhu Mazumdar

Propensity score (PS) methods, including inverse probability of treatment weighting (IPTW) analysis, are increasingly applied to complex survey data in geriatric studies to infer causal effects. However, the comparative effectiveness of various PS estimation methods, particularly novel machine learning algorithms, has not been thoroughly explored when complex survey data are involved. We conducted a comprehensive simulation study to compare the following six PS estimation methods in IPTW analysis: Logistic Regression, Covariate Balancing Propensity Score, Generalized Boosted Model, Classification and Regression Tree, Random Forest (RF), and Super Learner. We considered 12 scenarios with varying treatment effects, degrees of non-linearity and non-additivity in the associations between covariates and the exposure, and levels of PS overlap. The performance of these six methods was assessed in terms of mean relative bias, root mean square error, and coverage probability. The results showed similar performance across all methods when PS overlap was strong. However, RF consistently outperformed the other methods when PS overlap was not strong and under non-additive and non-linear scenarios. The results suggest that RF is a more effective approach for PS estimation than the other methods compared when applying IPTW analysis to complex survey data for population average treatment effects. The methods were applied to data from the Medicare Current Beneficiary Survey for years 2002-2019 to estimate the impact of hospice use on end-of-life healthcare costs. Findings from the real-world example show that hospice use was significantly associated with reduced end-of-life healthcare costs of Medicare beneficiaries.

PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12721855/pdf/
Citations: 0
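A minimal sketch of how IPTW is commonly combined with survey weights, for readers of the propensity score entry above. It assumes the final weight for each unit is its survey weight multiplied by the inverse probability of the treatment it actually received, and it uses a ratio (Hajek-type) weighted difference in means. The abstract does not state the exact estimator used in the study, so the function name `iptw_survey_ate` and this weighting rule are illustrative assumptions, and the propensity scores are taken as given rather than estimated.

```python
def iptw_survey_ate(outcome, treated, propensity, survey_weight):
    """Weighted difference in mean outcomes between treated and control units.

    Assumption (not taken from the paper): each unit's weight is
    survey_weight / ps if treated, survey_weight / (1 - ps) otherwise.
    """
    num_t = den_t = num_c = den_c = 0.0
    for y, t, ps, sw in zip(outcome, treated, propensity, survey_weight):
        if not 0.0 < ps < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t:
            w = sw / ps            # treated: inverse of P(treated | covariates)
            num_t += w * y
            den_t += w
        else:
            w = sw / (1.0 - ps)    # control: inverse of P(control | covariates)
            num_c += w * y
            den_c += w
    if den_t == 0.0 or den_c == 0.0:
        raise ValueError("need at least one treated and one control unit")
    return num_t / den_t - num_c / den_c


# Toy data, invented for illustration only.
ate = iptw_survey_ate(
    outcome=[10, 12, 7, 5],
    treated=[1, 1, 0, 0],
    propensity=[0.5, 0.5, 0.5, 0.5],
    survey_weight=[1, 3, 2, 2],
)
print(ate)  # 5.5
```

Poor PS overlap shows up here as propensity scores near 0 or 1, which produce very large weights; that is one reason the choice of PS estimation method can matter more in the low-overlap scenarios the abstract describes.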
Synthesizing Surveys with Multiple Units of Observation: An Application to the Longitudinal Aging Study in India.
IF 1.6; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2025-01-09; eCollection Date: 2025-09-01; DOI: 10.1093/jssam/smae047
Joshua Snoke, Erik Meijer, Drystan Phillips, Jenny Wilkens, Jinkook Lee

We present methodology for creating synthetic data and an application to create a publicly releasable synthetic version of the Longitudinal Aging Study in India (LASI). The LASI, a health and retirement survey, is used for research and educational purposes, but it can only be shared under restricted access due to privacy considerations. We present novel methods to synthesize the survey, maintaining three nested levels of observation (individuals, couples, and households) with both continuous and categorical variables and survey weights. We show that the synthetic data maintain the distributional patterns of the confidential data and largely mitigate identification and attribute disclosure risk. We also present a novel method for controlling the risk and utility tradeoff for the synthetic data that takes into account the survey sampling rates. Specifically, we down-weight records that have a high likelihood of being uniquely identifiable in the population due to unique demographic information and oversampling. We show this approach reduces both identification and attribute risk for records while preserving better utility than another common approach, coarsening records. Our methods and evaluations provide a foundation for creating a synthetic version of surveys with multiple units of observation, such as the LASI.

Volume 13, Issue 4, pp. 420-444. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12596149/pdf/
Citations: 0
Real World Data Versus Probability Surveys for Estimating Health Conditions at the State Level.
IF 1.6; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2024-11-01; DOI: 10.1093/jssam/smae036
David A Marker, Charity Hilton, Jacob Zelko, Jon Duke, Deborah Rolka, Rachel Kaufmann, Richard Boyd

Government statistical offices worldwide are under pressure to produce statistics rapidly and for more detailed geographies, to compete with unofficial estimates available from web-based big data sources or from private companies. Commonly suggested sources of improved health information are electronic health records (EHRs) and medical claims data. These data sources are collectively known as real world data (RWD) because they are generated from routine health care processes, and they are available for millions of patients. It is clear that RWD can provide estimates that are more timely and less expensive to produce, but a key question is whether they are accurate. To test this, we took advantage of a unique health data source that includes a full range of sociodemographic variables and compared estimates using all of those potential weighting variables with estimates derived when only age and sex are available for weighting (as is common with most RWD sources). We show that not accounting for other variables can produce misleading, and quite inaccurate, health estimates.

Volume 12, Issue 5, pp. 1515-1530. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11708384/pdf/
Citations: 0
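The comparison in the real world data entry above (weighting on few variables versus many) can be illustrated with a small post-stratification sketch. The abstract does not say which weighting method the authors used, so post-stratification is an assumption here, and the function `poststratified_mean`, the variables `sex` and `edu`, and all numbers are hypothetical.

```python
from collections import defaultdict


def poststratified_mean(records, population_share, cell_of):
    """Post-stratified mean: sum over cells of population share times sample cell mean.

    records          : list of (attributes, outcome) pairs
    population_share : dict mapping each cell to its known population share
    cell_of          : function mapping attributes to a cell key
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for attrs, y in records:
        cell = cell_of(attrs)
        totals[cell] += y
        counts[cell] += 1
    estimate = 0.0
    for cell, share in population_share.items():
        if counts[cell] == 0:
            raise ValueError(f"no sample records in cell {cell!r}")
        estimate += share * totals[cell] / counts[cell]
    return estimate


def make(sex, edu, outcomes):
    return [({"sex": sex, "edu": edu}, y) for y in outcomes]


# Hypothetical sample that over-represents the "high" education group.
sample = (
    make("F", "low", [1, 0]) + make("F", "high", [0, 0, 0, 1])
    + make("M", "low", [1, 1]) + make("M", "high", [0, 0, 1, 0])
)

# Weighting on sex only (population is half F, half M).
sex_only = poststratified_mean(
    sample, {"F": 0.5, "M": 0.5}, lambda a: a["sex"]
)

# Weighting on sex and education (population is mostly "low" education).
full = poststratified_mean(
    sample,
    {("F", "low"): 0.3, ("F", "high"): 0.2, ("M", "low"): 0.3, ("M", "high"): 0.2},
    lambda a: (a["sex"], a["edu"]),
)
print(round(sex_only, 4), round(full, 4))  # 0.4167 0.55
```

In this toy setup the outcome is more common in the under-sampled education group, so adjusting for sex alone leaves the estimate too low, which is the kind of distortion the abstract warns about.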
Analyzing Potential Non-Ignorable Selection Bias in an Off-Wave Mail Survey Implemented in a Long-Standing Panel Study.
IF 1.6; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2024-10-23; eCollection Date: 2025-02-01; DOI: 10.1093/jssam/smae039
Heather M Schroeder, Brady T West

Typical design-based methods for weighting probability samples rely on several assumptions, including the random selection of sampled units according to known probabilities of selection and ignorable unit nonresponse. If any of these assumptions are not met, weighting methods that account for the probabilities of selection, nonresponse, and calibration may not fully account for the potential selection bias in a given sample, which could produce misleading population estimates. This analysis investigates possible selection bias in the 2019 Health Survey Mailer (HSM), a sub-study of the longitudinal Health and Retirement Study (HRS). The primary HRS data collection has occurred in "even" years since 1992, but additional survey data collections take place in the "off-wave" odd years via mailed invitations sent to selected participants. While the HSM achieved a high response rate (83 percent), the assumption of ignorable probability-based selection of HRS panel members may not hold due to the eligibility criteria that were imposed. To investigate this possible non-ignorable selection bias, our analysis utilizes a novel analysis method for estimating measures of unadjusted bias for proportions (MUBP), introduced by Andridge et al. in 2019. This method incorporates aggregate information from the larger HRS target population, including means, variances, and covariances for key covariates related to the HSM variables, to inform estimates of proportions. We explore potential non-ignorable selection bias by comparing proportions calculated from the HSM under three conditions: ignoring HRS weights, weighting based on the usual design-based approach for HRS "off-wave" mail surveys, and using the MUBP adjustment. We find examples of differences between the weighted and MUBP-adjusted estimates in four out of ten outcomes we analyzed. 
However, these differences are modest, and while this result gives some evidence of non-ignorable selection bias, typical design-based weighting methods are sufficient for correcting for it and their use is appropriate in this case.

Volume 13, Issue 1, pp. 100-127. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11770253/pdf/
Citations: 0
Small Area Poverty Estimation under Heteroskedasticity
IF 2.1; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2024-01-10; DOI: 10.1093/jssam/smad045
Sumonkanti Das, Ray Chambers
Multilevel models with nested errors are widely used in poverty estimation. An important application in this context is estimating the distribution of poverty as defined by the distribution of income within a set of domains that cover the population of interest. Since unit-level values of income are usually heteroskedastic, the standard homoskedasticity assumptions implicit in popular multilevel models may not be appropriate and can lead to bias, particularly when used to estimate domain-specific income distributions. This article addresses this problem when the income values in the population of interest can be characterized by a two-level mixed linear model with independent and identically distributed domain effects and with independent but not identically distributed individual effects. Estimation of poverty indicators that are functionals of domain-level income distributions is also addressed, and a nonparametric bootstrap procedure is used to estimate mean squared errors and confidence intervals. The proposed methodology is compared with the well-known World Bank poverty mapping methodology for this situation, using model-based simulation experiments as well as an empirical study based on Bangladesh poverty data.
Citations: 0
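The two-level model described in the heteroskedasticity entry above can be written compactly as follows. The notation is an assumption made for illustration (the abstract gives no symbols): $y_{ij}$ is the income value, possibly transformed, of individual $j$ in domain $i$, $\mathbf{x}_{ij}$ its covariates, $u_i$ the domain effect, and $e_{ij}$ the individual effect.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + e_{ij},
\qquad
u_i \overset{\text{iid}}{\sim} \left(0,\ \sigma_u^2\right),
\qquad
e_{ij} \overset{\text{ind}}{\sim} \left(0,\ \sigma_{ij}^2\right)
```

The usual homoskedastic nested-error model is the special case $\sigma_{ij}^2 = \sigma_e^2$ for all $i, j$; allowing $\sigma_{ij}^2$ to vary across individuals is what "independent but not identically distributed individual effects" refers to.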
Investigating Respondent Attention to Experimental Text Lengths
IF 2.1; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2024-01-04; DOI: 10.1093/jssam/smad044
Tobias Rettig, A. Blom
Whether respondents pay adequate attention to a questionnaire has long been of concern to survey researchers. In this study, we measure respondents’ attention with an instruction manipulation check. We investigate which respondents read question texts of experimentally varied lengths and which become inattentive in a probability-based online panel of the German population. We find that respondent attention is closely linked to text length. Individual response speed is strongly correlated with respondent attention, but a fixed cutoff time is unsuitable as a standalone attention indicator. Differing levels of attention are also associated with respondents’ age, gender, education, panel experience, and the device used to complete the survey. Removal of inattentive respondents is thus likely to result in a biased remaining sample. Instead, questions should be curtailed to encourage respondents of different backgrounds and abilities to read them attentively and provide optimized answers.
Citations: 0
A Catch-22—the Test–Retest Method of Reliability Estimation
IF 2.1; Zone 4 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS; Pub Date: 2023-12-20; DOI: 10.1093/jssam/smad043
Paula A. Tufiș, D. Alwin, Daniel N Ramírez
This article addresses the problems with the traditional reinterview approach to estimating the reliability of survey measures. Using data from three reinterview (or panel) studies conducted by the General Social Survey, we investigate the differences between the two-wave correlational approach embodied by the traditional reinterview strategy, compared to estimates of reliability that take the stability of traits into account based on a three-wave model. Our results indicate that the problems identified with the two-wave correlational approach reflect a kind of “Catch-22” in the sense that the only solution to the problem is denied by the approach itself. Specifically, we show that the correctly specified two-wave model, which includes the potential for true change in the latent variable, is underidentified, and thus, unless one is willing to make some potentially risky assumptions, reliability parameters are not estimable. This article compares the two-wave correlational approach to an alternative model for estimating reliability, Heise’s estimates based on the three-wave simplex model. Using three waves of data from the GSS panels, which were separated by 2-year intervals between waves, this article examines the conditions under which the wave-1, wave-2 correlations which do not take stability into account approximate the reliability estimate obtained from three-wave simplex models that do take stability into account. The results lead to the conclusion that the differences between estimates depend on the stability and/or fixed nature of the underlying processes involved. Few if any differences are identified when traits are fixed or highly stable, but for traits involving changes in the underlying traits the differences can be quite large, and thus, we argue for the superiority of reinterview designs that involve more than 2 waves in the estimation of reliability parameters.
本文探讨了传统的重新访谈法在估算调查措施可靠性方面存在的问题。我们利用 "综合社会调查 "进行的三项再访谈(或小组)研究的数据,研究了传统再访谈策略所体现的两波相关法与基于三波模型考虑特质稳定性的可靠性估计法之间的差异。我们的研究结果表明,两波相关法发现的问题反映了一种 "Catch-22",即解决问题的唯一方法被该方法本身所否定。具体来说,我们表明,正确指定的两波模型(包括潜在变量真实变化的可能性)识别不足,因此,除非人们愿意做出一些潜在的风险假设,否则可靠性参数是无法估计的。本文将两波相关法与另一种可靠性估计模型--海斯基于三波简单模型的估计法--进行了比较。本文使用了来自全球抽样调查面板的三波数据(波与波之间的间隔为两年),研究了在什么条件下,不考虑稳定性的第一波、第二波相关性与考虑稳定性的三波单纯模型所得到的可靠性估计值相近。结果得出的结论是,估计值之间的差异取决于所涉及的基本过程的稳定性和/或固定性。在特征固定或高度稳定的情况下,即使有差异也很小,但对于涉及基础特征变化的特征,差异可能相当大。
{"title":"A Catch-22—the Test–Retest Method of Reliability Estimation","authors":"Paula A. Tufiș, D. Alwin, Daniel N Ramírez","doi":"10.1093/jssam/smad043","DOIUrl":"https://doi.org/10.1093/jssam/smad043","url":null,"abstract":"\u0000 This article addresses the problems with the traditional reinterview approach to estimating the reliability of survey measures. Using data from three reinterview (or panel) studies conducted by the General Social Survey, we investigate the differences between the two-wave correlational approach embodied by the traditional reinterview strategy, compared to estimates of reliability that take the stability of traits into account based on a three-wave model. Our results indicate that the problems identified with the two-wave correlational approach reflect a kind of “Catch-22” in the sense that the only solution to the problem is denied by the approach itself. Specifically, we show that the correctly specified two-wave model, which includes the potential for true change in the latent variable, is underidentified, and thus, unless one is willing to make some potentially risky assumptions, reliability parameters are not estimable. This article compares the two-wave correlational approach to an alternative model for estimating reliability, Heise’s estimates based on the three-wave simplex model. Using three waves of data from the GSS panels, which were separated by 2-year intervals between waves, this article examines the conditions under which the wave-1, wave-2 correlations which do not take stability into account approximate the reliability estimate obtained from three-wave simplex models that do take stability into account. The results lead to the conclusion that the differences between estimates depend on the stability and/or fixed nature of the underlying processes involved. 
Few if any differences are identified when traits are fixed or highly stable, but for traits involving changes in the underlying traits the differences can be quite large, and thus, we argue for the superiority of reinterview designs that involve more than 2 waves in the estimation of reliability parameters.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":"36 20","pages":""},"PeriodicalIF":2.1,"publicationDate":"2023-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138955719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Poverty Mapping Under Area-Level Random Regression Coefficient Poisson Models
IF 2.1 Zone 4, Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-11-29 DOI: 10.1093/jssam/smad036
Naomi Diz-Rosales, M. Lombardía, Domingo Morales
Under an area-level random regression coefficient Poisson model, this article derives small area predictors of counts and proportions and introduces bootstrap estimators of the mean squared errors (MSEs). The maximum likelihood estimators of the model parameters and the mode predictors of the random effects are calculated by a Laplace approximation algorithm. Simulation experiments are implemented to investigate the behavior of the fitting algorithm, the predictors, and the MSE estimators with and without bias correction. The new statistical methodology is applied to data from the Spanish Living Conditions Survey. The target is to estimate the proportions of women and men under the poverty line by province.
Citations: 0
Peekaboo! The Effect of Different Visible Cash Display and Amount Options During Mail Contact When Recruiting to a Probability-Based Panel
Zone 4, Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-11-09 DOI: 10.1093/jssam/smad039
Ipek Bilgen, David Dutwin, Roopam Singh, Erlina Hendarwan
Recent studies consistently showed that making cash visible with a windowed envelope during mail contact increases response rates in surveys. The visible cash aims to pique interest and encourage sampled households to open the envelope. This article extends prior research by examining the effect of additional interventions implemented during mail recruitment to a survey panel on recruitment rates and costs. Specifically, we implemented randomized experiments to examine size (small, large) and location (none, front, back) of the window displaying cash, combined with what part of the cash is shown through the window envelope (numeric amount, face/image), and various prepaid incentive amounts (two $1, one $2, one $5). We used the recruitment effort for NORC’s AmeriSpeak Panel as the data source for this study. The probability-based AmeriSpeak Panel uses an address-based sample and multiple modes of respondent contact, including mail, phone, and in-person outreach during recruitment. Our results were consistent with prior research and showed significant improvement in recruitment rates when cash was displayed through a window during mail contact. We also found that placing the window on the front of the envelope, showing $5 through the envelope compared to $2 and $1, and showing the tender amount compared to the image on the cash through the window were more likely to improve the recruitment rates. Our cost analyses illustrated that the cost difference in printing window versus no window envelope is small. There is no difference in printing cost between front window and back window as they both require custom manufacturing. There is also no cost difference in printing envelopes with small windows versus large windows. Lastly, we found no evidence of mail theft based on our review of the United States Postal Service’s “track and trace” reports, seed mailings sent to staff, and undeliverable mailing rates.
Citations: 0
Correction to: Correcting Selection Bias in Big Data by Pseudo-Weighting
Zone 4, Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2023-11-09 DOI: 10.1093/jssam/smad042
Citations: 0