
Journal of Survey Statistics and Methodology: Latest Articles

Constructing State and National Estimates of Vaccination Rates from Immunization Information Systems
IF 2.1 | Zone 4 (Mathematics) | Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS | Pub Date: 2023-02-07 | DOI: 10.1093/jssam/smac042
T. Raghunathan, K. Kirtland, Ji Li, K. White, B. Murthy, Xia Lin, Latreace Harris, L. Gibbs-Scharf, E. Zell
Immunization Information Systems are confidential computerized population-based systems that collect data from vaccination providers on individual vaccinations administered along with limited patient-level characteristics. Through a data use agreement, the Centers for Disease Control and Prevention obtains the individual-level data and aggregates the number of vaccinations for geographical statistical areas defined by the US Census Bureau (counties or equivalent statistical entities) for each vaccine included in the system. Currently, 599 counties, covering 11 states, collect and report data using a uniform protocol. We combine these data with inter-decennial population counts from the Population Estimates Program in the US Census Bureau and several covariates from a variety of sources to develop model-based estimates for each of the 3,142 counties in 50 states and the District of Columbia and then aggregate to the state and national levels. We use a hierarchical Bayesian model and Markov Chain Monte Carlo methods to obtain draws from the posterior predictive distribution of the vaccination rates. We use posterior predictive checks and cross-validation to assess the goodness of fit and to validate the models. We also compare the model-based estimates to direct estimates from the National Immunization Surveys.
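The aggregation step described in the abstract (county-level draws rolled up to state and national levels) can be illustrated with a minimal Python sketch. It assumes that each county already has posterior predictive draws of its vaccination rate and a population count, and that the area-wide rate is the population-weighted average of county rates. The function name `aggregate_draws`, the county labels, and all numbers are hypothetical and do not come from the paper.

```python
def aggregate_draws(county_draws, county_pop):
    """Turn per-county draws of a rate into draws of the area-wide rate.

    county_draws: dict mapping county -> list of draws (equal length everywhere)
    county_pop:   dict mapping county -> population count
    Draw k of the area-wide rate is the population-weighted mean of draw k
    across counties.
    """
    counties = list(county_draws)
    total_pop = sum(county_pop[c] for c in counties)
    n_draws = len(county_draws[counties[0]])
    return [
        sum(county_draws[c][k] * county_pop[c] for c in counties) / total_pop
        for k in range(n_draws)
    ]


# Hypothetical example: two counties, two draws each.
draws = {"A": [0.60, 0.70], "B": [0.80, 0.90]}
pop = {"A": 1000, "B": 3000}
state_draws = aggregate_draws(draws, pop)
```

Aggregating draw by draw, rather than aggregating county point estimates, keeps a full set of draws for the larger area, so interval summaries can be read off `state_draws` directly.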
Citations: 1
An Application of Adaptive Cluster Sampling to Surveying Informal Businesses
Zone 4 (Mathematics) | Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS | Pub Date: 2023-01-27 | DOI: 10.1093/jssam/smac037
Gemechu Aga, David C Francis, Filip Jolevski, Jorge Rodriguez Meza, Joshua Seth Wimpey
Informal business activity is ubiquitous around the world, but it is nearly always uncaptured by administrative data, registries, or commercial sources. For this reason, there are rarely adequate sampling frames available for survey implementers wishing to measure the activity and characteristics of the sector. This article applies a well-established sampling method for rare and/or clustered populations, Adaptive Cluster Sampling (ACS), to a novel population of informal businesses. Generally, it shows that efficiency gains through the application of ACS, when compared to Simple Random Sampling (SRS), are large, particularly at higher levels of fieldwork effort. In particular, ACS efficiency gains over SRS remain sizable at higher values of initial starting samples, but with comparatively high expansion thresholds, which can reduce the fieldwork effort.
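To make the role of the expansion threshold concrete, here is a minimal Python sketch of the generic ACS expansion rule on a grid. It is not the authors' fieldwork protocol: the grid of business counts, the four-neighbour definition of adjacency, and the name `adaptive_cluster_sample` are all assumptions chosen for illustration.

```python
def adaptive_cluster_sample(grid, initial_cells, threshold):
    """Return the set of grid cells surveyed under a simple ACS rule.

    Every initial cell is surveyed. Whenever a surveyed cell holds at least
    `threshold` units, its up/down/left/right neighbours are surveyed too,
    and the rule is applied again to those neighbours.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    surveyed = set()
    to_visit = list(initial_cells)
    while to_visit:
        r, c = to_visit.pop()
        if (r, c) in surveyed:
            continue
        surveyed.add((r, c))
        if grid[r][c] >= threshold:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < n_rows and 0 <= nc < n_cols:
                    to_visit.append((nr, nc))
    return surveyed


# Hypothetical counts of informal businesses per block.
grid = [
    [0, 0, 0, 0],
    [0, 5, 3, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
initial = [(1, 1), (3, 0)]

low_threshold = adaptive_cluster_sample(grid, initial, threshold=2)
high_threshold = adaptive_cluster_sample(grid, initial, threshold=4)
```

In this toy grid, raising the threshold from 2 to 4 stops the expansion at the cell holding 3 units, so fewer cells are visited. That is the fieldwork-effort trade-off the abstract refers to. Estimation from such a sample needs design-appropriate estimators, which this sketch does not cover.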
Citations: 0
Detecting Interviewer Fraud Using Multilevel Models
IF 2.1 | Zone 4 (Mathematics) | Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS | Pub Date: 2023-01-02 | DOI: 10.1093/jssam/smac036
Lukas Olbrich, Yuliya Kosyakova, J. Sakshaug, Silvia Schwanhäuser
Interviewer falsification, such as the complete or partial fabrication of interview data, has been shown to substantially affect the results of survey data. In this study, we apply a method to identify falsifying face-to-face interviewers based on the development of their behavior over the survey field period. We postulate four potential falsifier types: steady low-effort falsifiers, steady high-effort falsifiers, learning falsifiers, and sudden falsifiers. Using large-scale survey data from Germany with verified falsifications, we apply multilevel models with interviewer effects on the intercept, scale, and slope of the interview sequence to test whether falsifiers can be detected based on their dynamic behavior. In addition to identifying a rather high-effort falsifier previously detected by the survey organization, the model flagged two additional suspicious interviewers exhibiting learning behavior, who were subsequently classified as deviant by the survey organization. We additionally apply the analysis approach to publicly available cross-national survey data and find multiple interviewers who show behavior consistent with the postulated falsifier types.
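One generic way to write a multilevel model with interviewer effects on the intercept, slope, and scale is a mixed-effects location-scale model. The form below is an illustrative assumption, not the authors' stated specification: $y_{ij}$ is some quality indicator for interview $i$ of interviewer $j$, $t_{ij}$ is the position of that interview in the interviewer's sequence, and $u_{0j}$, $u_{1j}$, $u_{2j}$ are interviewer-level random effects.

```latex
y_{ij} = \beta_0 + u_{0j} + (\beta_1 + u_{1j})\, t_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}\!\left(0, \sigma_j^2\right),
\qquad \log \sigma_j = \gamma_0 + u_{2j}
```

Under this reading, an unusual $u_{0j}$ would mark an interviewer whose level differs steadily from colleagues, an unusual $u_{1j}$ would mark behavior that drifts over the field period, and an unusual $u_{2j}$ would mark atypically low or high within-interviewer variability.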
Citations: 0
Estimating Web Survey Mode and Panel Effects in a Nationwide Survey of Alcohol Use
IF 2.1 | Zone 4 (Mathematics) | Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS | Pub Date: 2022-11-02 | eCollection Date: 2023-11-01 | DOI: 10.1093/jssam/smac028
Randal ZuWallack, Matt Jans, Thomas Brassell, Kisha Bailly, James Dayton, Priscilla Martinez, Deidre Patterson, Thomas K Greenfield, Katherine J Karriker-Jaffe

Random-digit dialing (RDD) telephone surveys are challenged by declining response rates and increasing costs. Many surveys that were traditionally conducted via telephone are seeking cost-effective alternatives, such as address-based sampling (ABS) with self-administered web or mail questionnaires. At a fraction of the cost of both telephone and ABS surveys, opt-in web panels are an attractive alternative. The 2019-2020 National Alcohol Survey (NAS) employed three methods: (1) an RDD telephone survey (traditional NAS method); (2) an ABS push-to-web survey; and (3) an opt-in web panel. The study reported here evaluated differences in the three data-collection methods, which we will refer to as "mode effects," on alcohol consumption and health topics. To evaluate mode effects, multivariate regression models were developed predicting these characteristics, and the presence of a mode effect on each outcome was determined by the significance of the three-level effect (RDD-telephone, ABS-web, opt-in web panel) in each model. Those results were then used to adjust for mode effects and produce a "telephone-equivalent" estimate for the ABS and panel data sources. The study found that ABS-web and RDD were similar for most estimates but exhibited differences for sensitive questions including getting drunk and experiencing depression. The opt-in web panel exhibited more differences between it and the other two survey modes. One notable example is the reporting of drinking alcohol at least 3-4 times per week, which was 21 percent for RDD-phone, 24 percent for ABS-web, and 34 percent for opt-in web panel. The regression model adjusts for mode effects, improving comparability with past surveys conducted by telephone; however, the models result in higher variance of the estimates. This method of adjusting for mode effects has broad applications to mode and sample transitions throughout the survey research industry.

Volume 11, Issue 5, pp. 1089-1109. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10646698/pdf/
Citations: 0
Using Capture-Recapture Methodology to Enhance Precision of Representative Sampling-Based Case Count Estimates
IF 2.1 | Zone 4 (Mathematics) | Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS | Pub Date: 2022-11-01 | DOI: 10.1093/jssam/smab052
Robert H Lyles, Yuzi Zhang, Lin Ge, Cameron England, Kevin Ward, Timothy L Lash, Lance A Waller

The application of serial principled sampling designs for diagnostic testing is often viewed as an ideal approach to monitoring prevalence and case counts of infectious or chronic diseases. Considering logistics and the need for timeliness and conservation of resources, surveillance efforts can generally benefit from creative designs and accompanying statistical methods to improve the precision of sampling-based estimates and reduce the size of the necessary sample. One option is to augment the analysis with available data from other surveillance streams that identify cases from the population of interest over the same timeframe, but may do so in a highly nonrepresentative manner. We consider monitoring a closed population (e.g., a long-term care facility, patient registry, or community), and encourage the use of capture-recapture methodology to produce an alternative case total estimate to the one obtained by principled sampling. With care in its implementation, even a relatively small simple or stratified random sample not only provides its own valid estimate, but provides the only fully defensible means of justifying a second estimate based on classical capture-recapture methods. We initially propose weighted averaging of the two estimators to achieve greater precision than can be obtained using either alone, and then show how a novel single capture-recapture estimator provides a unified and preferable alternative. We develop a variant on a Dirichlet-multinomial-based credible interval to accompany our hybrid design-based case count estimates, with a view toward improved coverage properties. Finally, we demonstrate the benefits of the approach through simulations designed to mimic an acute infectious disease daily monitoring program or an annual surveillance program to quantify new cases within a fixed patient registry.
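For orientation, the classical two-stream capture-recapture idea and the weighted averaging of two estimators can be sketched in Python. This uses the textbook Chapman estimator and plain inverse-variance weighting. It is not the novel estimator or the credible interval developed in the paper, and the weighting formula assumes the two estimates are independent, which may not hold in the paper's setting. All function names and numbers are hypothetical.

```python
def chapman_estimate(n1, n2, m):
    """Chapman's bias-corrected two-stream estimate of a total count.

    n1: cases identified by stream 1
    n2: cases identified by stream 2
    m:  cases identified by both streams
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def inverse_variance_average(est_a, var_a, est_b, var_b):
    """Combine two estimates with weights proportional to 1/variance.

    Assumes the two estimates are independent (an assumption of this sketch).
    Returns the combined estimate and its variance.
    """
    precision_a, precision_b = 1 / var_a, 1 / var_b
    total_precision = precision_a + precision_b
    combined = (precision_a * est_a + precision_b * est_b) / total_precision
    return combined, 1 / total_precision


# Hypothetical counts: 49 cases in stream 1, 99 in stream 2, 9 in both.
total_cases = chapman_estimate(49, 99, 9)

# Hypothetical estimates and variances from two sources.
combined, combined_var = inverse_variance_average(100.0, 4.0, 110.0, 1.0)
```

The combined variance is never larger than the smaller of the two input variances, which is the sense in which averaging can be more precise than using either estimate alone.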

Volume 10, Issue 5, pp. 1292-1318. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9643167/pdf/smab052.pdf
Citations: 1
A Simple Question Goes a Long Way: A Wording Experiment on Bank Account Ownership
IF 2.1 | Zone 4 (Mathematics) | Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS | Pub Date: 2022-11-01 | DOI: 10.1093/jssam/smab045
Marco Angrisani, Mick P Couper

Ownership of a bank account is an objective measure and should be relatively easy to elicit via survey questions. Yet, depending on the interview mode, the wording of the question and its placement within the survey may influence respondents' answers. The Health and Retirement Study (HRS) asset module, as administered online to members of the Understanding America Study (UAS), yielded substantially lower rates of reported bank account ownership than either a single question on ownership in the Current Population Survey (CPS) or the full asset module administered to HRS panelists (both interviewer-administered surveys). We designed and implemented an experiment in the UAS comparing the original HRS question eliciting bank account ownership with two alternative versions that were progressively simplified. We document strong evidence that the original question leads to systematic underestimation of bank account ownership. In contrast, the proportion of bank account owners obtained from the simplest alternative version of the question is very similar to the population benchmark estimate. We investigate treatment effect heterogeneity by cognitive ability and financial literacy. We find that questionnaire simplification affects responses of individuals with higher cognitive ability substantially less than those with lower cognitive ability. Our results suggest that high-quality data from surveys start from asking the right questions, which should be as simple and precise as possible and carefully adapted to the mode of interview.

Volume 10, Issue 5, pp. 1172-1182. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9643168/pdf/smab045.pdf
Citations: 0
Empirical Best Prediction of Small Area Means Based on a Unit-Level Gamma-Poisson Model
IF 2.1 | Zone 4 (Mathematics) | Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS | Pub Date: 2022-10-07 | DOI: 10.1093/jssam/smac026
Emily J. Berg
Existing small area estimation procedures for count data have important limitations. For instance, an M-quantile-based method is known to be less efficient than model-based procedures if the assumptions of the model hold. Also, frequentist inference procedures for Poisson generalized linear mixed models can be computationally intensive or require approximations. Furthermore, area-level models are incapable of incorporating unit-level covariates. We overcome these limitations by developing a small area estimation procedure for a unit-level gamma-Poisson model. The conjugate form of the model permits computationally simple estimation and prediction procedures. We obtain a closed-form expression for the empirical best predictor of the mean as well as a closed-form mean square error estimator. We validate the procedure through simulations. We illustrate the proposed method using a subset of data from the Iowa Seat-Belt Use survey.
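The reason a conjugate gamma-Poisson form leads to closed-form predictors can be seen in the simplest textbook case, sketched below in Python. It assumes a single area with rate λ ~ Gamma(shape, rate) and independent unit counts y_i | λ ~ Poisson(λ), with no covariates. This is a simplification for illustration, not the unit-level model or the empirical best predictor from the paper, and the function name is hypothetical.

```python
def gamma_poisson_posterior(shape, rate, counts):
    """Posterior for a Poisson rate under a Gamma(shape, rate) prior.

    With y_1..y_n | lam ~ Poisson(lam), the posterior is
    Gamma(shape + sum(y), rate + n); its mean is returned as the third value.
    """
    post_shape = shape + sum(counts)
    post_rate = rate + len(counts)
    return post_shape, post_rate, post_shape / post_rate


# Hypothetical prior and three observed unit counts.
post_shape, post_rate, post_mean = gamma_poisson_posterior(2.0, 1.0, [3, 1, 4])
```

Here the posterior mean (2.5) lies between the prior mean (2.0) and the sample mean (about 2.67), with more weight on the data as the number of units grows. No numerical integration is needed, which is what makes this kind of model computationally simple.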
Citations: 0
Automated Classification for Open-Ended Questions with BERT
IF 2.1 | Zone 4 (Mathematics) | Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS | Pub Date: 2022-09-13 | DOI: 10.1093/jssam/smad015
Hyukjun Gweon, Matthias Schonlau
Manual coding of text data from open-ended questions into different categories is time-consuming and expensive. Automated coding uses statistical/machine learning to train on a small subset of manually coded text answers. Recently, pretraining a general language model on vast amounts of unrelated data and then adapting the model to the specific application has proven effective in natural language processing. Using two data sets, we empirically investigate whether BERT, the currently dominant pretrained language model, is more effective at automated coding of answers to open-ended questions than other non-pretrained statistical learning approaches. First, we found that fine-tuning the pretrained BERT parameters is essential, as otherwise BERT is not competitive. Second, we found that fine-tuned BERT barely beats the non-pretrained statistical learning approaches in terms of classification accuracy when trained on 100 manually coded observations. However, BERT’s relative advantage increases rapidly when more manually coded observations (e.g., 200–400) are available for training. We conclude that for automatically coding answers to open-ended questions, BERT is preferable to non-pretrained models such as support vector machines and boosting.
Citations: 2
Modeling Group-Specific Interviewer Effects on Survey Participation Using Separate Coding for Random Slopes in Multilevel Models
IF 2.1 Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2022-09-02 DOI: 10.1093/jssam/smac025
J. Herzing, A. Blom, B. Meuleman
Despite its importance in terms of survey participation, the literature is sparse on how face-to-face interviewers differentially affect specific groups of sample units. This paper demonstrates how an alternative parametrization of the random components in multilevel models, so-called separate coding, delivers valuable insights into differential interviewer effects for specific groups of sample members. In the example of a face-to-face recruitment interview for a probability-based online panel, we detect small interviewer effects regarding survey participation for non-Internet households, whereas we find sizable interviewer effects for Internet households. We derive practical guidance for survey practitioners to address differential interviewer effects based on the proposed variance decomposition.
Citations: 0
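The "separate coding" named in the Herzing, Blom, and Meuleman abstract can be read as replacing a random intercept plus a random slope on a group dummy with one random effect per group. The sketch below shows how the two parametrizations relate under that reading; it is an assumption about the general technique, not the authors' code, and the function names and variance values are hypothetical.

```python
def separate_dummies(is_internet_household):
    """Separate coding: one indicator per group and no common intercept.

    Returns (d_non_internet, d_internet) for one sample unit.
    """
    d = 1 if is_internet_household else 0
    return 1 - d, d


def to_separate_coding(var_intercept, var_slope, cov_intercept_slope):
    """Map contrast-coded random-effect (co)variances to separate coding.

    With contrast coding the interviewer effect is u0 for the reference
    group and u0 + u1 for the other group, so:
      Var(group 0)          = Var(u0)
      Var(group 1)          = Var(u0) + 2 Cov(u0, u1) + Var(u1)
      Cov(group 0, group 1) = Var(u0) + Cov(u0, u1)
    """
    var_group0 = var_intercept
    var_group1 = var_intercept + 2 * cov_intercept_slope + var_slope
    cov_groups = var_intercept + cov_intercept_slope
    return var_group0, var_group1, cov_groups


# Hypothetical variance components, not taken from the paper.
print(separate_dummies(True))
print(to_separate_coding(0.10, 0.30, 0.05))
```

Under separate coding the model reports each group's interviewer variance directly, so group-specific interviewer effects can be compared without combining intercept, slope, and covariance terms by hand.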
An Experimental Evaluation of Two Approaches for Improving Response to Household Screening Efforts in National Mail/Web Surveys
IF 2.1 Zone 4 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2022-07-12 eCollection Date: 2023-02-01 DOI: 10.1093/jssam/smac024
James Wagner, Brady T West, Mick P Couper, Shiyu Zhang, Rebecca Gatward, Raphael Nishimura, Htay-Wah Saw

Survey researchers have carefully modified their data collection operations for various reasons, including the rising costs of data collection and the ongoing Coronavirus disease (COVID-19) pandemic, both of which have made in-person interviewing difficult. For large national surveys that require household (HH) screening to determine survey eligibility, cost-efficient screening methods that do not include in-person visits need additional evaluation and testing. A new study, known as the American Family Health Study (AFHS), recently initiated data collection with a national probability sample, using a sequential mixed-mode mail/web protocol for push-to-web US HH screening (targeting persons aged 18-49 years). To better understand optimal approaches for this type of national screening effort, we embedded two randomized experiments in the AFHS data collection. The first tested the use of bilingual respondent materials where mailed invitations to the screener were sent in both English and Spanish to 50 percent of addresses with a high predicted likelihood of having a Spanish speaker and 10 percent of all other addresses. We found that the bilingual approach did not increase the response rate of high-likelihood Spanish-speaking addresses, but consistent with prior work, it increased the proportion of eligible Hispanic respondents identified among completed screeners, especially among addresses predicted to have a high likelihood of having Spanish speakers. The second tested a form of nonresponse follow-up, where a subsample of active sampled HHs that had not yet responded to the screening invitations was sent a priority mailing with a $5 incentive, adding to the $2 incentive provided for all sampled HHs in the initial screening invitation. We found this approach to be quite valuable for increasing the screening survey response rate.

Citations: 0