Journal of Survey Statistics and Methodology: Latest Articles
Improving Statistical Matching when Auxiliary Information is Available
IF 2.1, Zone 4 (Mathematics), Q1 Social Sciences, Pub Date: 2023-02-13, DOI: 10.1093/jssam/smac038
Angelo Moretti, N. Shlomo
There is growing interest within National Statistical Institutes in combining available datasets containing information on a large variety of social domains. Statistical matching approaches can be used to integrate data sources through a common set of variables where each dataset contains different units that belong to the same target population. However, a common problem is related to the assumption of conditional independence among variables observed in different data sources. In this context, an auxiliary dataset containing all the variables jointly can be used to improve the statistical matching by providing information on the correlation structure of variables observed across different datasets. We propose modifying the prediction models from the auxiliary dataset through a calibration step and show that we can improve the outcome of statistical matching in a variety of settings. We evaluate the proposed approach via simulation and an application based on the European Union Statistics for Income and Living Conditions and Living Costs and Food Survey for the United Kingdom.
Citations: 2
Constructing State and National Estimates of Vaccination Rates from Immunization Information Systems
IF 2.1, Zone 4 (Mathematics), Q1 Social Sciences, Pub Date: 2023-02-07, DOI: 10.1093/jssam/smac042
T. Raghunathan, K. Kirtland, Ji Li, K. White, B. Murthy, Xia Lin, Latreace Harris, L. Gibbs-Scharf, E. Zell
Immunization Information Systems are confidential computerized population-based systems that collect data from vaccination providers on individual vaccinations administered along with limited patient-level characteristics. Through a data use agreement, the Centers for Disease Control and Prevention obtains the individual-level data and aggregates the number of vaccinations for geographical statistical areas defined by the US Census Bureau (counties or equivalent statistical entities) for each vaccine included in the system. Currently, 599 counties, covering 11 states, collect and report data using a uniform protocol. We combine these data with inter-decennial population counts from the Population Estimates Program in the US Census Bureau and several covariates from a variety of sources to develop model-based estimates for each of the 3,142 counties in 50 states and the District of Columbia and then aggregate to the state and national levels. We use a hierarchical Bayesian model and Markov Chain Monte Carlo methods to obtain draws from the posterior predictive distribution of the vaccination rates. We use posterior predictive checks and cross-validation to assess the goodness of fit and to validate the models. We also compare the model-based estimates to direct estimates from the National Immunization Surveys.
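
The aggregation step mentioned in this abstract (county estimates rolled up to state and national levels) can be illustrated with a minimal sketch. It assumes a simple population-weighted average; the abstract does not state the exact aggregation rule, and the function name, state labels, populations, and rates below are all hypothetical. In a Bayesian workflow one would presumably apply the same arithmetic to each posterior draw rather than to a single point estimate.

```python
from collections import defaultdict

def aggregate_rates(counties):
    """Population-weighted aggregation of county rates to state and national rates.

    `counties` is a list of (state, population, rate) tuples.
    All inputs here are hypothetical illustration values.
    """
    vaccinated = defaultdict(float)
    population = defaultdict(float)
    for state, pop, rate in counties:
        vaccinated[state] += pop * rate  # expected number vaccinated in the county
        population[state] += pop
    state_rates = {s: vaccinated[s] / population[s] for s in population}
    national = sum(vaccinated.values()) / sum(population.values())
    return state_rates, national

# Three hypothetical counties in two hypothetical states.
counties = [("A", 1000, 0.50), ("A", 3000, 0.70), ("B", 2000, 0.60)]
state_rates, national = aggregate_rates(counties)
```

Weighting by population means large counties dominate the state rate, which is why state "A" lands closer to 0.70 than to 0.50 in this toy input.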
Citations: 1
An Application of Adaptive Cluster Sampling to Surveying Informal Businesses
Zone 4 (Mathematics), Q1 Social Sciences, Pub Date: 2023-01-27, DOI: 10.1093/jssam/smac037
Gemechu Aga, David C Francis, Filip Jolevski, Jorge Rodriguez Meza, Joshua Seth Wimpey
Informal business activity is ubiquitous around the world, but it is nearly always uncaptured by administrative data, registries, or commercial sources. For this reason, there are rarely adequate sampling frames available for survey implementers wishing to measure the activity and characteristics of the sector. This article applies a well-established sampling method for rare and/or clustered populations, Adaptive Cluster Sampling (ACS), to a novel population of informal businesses. Generally, it shows that efficiency gains through the application of ACS, when compared to Simple Random Sampling (SRS), are large, particularly at higher levels of fieldwork effort. In particular, ACS efficiency gains over SRS remain sizable at higher values of initial starting samples, but with comparatively high expansion thresholds, which can reduce the fieldwork effort.
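
To make the role of the expansion threshold concrete, here is a minimal sketch of the expansion rule commonly used in adaptive cluster sampling. It is not the authors' implementation: the grid, the four-neighbour rule, the function name, and all counts are hypothetical, and the sketch covers only which cells get visited, not the estimators that would be applied afterwards.

```python
def adaptive_cluster_sample(grid, initial_cells, threshold):
    """Return the set of cells visited by adaptive cluster sampling.

    grid: dict mapping (row, col) -> count of units (e.g., informal businesses).
    initial_cells: iterable of starting cells (the initial sample).
    threshold: a visited cell triggers expansion to its 4 neighbours
               when its count is >= threshold.
    """
    visited = set()
    frontier = list(initial_cells)
    while frontier:
        cell = frontier.pop()
        if cell in visited or cell not in grid:
            continue  # skip cells already surveyed or outside the study area
        visited.add(cell)
        if grid[cell] >= threshold:
            r, c = cell
            frontier.extend([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)])
    return visited

# Hypothetical 4x4 area with one cluster in the top-left corner.
counts = [
    [5, 4, 0, 0],
    [3, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
grid = {(r, c): counts[r][c] for r in range(4) for c in range(4)}
visited = adaptive_cluster_sample(grid, initial_cells=[(0, 0), (3, 3)], threshold=3)
visited_high = adaptive_cluster_sample(grid, initial_cells=[(0, 0), (3, 3)], threshold=6)
```

With the lower threshold the cluster and its empty edge cells are all visited; with the higher threshold no expansion occurs, which illustrates how a higher threshold trades cluster coverage for less fieldwork.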
Citations: 0
Detecting Interviewer Fraud Using Multilevel Models
IF 2.1, Zone 4 (Mathematics), Q1 Social Sciences, Pub Date: 2023-01-02, DOI: 10.1093/jssam/smac036
Lukas Olbrich, Yuliya Kosyakova, J. Sakshaug, Silvia Schwanhäuser
Interviewer falsification, such as the complete or partial fabrication of interview data, has been shown to substantially affect the results of survey data. In this study, we apply a method to identify falsifying face-to-face interviewers based on the development of their behavior over the survey field period. We postulate four potential falsifier types: steady low-effort falsifiers, steady high-effort falsifiers, learning falsifiers, and sudden falsifiers. Using large-scale survey data from Germany with verified falsifications, we apply multilevel models with interviewer effects on the intercept, scale, and slope of the interview sequence to test whether falsifiers can be detected based on their dynamic behavior. In addition to identifying a rather high-effort falsifier previously detected by the survey organization, the model flagged two additional suspicious interviewers exhibiting learning behavior, who were subsequently classified as deviant by the survey organization. We additionally apply the analysis approach to publicly available cross-national survey data and find multiple interviewers who show behavior consistent with the postulated falsifier types.
Citations: 0
Dependence-Robust Confidence Intervals for Capture-Recapture Surveys.
IF 2.1, Zone 4 (Mathematics), Q1 Social Sciences, Pub Date: 2022-12-08, eCollection Date: 2023-11-01, DOI: 10.1093/jssam/smac031
Jinghao Sun, Luk Van Baelen, Els Plettinckx, Forrest W Crawford

Capture-recapture (CRC) surveys are used to estimate the size of a population whose members cannot be enumerated directly. CRC surveys have been used to estimate the number of Coronavirus Disease 2019 (COVID-19) infections, people who use drugs, sex workers, conflict casualties, and trafficking victims. When $k$ capture samples are obtained, counts of unit captures in subsets of samples are represented naturally by a $2^k$ contingency table in which one element (the number of individuals appearing in none of the samples) remains unobserved. In the absence of additional assumptions, the population size is not identifiable (i.e., not point identified). Stringent assumptions about the dependence between samples are often used to achieve point identification. However, real-world CRC surveys often use convenience samples in which the assumed dependence cannot be guaranteed, and population size estimates under these assumptions may lack empirical credibility. In this work, we apply the theory of partial identification to show that weak assumptions or qualitative knowledge about the nature of dependence between samples can be used to characterize a nontrivial confidence set for the true population size. We construct confidence sets under bounds on pairwise capture probabilities using two methods: test inversion bootstrap confidence intervals and profile likelihood confidence intervals. Simulation results demonstrate well-calibrated confidence sets for each method. In an extensive real-world study, we apply the new methodology to the problem of using heterogeneous survey data to estimate the number of people who inject drugs in Brussels, Belgium.
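
The contingency-table structure described in this abstract can be sketched in a few lines. The example below builds the table of capture patterns for two samples, marks the never-captured cell as unobserved, and then applies the classical Lincoln-Petersen estimator, which achieves point identification only by assuming the two samples are independent. This is the kind of stringent assumption the abstract cautions against, not the partial-identification method the paper proposes. Function names and the capture histories are hypothetical.

```python
from itertools import product
from collections import Counter

def capture_table(histories, k):
    """Count individuals per capture pattern across k samples.

    histories: list of length-k tuples of 0/1 (1 = captured in that sample).
    Returns a dict over all 2**k patterns; the all-zero cell is None because
    individuals who were never captured cannot be observed.
    """
    counts = Counter(histories)
    table = {}
    for pattern in product((0, 1), repeat=k):
        table[pattern] = None if not any(pattern) else counts.get(pattern, 0)
    return table

def lincoln_petersen(n1, n2, m):
    """Classical two-sample population size estimate.

    Assumes independent samples; m is the number captured in both.
    """
    return n1 * n2 / m

# Hypothetical capture histories for seven observed individuals, k = 2.
histories = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 1), (1, 1)]
table = capture_table(histories, k=2)
n1 = table[(1, 0)] + table[(1, 1)]  # captured in sample 1
n2 = table[(0, 1)] + table[(1, 1)]  # captured in sample 2
estimate = lincoln_petersen(n1, n2, table[(1, 1)])
```

If the samples are positively dependent, the overlap count is inflated and this estimate is biased downward, which is the motivation for replacing the independence assumption with bounds.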

Citations: 1
Estimating Web Survey Mode and Panel Effects in a Nationwide Survey of Alcohol Use.
IF 2.1, Zone 4 (Mathematics), Q1 Social Sciences, Pub Date: 2022-11-02, eCollection Date: 2023-11-01, DOI: 10.1093/jssam/smac028
Randal ZuWallack, Matt Jans, Thomas Brassell, Kisha Bailly, James Dayton, Priscilla Martinez, Deidre Patterson, Thomas K Greenfield, Katherine J Karriker-Jaffe

Random-digit dialing (RDD) telephone surveys are challenged by declining response rates and increasing costs. Many surveys that were traditionally conducted via telephone are seeking cost-effective alternatives, such as address-based sampling (ABS) with self-administered web or mail questionnaires. At a fraction of the cost of both telephone and ABS surveys, opt-in web panels are an attractive alternative. The 2019-2020 National Alcohol Survey (NAS) employed three methods: (1) an RDD telephone survey (traditional NAS method); (2) an ABS push-to-web survey; and (3) an opt-in web panel. The study reported here evaluated differences in the three data-collection methods, which we will refer to as "mode effects," on alcohol consumption and health topics. To evaluate mode effects, multivariate regression models were developed predicting these characteristics, and the presence of a mode effect on each outcome was determined by the significance of the three-level effect (RDD-telephone, ABS-web, opt-in web panel) in each model. Those results were then used to adjust for mode effects and produce a "telephone-equivalent" estimate for the ABS and panel data sources. The study found that ABS-web and RDD were similar for most estimates but exhibited differences for sensitive questions including getting drunk and experiencing depression. The opt-in web panel differed more from the other two survey modes. One notable example is the reporting of drinking alcohol at least 3-4 times per week, which was 21 percent for RDD-phone, 24 percent for ABS-web, and 34 percent for opt-in web panel. The regression model adjusts for mode effects, improving comparability with past surveys conducted by telephone; however, the models result in higher variance of the estimates. This method of adjusting for mode effects has broad applications to mode and sample transitions throughout the survey research industry.

Citations: 0
A Simple Question Goes a Long Way: A Wording Experiment on Bank Account Ownership.
IF 2.1, Zone 4 (Mathematics), Q1 Social Sciences, Pub Date: 2022-11-01, DOI: 10.1093/jssam/smab045
Marco Angrisani, Mick P Couper

Ownership of a bank account is an objective measure and should be relatively easy to elicit via survey questions. Yet, depending on the interview mode, the wording of the question and its placement within the survey may influence respondents' answers. The Health and Retirement Study (HRS) asset module, as administered online to members of the Understanding America Study (UAS), yielded substantially lower rates of reported bank account ownership than either a single question on ownership in the Current Population Survey (CPS) or the full asset module administered to HRS panelists (both interviewer-administered surveys). We designed and implemented an experiment in the UAS comparing the original HRS question eliciting bank account ownership with two alternative versions that were progressively simplified. We document strong evidence that the original question leads to systematic underestimation of bank account ownership. In contrast, the proportion of bank account owners obtained from the simplest alternative version of the question is very similar to the population benchmark estimate. We investigate treatment effect heterogeneity by cognitive ability and financial literacy. We find that questionnaire simplification affects responses of individuals with higher cognitive ability substantially less than those with lower cognitive ability. Our results suggest that high-quality data from surveys start from asking the right questions, which should be as simple and precise as possible and carefully adapted to the mode of interview.

Citations: 0
Using Capture-Recapture Methodology to Enhance Precision of Representative Sampling-Based Case Count Estimates.
IF 2.1, Zone 4 (Mathematics), Q1 Social Sciences, Pub Date: 2022-11-01, DOI: 10.1093/jssam/smab052
Robert H Lyles, Yuzi Zhang, Lin Ge, Cameron England, Kevin Ward, Timothy L Lash, Lance A Waller

The application of serial principled sampling designs for diagnostic testing is often viewed as an ideal approach to monitoring prevalence and case counts of infectious or chronic diseases. Considering logistics and the need for timeliness and conservation of resources, surveillance efforts can generally benefit from creative designs and accompanying statistical methods to improve the precision of sampling-based estimates and reduce the size of the necessary sample. One option is to augment the analysis with available data from other surveillance streams that identify cases from the population of interest over the same timeframe, but may do so in a highly nonrepresentative manner. We consider monitoring a closed population (e.g., a long-term care facility, patient registry, or community), and encourage the use of capture-recapture methodology to produce an alternative case total estimate to the one obtained by principled sampling. With care in its implementation, even a relatively small simple or stratified random sample not only provides its own valid estimate, but provides the only fully defensible means of justifying a second estimate based on classical capture-recapture methods. We initially propose weighted averaging of the two estimators to achieve greater precision than can be obtained using either alone, and then show how a novel single capture-recapture estimator provides a unified and preferable alternative. We develop a variant on a Dirichlet-multinomial-based credible interval to accompany our hybrid design-based case count estimates, with a view toward improved coverage properties. Finally, we demonstrate the benefits of the approach through simulations designed to mimic an acute infectious disease daily monitoring program or an annual surveillance program to quantify new cases within a fixed patient registry.
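
The idea of averaging two estimators to gain precision can be illustrated with inverse-variance weighting. This is a generic sketch, not the weighting scheme from the paper: the abstract does not specify the weights, and the formula below assumes the two estimators are unbiased and independent, which may not hold when both draw on the same random sample. The function name and all numbers are hypothetical.

```python
def inverse_variance_average(est_a, var_a, est_b, var_b):
    """Combine two estimates of the same quantity with inverse-variance weights.

    Assumes both estimators are unbiased and mutually independent
    (an assumption made for this illustration only).
    Returns the combined estimate and its variance.
    """
    precision_a = 1.0 / var_a
    precision_b = 1.0 / var_b
    w_a = precision_a / (precision_a + precision_b)
    combined = w_a * est_a + (1.0 - w_a) * est_b
    combined_var = 1.0 / (precision_a + precision_b)
    return combined, combined_var

# Hypothetical case-count estimates: one from a random sample (variance 400),
# one from capture-recapture (variance 100).
combined, combined_var = inverse_variance_average(120.0, 400.0, 100.0, 100.0)
```

Under these assumptions the combined variance is smaller than either input variance, which is the sense in which averaging can "achieve greater precision than can be obtained using either alone."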

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9643167/pdf/smab052.pdf
Citations: 1
Empirical Best Prediction of Small Area Means Based on a Unit-Level Gamma-Poisson Model
IF 2.1 Zone 4 Mathematics Q1 Social Sciences Pub Date: 2022-10-07 DOI: 10.1093/jssam/smac026
Emily J. Berg
Existing small area estimation procedures for count data have important limitations. For instance, an M-quantile-based method is known to be less efficient than model-based procedures if the assumptions of the model hold. Also, frequentist inference procedures for Poisson generalized linear mixed models can be computationally intensive or require approximations. Furthermore, area-level models are incapable of incorporating unit-level covariates. We overcome these limitations by developing a small area estimation procedure for a unit-level gamma-Poisson model. The conjugate form of the model permits computationally simple estimation and prediction procedures. We obtain a closed-form expression for the empirical best predictor of the mean as well as a closed-form mean square error estimator. We validate the procedure through simulations. We illustrate the proposed method using a subset of data from the Iowa Seat-Belt Use survey.
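The remark that the conjugate form permits simple computation can be illustrated with the textbook gamma-Poisson update. The sketch below is not the paper's empirical best predictor. It assumes one area effect u with a Gamma(shape, rate) prior and unit counts that are Poisson with mean exposure times u. The parameter names and the function name are hypothetical.

```python
def gamma_poisson_posterior(shape, rate, counts, exposures):
    """Closed-form posterior for a gamma-distributed Poisson rate multiplier.

    Assumed model (illustrative only):
        u ~ Gamma(shape, rate)
        y_j | u ~ Poisson(exposure_j * u), independently over units j
    Conjugacy then gives
        u | y ~ Gamma(shape + sum(y), rate + sum(exposure)).
    Returns (posterior_shape, posterior_rate, posterior_mean).
    """
    if shape <= 0 or rate <= 0:
        raise ValueError("shape and rate must be positive")
    if len(counts) != len(exposures):
        raise ValueError("counts and exposures must have the same length")
    post_shape = shape + sum(counts)
    post_rate = rate + sum(exposures)
    return post_shape, post_rate, post_shape / post_rate


if __name__ == "__main__":
    # Hypothetical area with three sampled units and unit exposures of 1.
    print(gamma_poisson_posterior(2.0, 2.0, [3, 1, 0], [1.0, 1.0, 1.0]))
```

The posterior mean is a ratio of two sums, so no numerical integration or iterative approximation is needed at this step, in contrast to the Poisson generalized linear mixed models the abstract mentions.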
Citations: 0
Automated Classification for Open-Ended Questions with BERT
IF 2.1 Zone 4 Mathematics Q1 Social Sciences Pub Date: 2022-09-13 DOI: 10.1093/jssam/smad015
Hyukjun Gweon, Matthias Schonlau
Manual coding of text data from open-ended questions into different categories is time consuming and expensive. Automated coding uses statistical/machine learning to train on a small subset of manually-coded text answers. Recently, pretraining a general language model on vast amounts of unrelated data and then adapting the model to the specific application has proven effective in natural language processing. Using two data sets, we empirically investigate whether BERT, the currently dominant pretrained language model, is more effective at automated coding of answers to open-ended questions than other non-pretrained statistical learning approaches. We found fine-tuning the pretrained BERT parameters is essential as otherwise BERT is not competitive. Second, we found fine-tuned BERT barely beats the non-pretrained statistical learning approaches in terms of classification accuracy when trained on 100 manually coded observations. However, BERT’s relative advantage increases rapidly when more manually coded observations (e.g., 200–400) are available for training. We conclude that for automatically coding answers to open-ended questions BERT is preferable to non-pretrained models such as support vector machines and boosting.
Citations: 2