Journal of Survey Statistics and Methodology: Latest Articles

Discussion of the 2022 Hansen Lecture: “The Evolution of the Use of Models in Survey Sampling”
IF 2.1, Zone 4 Mathematics, Q1 Social Sciences, Pub Date: 2023-07-26, DOI: 10.1093/jssam/smad030
F. Breidt
The 2022 Hansen Lecture gave a broad overview of the use of models in survey sampling, with emphasis on modeling approaches to incorporating auxiliary information in survey estimators. This discussion expands upon some issues in model-assisted estimation, exploring data needs and the availability of multipurpose weights for advanced modeling methods.
Citations: 0
Estimating the Size of Clustered Hidden Populations
IF 2.1, Zone 4 Mathematics, Q1 Social Sciences, Pub Date: 2023-07-17, DOI: 10.1093/jssam/smad025
Laura J Gamble, L. Johnston, P. Pham, P. Vinck, Katherine R. McLaughlin
Successive sampling population size estimation (SS-PSE) is a method used by government agencies, aid organizations, and researchers around the world to estimate the size of hidden populations using data from respondent-driven sampling surveys. SS-PSE addresses a specific need in estimation, since many countries rely on having accurate size estimates to plan and allocate finite resources to address the needs of hidden populations. However, SS-PSE relies on several assumptions, one of which requires the underlying social network of the hidden population to be fully connected. We propose two modifications to SS-PSE for estimating the size of hidden populations whose underlying social network is composed of disjoint clusters. The first method is a theoretically straightforward extension of SS-PSE, but it relies on prior information that may be difficult to obtain in practice. The second method extends the Bayesian SS-PSE model by introducing a new set of parameters that allow for clustered estimation without requiring the additional prior information. After providing theoretical justification for both novel methods, we then assess their performance using simulations and apply the Clustered SS-PSE method to a population of internally displaced persons in Bamako, Mali.
Citations: 0
Multivariate Small-area Estimation for Mixed-type Response Variables With Item Nonresponse
IF 2.1, Zone 4 Mathematics, Q1 Social Sciences, Pub Date: 2023-06-28, DOI: 10.1093/jssam/smad018
Haoliang Sun, Emily J. Berg, Zhengyuan Zhu
Many surveys collect information on discrete characteristics and continuous variables, that is, mixed-type variables. Small-area statistics of interest include means or proportions of the response variables as well as their domain means, which are the mean values at each level of a different categorical variable. However, item nonresponse in survey data increases the complexity of small-area estimation. To address this issue, we propose a multivariate mixed-effects model for mixed-type response variables subject to item nonresponse. We apply this method to two data structures where the data are missing completely at random by design. We use empirical data from two separate studies: a survey of pet owners and a dataset from the National Resources Inventory. In these applications, our proposed method leads to improvements relative to a direct estimator and a predictor based on a univariate model.
Citations: 0
Hansen Lecture 2022: The Evolution of the Use of Models in Survey Sampling
Zone 4 Mathematics, Q1 Social Sciences, Pub Date: 2023-06-28, DOI: 10.1093/jssam/smad021
Richard Valliant
Morris Hansen made seminal contributions to the early development of sampling theory, including convincing government survey administrators to use probability sampling as opposed to nonprobability (NP) methods like quota sampling. He codified many of the early results in design-based sampling theory in his 1953 two-volume set co-authored with Hurwitz and Madow. Since those developments, the explicit use of models has proliferated in sampling for use in basic point estimation, nonresponse and noncoverage adjustment, imputation, and a variety of other areas. This paper summarizes some of the early developments, controversies in the design-based versus model-based debate, and uses of models for inference from probability and NP samples.
Citations: 0
Survey Consent to Administrative Data Linkage: Five Experiments on Wording and Format
IF 2.1, Zone 4 Mathematics, Q1 Social Sciences, Pub Date: 2023-06-27, DOI: 10.1093/jssam/smad019
A. Jäckle, Jonathan Burton, M. Couper, Thomas F. Crossley, Sandra Walzenbach
To maximize the value of the data while minimizing respondent burden, survey data are increasingly linked to administrative records. Record linkage often requires the informed consent of survey respondents and failure to obtain consent reduces sample size and may lead to selection bias. Relatively little is known about how best to word and format consent requests in surveys. We conducted a series of experiments in a probability household panel and an online access panel to understand how various features of the design of the consent request can affect informed consent. We experimentally varied: (i) the readability of the consent request, (ii) placement of the consent request in the survey, (iii) consent as default versus the standard opt-in consent question, (iv) offering additional information, and (v) a priming treatment focusing on trust in the data holder. For each experiment, we examine the effects of the treatments on consent rates, objective understanding of the consent request (measured with knowledge test questions), subjective understanding (how well the respondent felt they understood the request), confidence in their decision, response times, and whether they read any of the additional information materials. We find that the default wording and offering additional information do not increase consent rates. Improving the readability of the consent question increases objective understanding but does not increase the consent rate. However, asking for consent early in the survey and priming respondents to consider their trust in the administrative data holder both increase consent rates without negatively affecting understanding of the request.
Citations: 0
Pseudo-Bayesian Small-Area Estimation
IF 2.1, Zone 4 Mathematics, Q1 Social Sciences, Pub Date: 2023-06-19, DOI: 10.1093/jssam/smad012
G. Datta, Juhyung Lee, Jiacheng Li
In sample surveys, a subpopulation is referred to as a “small area” or “small domain” if it does not have a large enough sample that alone will yield an adequately accurate estimate of a characteristic. In small-area estimation, the sample size from various subpopulations is often too small to accurately estimate its mean, and so one borrows strength from similar subpopulations through an appropriate model based on relevant covariates. The empirical best linear unbiased prediction (EBLUP) method has been the dominant frequentist model-based approach in small-area estimation. This method relies on estimation of model parameters based on the marginal distribution of the data. As an alternative to this method, the observed best prediction (OBP) method estimates the parameters by minimizing an objective function that is implied by the total mean squared prediction error. We use this objective function in the Fay–Herriot model to construct a pseudo-posterior distribution for the model parameters under nearly noninformative priors for them. Data analysis and simulation show that the pseudo-Bayesian estimators (PBEs) compete favorably with the OBPs and EBLUPs. The PBE estimates are robust to mean misspecification and have good frequentist properties. Being Bayesian by construction, they automatically avoid negative estimates of standard errors, enjoy a dual justification, and provide an attractive alternative to practitioners.
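As a rough illustration of the "borrowing strength" idea mentioned in this abstract, the sketch below computes the best linear unbiased predictor under an intercept-only Fay–Herriot model when the model variance A is treated as known. This is not the pseudo-Bayesian estimator or the OBP from the paper. The function name `fay_herriot_blup` and its arguments are hypothetical, and an EBLUP would replace the known `model_var` with an estimate.

```python
def fay_herriot_blup(direct, sampling_var, model_var):
    """Intercept-only Fay-Herriot BLUP, assuming the model variance A is known.

    direct       : direct survey estimates y_i, one per area
    sampling_var : known sampling variances D_i (positive)
    model_var    : model variance A (non-negative), assumed known here
    """
    if model_var < 0 or any(d <= 0 for d in sampling_var):
        raise ValueError("need A >= 0 and every D_i > 0")
    # Weighted least squares estimate of the common mean, weights 1 / (A + D_i).
    precisions = [1.0 / (model_var + d) for d in sampling_var]
    beta = sum(p * y for p, y in zip(precisions, direct)) / sum(precisions)
    # Shrinkage factor gamma_i = A / (A + D_i): noisier areas lean more on beta.
    shrink = [model_var / (model_var + d) for d in sampling_var]
    return [g * y + (1.0 - g) * beta for g, y in zip(shrink, direct)]


# Two areas with equal sampling variance are pulled halfway toward their mean.
print(fay_herriot_blup([10.0, 20.0], [1.0, 1.0], 1.0))  # [12.5, 17.5]
```

Setting `model_var` to 0 collapses every area onto the pooled mean, while a large `model_var` leaves the direct estimates almost unchanged.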
Citations: 0
Maximum Entropy Design by a Markov Chain Process
Zone 4 Mathematics, Q1 Social Sciences, Pub Date: 2023-06-14, DOI: 10.1093/jssam/smad010
Yves Tillé, Bardia Panahbehagh
In this article, we study an implementation of maximum entropy (ME) design utilizing a Markov chain. This design, which is also called the conditional Poisson sampling design, is difficult to implement. We first present a new method for calculating the weights associated with conditional Poisson sampling. Then, we study a very simple method of random exchanges of units, which allows switching from one sample to another. This exchange system defines an irreducible and aperiodic Markov chain whose ME design is the stationary distribution. The design can be implemented without enumerating all possible samples. By repeating the exchange process a large number of times, it is possible to select a sample that respects the design. The process is simple to implement, and its convergence rate has been investigated theoretically and by simulation, which led to promising results.
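The exchange idea can be illustrated with a generic Metropolis swap chain, sketched below under stated assumptions. It is not claimed to be the authors' exchange scheme. Each step proposes swapping one sampled unit i with one unsampled unit j and accepts with probability min(1, w_j / w_i), which leaves a fixed-size design with p(s) proportional to the product of w_k over the units in s stationary. The function name, the positive `weights` argument, and the number of steps are hypothetical choices.

```python
import random


def exchange_chain_sample(weights, n, steps, rng=None):
    """Draw a fixed-size sample by a Metropolis swap chain (illustrative sketch).

    Target: p(s) proportional to the product of weights[k] for k in s, |s| = n.
    weights : positive numbers, one per population unit (assumed given)
    steps   : number of proposed exchanges; more steps means closer to the target
    """
    rng = rng or random.Random()
    N = len(weights)
    if not 0 < n < N or any(w <= 0 for w in weights):
        raise ValueError("need 0 < n < N and strictly positive weights")
    inside = rng.sample(range(N), n)            # arbitrary starting sample
    chosen = set(inside)
    outside = [k for k in range(N) if k not in chosen]
    for _ in range(steps):
        a = rng.randrange(n)                    # position of a sampled unit
        b = rng.randrange(N - n)                # position of an unsampled unit
        i, j = inside[a], outside[b]
        # Symmetric proposal, so the acceptance ratio is just w_j / w_i.
        if rng.random() < min(1.0, weights[j] / weights[i]):
            inside[a], outside[b] = j, i        # perform the exchange
    return sorted(inside)


sample = exchange_chain_sample([0.5, 1.0, 2.0, 4.0, 8.0], n=2, steps=1000)
print(sample)
```

No sample is ever enumerated: the chain only stores the current sample and its complement, which is the practical appeal of an exchange-based implementation.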
Citations: 0
A Comprehensive Overview of Unit-Level Modeling of Survey Data for Small Area Estimation Under Informative Sampling
Zone 4 Mathematics, Q1 Social Sciences, Pub Date: 2023-06-14, DOI: 10.1093/jssam/smad020
Paul A Parker, Ryan Janicki, Scott H Holan
Model-based small area estimation is frequently used in conjunction with survey data to establish estimates for under-sampled or unsampled geographies. These models can be specified at either the area level or the unit level, but unit-level models often offer potential advantages such as more precise estimates and easy spatial aggregation. Nevertheless, relative to area-level models, literature on unit-level models is less prevalent. In modeling small areas at the unit level, challenges often arise as a consequence of the informative sampling mechanism used to collect the survey data. This article provides a comprehensive methodological review for unit-level models under informative sampling, with an emphasis on Bayesian approaches.
Citations: 2
Comparison of Unit-Level Small Area Estimation Modeling Approaches for Survey Data Under Informative Sampling
Zone 4 Mathematics, Q1 Social Sciences, Pub Date: 2023-06-14, DOI: 10.1093/jssam/smad022
Paul A Parker, Ryan Janicki, Scott H Holan
Unit-level modeling strategies offer many advantages relative to the area-level models that are most often used in the context of small area estimation. For example, unit-level models aggregate naturally, allowing for estimates at any desired resolution, and also offer greater precision in many cases. We compare a variety of the methods available in the literature related to unit-level modeling for small area estimation. Specifically, to provide insight into the differences between methods, we conduct a simulation study that compares several of the general approaches. In addition, the methods used for simulation are further illustrated through an application to the American Community Survey.
Citations: 1
Leveraging Predictive Modelling from Multiple Sources of Big Data to Improve Sample Efficiency and Reduce Survey Nonresponse Error
IF 2.1, Zone 4 Mathematics, Q1 Social Sciences, Pub Date: 2023-06-10, DOI: 10.1093/jssam/smad016
David Dutwin, Patrick Coyle, I. Bilgen, N. English
Big data has been fruitfully leveraged as a supplement for survey data—and sometimes as its replacement—and in the best of worlds, as a “force multiplier” to improve survey analytics and insight. We detail a use case, the big data classifier (BDC), as a replacement to the more traditional methods of targeting households in survey sampling for given specific household and personal attributes. Much like geographic targeting and the use of commercial vendor flags, we detail the ability of BDCs to predict the likelihood that any given household is, for example, one that contains a child or someone who is Hispanic. We specifically build 15 BDCs with the combined data from a large nationally representative probability-based panel and a range of big data from public and private sources, and then assess the effectiveness of these BDCs to successfully predict their range of predicted attributes across three large survey datasets. For each BDC and each data application, we compare the relative effectiveness of the BDCs against historical sample targeting techniques of geographic clustering and vendor flags. Overall, BDCs offer a modest improvement in their ability to target subpopulations. We find classes of predictions that are consistently more effective, and others where the BDCs are on par with vendor flagging, though always superior to geographic clustering. We present some of the relative strengths and weaknesses of BDCs as a new method to identify and subsequently sample low incidence and other populations.
大数据作为调查数据的补充,有时作为替代,在最好的情况下,作为“力量倍增器”来提高调查分析和洞察力。我们详细介绍了一个用例,即大数据分类器(BDC),以替代更传统的针对特定家庭和个人属性进行调查抽样的目标家庭方法。与地理定位和商业供应商标志的使用非常相似,我们详细描述了bdc预测任何给定家庭(例如,家庭中有孩子或西班牙裔人)的可能性的能力。我们专门构建了15个bdc,结合了来自全国代表性的基于概率的大型面板的数据和来自公共和私人来源的一系列大数据,然后评估了这些bdc在三个大型调查数据集中成功预测其预测属性范围的有效性。对于每个BDC和每个数据应用程序,我们将BDC与地理聚类和供应商标志的历史样本目标技术的相对有效性进行了比较。总体而言,bdc在针对亚群体的能力方面略有提高。我们发现预测的类别始终更有效,而其他bdc与供应商标记相当,尽管总是优于地理聚类。我们提出了一些bdc的相对优势和劣势,作为识别和随后采样低发病率和其他人群的新方法。
Citations: 0