
International Statistical Review: Latest Articles

Improving Probabilistic Record Linkage Using Statistical Prediction Models
IF 2 Zone 3 Mathematics Q1 Mathematics Pub Date: 2022-12-04 DOI: 10.1111/insr.12535
Angelo Moretti, N. Shlomo
Record linkage brings together information from records in two or more data sources that are believed to belong to the same statistical unit based on a common set of matching variables. Matching variables, however, can appear with errors and variations and the challenge is to link statistical units that are subject to error. We provide an overview of record linkage techniques and specifically investigate the classic Fellegi and Sunter probabilistic record linkage framework to assess whether the decision rule for classifying pairs into sets of matches and non‐matches can be improved by incorporating a statistical prediction model. We also study whether the enhanced linkage rule can provide better results in terms of preserving associations between variables in the linked data file that are not used in the matching procedure. A simulation study and an application based on real data are used to evaluate the methods.
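The Fellegi and Sunter decision rule mentioned in this abstract can be sketched as follows. This is a minimal illustration of the classic rule only, not the prediction-model enhancement the authors propose. It assumes conditionally independent matching variables; the names `fs_weight` and `classify`, the m- and u-probabilities, and the thresholds are all hypothetical.

```python
import math

def fs_weight(agreements, m, u):
    """Composite match weight for one record pair.

    agreements: booleans, one per matching variable (True = values agree)
    m: P(agree | pair is a true match), one per variable
    u: P(agree | pair is a true non-match), one per variable
    Assumes conditional independence across matching variables.
    """
    total = 0.0
    for agree, m_k, u_k in zip(agreements, m, u):
        if agree:
            total += math.log2(m_k / u_k)
        else:
            total += math.log2((1 - m_k) / (1 - u_k))
    return total

def classify(weight, lower, upper):
    """Three-way decision: match, non-match, or clerical review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "clerical review"

# Hypothetical probabilities for two matching variables (e.g. surname, birth year)
m_probs = [0.9, 0.8]
u_probs = [0.1, 0.2]
w_agree = fs_weight([True, True], m_probs, u_probs)
w_disagree = fs_weight([False, False], m_probs, u_probs)
```

Pairs that agree on every variable receive a large positive weight, pairs that disagree on every variable a large negative one, and the two thresholds leave a middle band for manual review.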
Citations: 0
Elaboration Models with Symmetric Information Divergence.
IF 2 Zone 3 Mathematics Q1 Mathematics Pub Date: 2022-12-01 Epub Date: 2022-04-20 DOI: 10.1111/insr.12499
Majid Asadi, Karthik Devarajan, Nader Ebrahimi, Ehsan S Soofi, Lauren Spirko-Burns

Various statistical methodologies embed a probability distribution in a more flexible family of distributions. The latter is called elaboration model, which is constructed by choice or a formal procedure and evaluated by asymmetric measures such as the likelihood ratio and Kullback-Leibler information. The use of asymmetric measures can be problematic for this purpose. This paper introduces two formal procedures, referred to as link functions, that embed any baseline distribution with a continuous density on the real line into model elaborations. Conditions are given for the link functions to render symmetric Kullback-Leibler divergence, Rényi divergence, and phi-divergence family. The first link function elaborates quantiles of the baseline probability distribution. This approach produces continuous counterparts of the binary probability models. Examples include the Cauchy, probit, logit, Laplace, and Student-t links. The second link function elaborates the baseline survival function. Examples include the proportional odds and change point links. The logistic distribution is characterized as the one that satisfies the conditions for both links. An application demonstrates advantages of symmetric divergence measures for assessing the efficacy of covariates.
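To make the asymmetry of the Kullback-Leibler information concrete, here is a small sketch for discrete distributions. The paper itself works with continuous densities and link functions, so this is only an illustration of why a symmetrised divergence can be preferable; the function names and the two example distributions are hypothetical.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetrised (Jeffreys) divergence: KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)

p = [0.5, 0.5]
q = [0.9, 0.1]
forward = kl(p, q)   # divergence of q from p
reverse = kl(q, p)   # divergence of p from q, generally a different number
```

Swapping the arguments changes `kl` but leaves `symmetric_kl` unchanged, so the symmetrised measure does not depend on which distribution is treated as the baseline.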

Citations: 0
A joint normal-binary (probit) model
IF 2 Zone 3 Mathematics Q1 Mathematics Pub Date: 2022-11-08 DOI: 10.1111/insr.12532
Margaux Delporte, Steffen Fieuws, Geert Molenberghs, Geert Verbeke, Simeon Situma Wanyama, Elpis Hatziagorou, Christiane De Boeck

In biomedical research, often hierarchical binary and continuous responses need to be jointly modelled. In joint generalised linear mixed models, this can be done with correlated random effects, which allows examining the association structure between the various responses and the evolution of this association over time. In addition, the effect of covariates on all outcomes can be assessed simultaneously. Still, investigating this association is often limited to examining the correlations between the responses on an underlying scale. In addition, the interpretation of this hierarchical model is conditional on the subject-specific random effects. This paper extends this approach and shows how manifest correlations can be computed, that is, the associations between the observed responses. Further, a marginal model is formulated, in which the interpretation is no longer conditional on the random effects. In addition, prediction intervals are derived of one subvector of responses conditional on the other. These methods are applied in a case study of the lung function and allergic bronchopulmonary aspergillosis in patients with cystic fibrosis.

Citations: 1
Likelihood-Based Inference for the Finite Population Mean with Post-Stratification Information Under Non-Ignorable Non-Response
IF 2 Zone 3 Mathematics Q1 Mathematics Pub Date: 2022-10-25 DOI: 10.1111/insr.12527
Sahar Z. Zangeneh, Roderick J. Little

We describe models and likelihood-based estimation of the finite population mean for a survey subject to unit non-response, when post-stratification information is available from external sources. A feature of the models is that they do not require the assumption that the data are missing at random (MAR). As a result, the proposed models provide estimates under weaker assumptions than those required in the absence of post-stratification information, thus allowing more robust inferences. In particular, we describe models for estimation of the finite population mean of a survey outcome with categorical covariates and externally observed categorical post-stratifiers. We compare inferences from the proposed method with existing design-based estimators via simulations. We apply our methods to school-level data from California Department of Education to estimate the mean academic performance index (API) score in years 1999 and 2000. We end with a discussion.

Citations: 1
Global seasonal and pandemic patterns in influenza: An application of longitudinal study designs
IF 2 Zone 3 Mathematics Q1 Mathematics Pub Date: 2022-10-23 DOI: 10.1111/insr.12529
Elena N. Naumova, Ryan B. Simpson, Bingjie Zhou, Meghan A. Hartwick

The confluence of growing analytic capacities and global surveillance systems for seasonal infections has created new opportunities to further develop statistical methodology and advance the understanding of the global disease dynamics. We developed a framework to characterise the seasonality of infectious diseases for publicly available global health surveillance data. Specifically, we aimed to estimate the seasonal characteristics and their uncertainty using mixed effects models with harmonic components and the δ-method and develop multi-panel visualisations to present complex interplay of seasonal peaks across geographic locations. We compiled a set of 2 422 weekly time series of 14 reported outcomes for 173 Member States from the World Health Organization's (WHO) international influenza virological surveillance system, FluNet, from 02 January 1995 through 20 June 2021. We produced an analecta of data visualisations to describe global travelling waves of influenza while addressing issues of data completeness and credibility. Our results offer directions for further improvements in data collection, reporting, analysis and development of statistical methodology and predictive approaches.
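The combination of harmonic components and the δ-method described in this abstract can be illustrated with a simplified sketch. It assumes a single-harmonic model y_t = b0 + b_sin·sin(2πt/P) + b_cos·cos(2πt/P) whose coefficients and covariance have already been estimated (the authors use mixed effects models; how the fit is obtained is outside this sketch). The function name, arguments, and the 52-week period are assumptions for illustration.

```python
import math

def seasonal_summary(b_sin, b_cos, var_sin, var_cos, cov, period=52.0):
    """Amplitude and peak timing of one harmonic, with delta-method SEs.

    b_sin*sin(wt) + b_cos*cos(wt) = amp*cos(wt - phase), w = 2*pi/period,
    so the seasonal peak occurs at t = phase / w. Requires amp > 0.
    """
    amp = math.hypot(b_sin, b_cos)
    phase = math.atan2(b_sin, b_cos) % (2 * math.pi)
    peak = phase / (2 * math.pi) * period

    # Delta method: Var(g(b)) ~ grad' * Sigma * grad
    # grad of amp   = (b_sin/amp,    b_cos/amp)
    # grad of phase = (b_cos/amp^2, -b_sin/amp^2)
    var_amp = (b_sin**2 * var_sin + b_cos**2 * var_cos
               + 2 * b_sin * b_cos * cov) / amp**2
    var_phase = (b_cos**2 * var_sin + b_sin**2 * var_cos
                 - 2 * b_sin * b_cos * cov) / amp**4

    return {
        "amplitude": amp,
        "se_amplitude": math.sqrt(var_amp),
        "peak_time": peak,
        "se_peak_time": period / (2 * math.pi) * math.sqrt(var_phase),
    }

# Hypothetical estimates: a pure sine term peaks a quarter of the way through the year
summary = seasonal_summary(1.0, 0.0, var_sin=0.04, var_cos=0.01, cov=0.0)
```

With weekly data and a 52-week period, a pure sine term puts the peak at week 13, and the standard error of the peak time comes from the uncertainty in the cosine coefficient, which is the direction in which the phase moves.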

Citations: 0
Synergy of Biostatistics and Epidemiology in Air Pollution Health Effects Studies
IF 2 Zone 3 Mathematics Q1 Mathematics Pub Date: 2022-10-21 DOI: 10.1111/insr.12525
Douglas W. Dockery

The extraordinary advances in quantifying the health effects of ambient air pollution over the last five decades have led to dramatic improvement in air quality in the United States. This work has been possible through innovative epidemiologic study designs coupled with advanced statistical analytic methods. This paper presents a historical perspective on the coordinated developments of epidemiologic designs and statistical methods for air pollution health effects studies at the Harvard School of Public Health.

Citations: 1
Path algorithms for fused lasso signal approximator with application to COVID-19 spread in Korea
IF 2 Zone 3 Mathematics Q1 Mathematics Pub Date: 2022-10-19 DOI: 10.1111/insr.12521
Won Son, Johan Lim, Donghyeon Yu

The fused lasso signal approximator (FLSA) is a smoothing procedure for noisy observations that uses fused lasso penalty on unobserved mean levels to find sparse signal blocks. Several path algorithms have been developed to obtain the whole solution path of the FLSA. However, it is known that the FLSA has model selection inconsistency when the underlying signals have a stair-case block, where three consecutive signal blocks are either strictly increasing or decreasing. Modified path algorithms for the FLSA have been proposed to guarantee model selection consistency regardless of the stair-case block. In this paper, we provide a comprehensive review of the path algorithms for the FLSA and prove the properties of the recently modified path algorithms' hitting times. Specifically, we reinterpret the modified path algorithm as the path algorithm for local FLSA problems and reveal the condition that the hitting time for the fusion of the modified path algorithm is not monotone in a tuning parameter. To recover the monotonicity of the solution path, we propose a pathwise adaptive FLSA having monotonicity with similar performance as the modified solution path algorithm. Finally, we apply the proposed method to the number of daily-confirmed cases of COVID-19 in Korea to identify the change points of its spread.
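For orientation, the criterion that an FLSA minimises can be written down directly. The sketch below evaluates the standard FLSA objective (squared-error fit plus a sparsity penalty and a fusion penalty on adjacent mean levels); it is not one of the path algorithms reviewed in the paper, and the function name and example values are hypothetical.

```python
def flsa_objective(y, beta, lam1, lam2):
    """Standard FLSA criterion for observations y and mean levels beta.

    0.5 * sum (y_i - beta_i)^2
      + lam1 * sum |beta_i|                 (sparsity penalty)
      + lam2 * sum |beta_i - beta_{i-1}|    (fusion penalty)
    """
    fit = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, beta))
    sparsity = lam1 * sum(abs(bi) for bi in beta)
    fusion = lam2 * sum(abs(b1 - b0) for b0, b1 in zip(beta, beta[1:]))
    return fit + sparsity + fusion

# Two flat blocks with a jump between them
y = [1.0, 1.0, 5.0, 5.0]
unsmoothed = flsa_objective(y, y, lam1=0.0, lam2=1.0)
shrunk = flsa_objective(y, [1.5, 1.5, 4.5, 4.5], lam1=0.0, lam2=1.0)
```

In this toy case, pulling the two blocks half a unit towards each other lowers the criterion from 4.0 to 3.5: the fusion penalty rewards smaller jumps, and a path algorithm tracks how such blocks merge as `lam2` grows.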

Citations: 1
Accounting for Non-ignorable Sampling and Non-response in Statistical Matching
IF 2 Zone 3 Mathematics Q1 Mathematics Pub Date: 2022-10-19 DOI: 10.1111/insr.12524
Daniela Marella, Danny Pfeffermann

Data for statistical analysis is often available from different samples, with each sample containing measurements on only some of the variables of interest. Statistical matching attempts to generate a fused database containing matched measurements on all the target variables. In this article, we consider the use of statistical matching when the samples are drawn by informative sampling designs and are subject to not missing at random non-response. The problem with ignoring the sampling process and non-response is that the distribution of the data observed for the responding units can be very different from the distribution holding for the population data, which may distort the inference process and result in a matched database that misrepresents the joint distribution in the population. Our proposed methodology employs the empirical likelihood approach and is shown to perform well in a simulation experiment and when applied to real sample data.

Citations: 1
Thinking Clearly with Data: A Guide to Quantitative Reasoning and Analysis. Ethan Bueno de Mesquita and Anthony Fowler. Princeton University Press, 2021, 400 pages, $95.00/£74.00, hardback ISBN: 978‐0‐691‐21436‐8
IF 2 Zone 3 Mathematics Q1 Mathematics Pub Date: 2022-10-19 DOI: 10.1111/insr.12530
G. Dekkers
Citations: 3
A Bootstrap Variance Procedure for the Generalised Regression Estimator
IF 2 Zone 3 Mathematics Q1 Mathematics Pub Date: 2022-10-19 DOI: 10.1111/insr.12528
Marius Stefan, Michael A. Hidiroglou

The generalised regression estimator (GREG) uses auxiliary data that are available from the finite population to improve the efficiency of the estimator of a total (mean). Estimators of the variance of GREG that have been proposed in the sampling literature include those based on Taylor linearisation and the jackknife techniques. Approximations based on Taylor expansions are reasonable for large samples. However, when the sample size is small, the Taylor-based variance estimator has a large negative bias. The jackknife variance estimators overestimate the variance of GREG for small sample sizes. We offset these setbacks using a bootstrap procedure for estimating the variance of the GREG. The method uses a bootstrap population constructed with the model underlying the GREG estimator. Repeated samples are selected in the bootstrap population according to the design used to select the initial sample, and the variability associated with these bootstrap samples is used to compute the proposed bootstrap variance estimator. Simulations show that the new bootstrap estimator has a small bias for samples that have few observations.
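As background for the estimator whose variance is studied here, the GREG point estimator of a total can be sketched for the simplest case: one auxiliary variable plus an intercept, with known population size and known population total of the auxiliary variable. This covers only the point estimate, not the bootstrap variance procedure the authors propose; the function name, arguments, and toy data are hypothetical.

```python
def greg_total(y, x, d, pop_size, pop_total_x):
    """GREG estimate of the population total of y.

    y, x: sample values of the study and auxiliary variables
    d: design weights (inverse inclusion probabilities)
    pop_size, pop_total_x: known population totals of 1 and of x
    Working model: y = a + b*x with constant error variance.
    """
    n_hat = sum(d)                                   # HT estimate of N
    tx_hat = sum(di * xi for di, xi in zip(d, x))    # HT estimate of total of x
    ty_hat = sum(di * yi for di, yi in zip(d, y))    # HT estimate of total of y

    # Design-weighted least squares for intercept a and slope b
    x_bar = tx_hat / n_hat
    y_bar = ty_hat / n_hat
    sxy = sum(di * (xi - x_bar) * (yi - y_bar) for di, xi, yi in zip(d, x, y))
    sxx = sum(di * (xi - x_bar) ** 2 for di, xi in zip(d, x))
    b = sxy / sxx
    a = y_bar - b * x_bar

    # HT estimate plus regression adjustment towards the known totals
    return ty_hat + a * (pop_size - n_hat) + b * (pop_total_x - tx_hat)

# Toy sample where y = 2 + 3x holds exactly
estimate = greg_total(y=[5.0, 8.0, 11.0], x=[1.0, 2.0, 3.0],
                      d=[2.0, 2.0, 2.0], pop_size=6, pop_total_x=15.0)
```

When the working model fits the sample exactly, as in this toy case, the adjustment recovers the model-implied population total 2·6 + 3·15 = 57 even though the weighted sample total of y alone is 48.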

Citations: 0