
Latest publications in arXiv: Methodology

Optimal tests for elliptical symmetry: specified and unspecified location
Pub Date : 2019-11-19 DOI: 10.3150/20-BEJ1305
Slađana Babić, Laetitia Gelbgras, M. Hallin, Christophe Ley
Although the assumption of elliptical symmetry is quite common in multivariate analysis and widespread in a number of applications, the problem of testing the null hypothesis of ellipticity so far has not been addressed in a fully satisfactory way. Most of the literature in the area indeed addresses the null hypothesis of elliptical symmetry with specified location and actually addresses location rather than non-elliptical alternatives. In this paper, we propose new classes of testing procedures, both for specified and unspecified location. The backbone of our construction is Le Cam's asymptotic theory of statistical experiments, and optimality is to be understood locally and asymptotically within the family of generalized skew-elliptical distributions. The tests we propose meet all the desired properties of a "good" test of elliptical symmetry: they have a simple asymptotic distribution under the entire null hypothesis of elliptical symmetry with unspecified radial density and shape parameter; they are affine-invariant, computationally fast, intuitively understandable, and not too demanding in terms of moments. While achieving optimality against generalized skew-elliptical alternatives, they remain quite powerful under a much broader class of non-elliptical distributions and significantly outperform the available competitors.
Citations: 9
Wasserstein $F$-tests and confidence bands for the Fréchet regression of density response curves
Pub Date : 2019-10-29 DOI: 10.1214/20-AOS1971
Alexander Petersen, Xi Liu, A. Divani
Data consisting of samples of probability density functions are increasingly prevalent, necessitating the development of methodologies for their analysis that respect the inherent nonlinearities associated with densities. In many applications, density curves appear as functional response objects in a regression model with vector predictors. For such models, inference is key to understanding the importance of density-predictor relationships and the uncertainty associated with the estimated conditional mean densities, defined as conditional Fréchet means under a suitable metric. Using the Wasserstein geometry of optimal transport, we consider the Fréchet regression of density curve responses and develop tests for global and partial effects, as well as simultaneous confidence bands for estimated conditional mean densities. The asymptotic behavior of these objects is based on underlying functional central limit theorems within Wasserstein space, and we demonstrate that they are asymptotically of the correct size and coverage, with uniformly strong consistency of the proposed tests under sequences of contiguous alternatives. The accuracy of these methods, including nominal size, power, and coverage, is assessed through simulations, and their utility is illustrated through a regression analysis of post-intracerebral hemorrhage hematoma densities and their associations with a set of clinical and radiological covariates.
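For one-dimensional distributions, the Wasserstein machinery this abstract relies on has a simple concrete form: the 2-Wasserstein distance reduces to the L2 distance between quantile functions, and for equal-size samples to a pairing of sorted order statistics. A minimal illustrative sketch (not the paper's inference procedure, which works with full density responses and covariates):

```python
import math

def wasserstein2_1d(x, y):
    """Empirical 2-Wasserstein distance between two equal-size 1-D samples.

    In one dimension W_2 reduces to the L2 distance between quantile
    functions; for equal-size samples this is the root mean squared
    difference of the order statistics.
    """
    assert len(x) == len(y), "equal sample sizes assumed for simplicity"
    xs, ys = sorted(x), sorted(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))
```

A sanity check on the definition: shifting a sample by a constant c shifts every order statistic by c, so the distance is exactly |c|.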
Citations: 21
Structural Equation Models as Computation Graphs
Pub Date : 2019-10-16 DOI: 10.23668/PSYCHARCHIVES.2623
E. V. Kesteren, D. Oberski
Structural equation modeling (SEM) is a popular tool in the social and behavioural sciences, where it is being applied to ever more complex data types. The high-dimensional data produced by modern sensors, brain images, or (epi)genetic measurements require variable selection using parameter penalization; experimental models combining disparate data sources benefit from regularization to obtain a stable result; and genomic SEM or network models lead to alternative objective functions. With each proposed extension, researchers currently have to completely reformulate SEM and its optimization algorithm -- a challenging and time-consuming task. In this paper, we consider each SEM as a computation graph, a flexible method of specifying objective functions borrowed from the field of deep learning. When combined with state-of-the-art optimizers, our computation graph approach can extend SEM without the need for bespoke software development. We show that both existing and novel SEM improvements follow naturally from our approach. To demonstrate, we discuss least absolute deviation estimation and penalized regression models. We also introduce spike-and-slab SEM, which may perform better when shrinkage of large factor loadings is not desired. By applying computation graphs to SEM, we hope to greatly accelerate the process of developing SEM techniques, paving the way for new applications. We provide an accompanying R package tensorsem.
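The paper's central point, that re-specifying the objective should not require re-deriving the fitting algorithm, can be illustrated in miniature. The sketch below swaps least squares for least absolute deviation (one of the paper's own examples) in a made-up one-parameter regression; a crude grid search stands in for the gradient-based optimizers that a computation-graph approach such as tensorsem would actually use:

```python
def fit_1d(xs, ys, loss, grid_lo=-5.0, grid_hi=5.0, step=0.001):
    """Fit y ~ b*x by minimising sum(loss(residual)) over a grid of b.

    A generic optimiser over a user-specified objective: changing the
    estimation criterion means changing only `loss`, not the fitting code.
    """
    best_b, best_obj = None, float("inf")
    b = grid_lo
    while b <= grid_hi:
        obj = sum(loss(y - b * x) for x, y in zip(xs, ys))
        if obj < best_obj:
            best_b, best_obj = b, obj
        b += step
    return best_b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.0, 8.2]
b_ols = fit_1d(xs, ys, lambda r: r * r)    # least squares
b_lad = fit_1d(xs, ys, lambda r: abs(r))   # least absolute deviation
```

Both objectives are handled by the same optimisation loop, which is the modularity the abstract describes, only in a far richer model class and with automatic differentiation.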
Citations: 6
Interactive martingale tests for the global null
Pub Date : 2019-09-16 DOI: 10.1214/20-ejs1790
Boyan Duan, Aaditya Ramdas, Sivaraman Balakrishnan, L. Wasserman
Global null testing is a classical problem going back about a century to Fisher's and Stouffer's combination tests. In this work, we present simple martingale analogs of these classical tests, which are applicable in two distinct settings: (a) the online setting in which there is a possibly infinite sequence of $p$-values, and (b) the batch setting, where one uses prior knowledge to preorder the hypotheses. Through theory and simulations, we demonstrate that our martingale variants have higher power than their classical counterparts even when the preordering is only weakly informative. Finally, using a recent idea of "masking" $p$-values, we develop a novel interactive test for the global null that can take advantage of covariates and repeated user guidance to create a data-adaptive ordering that achieves higher detection power against structured alternatives.
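The two classical building blocks named here are simple to state. Below is a standard-library sketch of Fisher's and Stouffer's combination tests (the paper's martingale and interactive variants are not reproduced); for Fisher's statistic the chi-squared survival function has a closed form because the degrees of freedom, 2n, are always even:

```python
import math
from statistics import NormalDist

def fisher_combine(pvals):
    """Fisher's combination test: T = -2 * sum(log p_i) ~ chi^2_{2n}
    under the global null. Returns the combined p-value, using the
    closed-form chi^2 survival function for even degrees of freedom:
    P(chi^2_{2n} > t) = exp(-t/2) * sum_{k=0}^{n-1} (t/2)^k / k!."""
    n = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # t/2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(n))

def stouffer_combine(pvals):
    """Stouffer's test: Z = sum(Phi^{-1}(1 - p_i)) / sqrt(n) ~ N(0, 1)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - nd.cdf(z)
```

With a single p-value both tests reduce to the identity, which is a convenient correctness check.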
Citations: 13
Bayesian Model Calibration for Extrapolative Prediction via Gibbs Posteriors.
Pub Date : 2019-09-01 DOI: 10.2172/1763261
S. Woody, N. Ghaffari, L. Hund
The current standard Bayesian approach to model calibration, which assigns a Gaussian process prior to the discrepancy term, often suffers from issues of unidentifiability and computational complexity and instability. When the goal is to quantify uncertainty in physical parameters for extrapolative prediction, then there is no need to perform inference on the discrepancy term. With this in mind, we introduce Gibbs posteriors as an alternative Bayesian method for model calibration, which updates the prior with a loss function connecting the data to the parameter. The target of inference is the physical parameter value which minimizes the expected loss. We propose to tune the loss scale of the Gibbs posterior to maintain nominal frequentist coverage under assumptions of the form of model discrepancy, and present a bootstrap implementation for approximating coverage rates. Our approach is highly modular, allowing an analyst to easily encode a wide variety of such assumptions. Furthermore, we provide a principled method of combining posteriors calculated from data subsets. We apply our methods to data from an experiment measuring the material properties of tantalum.
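The Gibbs posterior itself has a compact definition: the prior is updated with exp(-eta * loss) in place of a likelihood. A minimal grid-based sketch with a flat prior, squared loss, and made-up data (illustrative only; the paper's loss-scale tuning and bootstrap coverage calibration are not shown):

```python
import math

def gibbs_posterior(data, grid, loss, eta=1.0):
    """Normalised Gibbs posterior evaluated on a parameter grid:
    post(theta) proportional to prior(theta) * exp(-eta * sum_i loss(theta, x_i)).
    A flat prior over the grid is assumed for simplicity."""
    weights = [math.exp(-eta * sum(loss(th, x) for x in data)) for th in grid]
    total = sum(weights)
    return [w / total for w in weights]

data = [1.8, 2.1, 2.0, 2.2, 1.9]
grid = [i / 100.0 for i in range(100, 301)]      # theta in [1, 3]
post = gibbs_posterior(data, grid, lambda th, x: (th - x) ** 2)
theta_hat = grid[post.index(max(post))]          # posterior mode
```

With squared loss and a flat prior the mode sits at the loss minimiser, here the sample mean, which matches the abstract's framing of the inference target as the expected-loss minimiser.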
Citations: 2
Clustering Longitudinal Life-Course Sequences Using Mixtures of Exponential-Distance Models
Pub Date : 2019-08-21 DOI: 10.31235/osf.io/f5n8k
Keefe Murphy, Brendan Murphy, R. Piccarreta, I. C. Gormley
Sequence analysis is an increasingly popular approach for analysing life courses represented by ordered collections of activities experienced by subjects over time. Here, we analyse a survey data set containing information on the career trajectories of a cohort of Northern Irish youths tracked between the ages of 16 and 22. We propose a novel, model-based clustering approach suited to the analysis of such data from a holistic perspective, with the aims of estimating the number of typical career trajectories, identifying the relevant features of these patterns, and assessing the extent to which such patterns are shaped by background characteristics. Several criteria exist for measuring pairwise dissimilarities among categorical sequences. Typically, dissimilarity matrices are employed as input to heuristic clustering algorithms. The family of methods we develop instead clusters sequences directly using mixtures of exponential-distance models. Basing the models on weighted variants of the Hamming distance metric permits closed-form expressions for parameter estimation. Simultaneously allowing the component membership probabilities to depend on fixed covariates and accommodating sampling weights in the clustering process yields new insights on the Northern Irish data. In particular, we find that school examination performance is the single most important predictor of cluster membership.
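The two ingredients of the proposed mixture components, a position-weighted Hamming distance and an exponential-distance density, are easy to sketch. The version below is illustrative and unnormalised, with made-up sequence values and weights; the paper's closed-form estimation and mixture machinery are not shown:

```python
import math

def hamming(seq1, seq2, weights=None):
    """(Optionally position-weighted) Hamming distance between two
    equal-length categorical sequences: the (weighted) count of
    positions at which they disagree."""
    if weights is None:
        weights = [1.0] * len(seq1)
    return sum(w for a, b, w in zip(seq1, seq2, weights) if a != b)

def edm_density(seq, central, lam, weights=None):
    """Unnormalised exponential-distance model density:
    f(s | theta, lambda) proportional to exp(-lambda * d(s, theta)),
    where theta is a central sequence and lambda a precision parameter."""
    return math.exp(-lam * hamming(seq, central, weights))
```

Each mixture component concentrates around its own central sequence, with larger lambda meaning tighter concentration, which is what makes Hamming-based components natural for categorical life-course data.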
Citations: 4
Full-semiparametric-likelihood-based inference for non-ignorable missing data
Pub Date : 2019-08-04 DOI: 10.5705/ss.202019.0243
Yukun Liu, Pengfei Li, J. Qin
During the past few decades, missing-data problems have been studied extensively, with a focus on the ignorable missing case, where the missing probability depends only on observable quantities. By contrast, research into non-ignorable missing data problems is quite limited. The main difficulty in solving such problems is that the missing probability and the regression likelihood function are tangled together in the likelihood presentation, and the model parameters may not be identifiable even under strong parametric model assumptions. In this paper we discuss a semiparametric model for non-ignorable missing data and propose a maximum full semiparametric likelihood estimation method, which is an efficient combination of the parametric conditional likelihood and the marginal nonparametric biased sampling likelihood. The extra marginal likelihood contribution can not only produce efficiency gain but also identify the underlying model parameters without additional assumptions. We further show that the proposed estimators for the underlying parameters and the response mean are semiparametrically efficient. Extensive simulations and a real data analysis demonstrate the advantage of the proposed method over competing methods.
Citations: 3
Mitigating unobserved spatial confounding when estimating the effect of supermarket access on cardiovascular disease deaths
Pub Date : 2019-07-28 DOI: 10.1214/20-aoas1377
P. Schnell, Georgia Papadogeorgou
Confounding by unmeasured spatial variables has received some attention in the spatial statistics and causal inference literatures, but concepts and approaches have remained largely separated. In this paper, we aim to bridge these distinct strands of statistics by considering unmeasured spatial confounding within a causal inference framework, and estimating effects using outcome regression tools popular within the spatial literature. First, we show how using spatially correlated random effects in the outcome model, an approach common among spatial statisticians, does not necessarily mitigate bias due to spatial confounding, a previously published but not universally known result. Motivated by the bias term of commonly-used estimators, we propose an affine estimator which addresses this deficiency. We discuss how unbiased estimation of causal parameters in the presence of unmeasured spatial confounding can only be achieved under an untestable set of assumptions which will often be application-specific. We provide a set of assumptions which describe how the exposure and outcome of interest relate to the unmeasured variables, and which is sufficient for identification of the causal effect based on the observed data. We examine identifiability issues through the lens of restricted maximum likelihood estimation in linear models, and implement our method using a fully Bayesian approach applicable to any type of outcome variable. This work is motivated by and used to estimate the effect of county-level limited access to supermarkets on the rate of cardiovascular disease deaths in the elderly across the whole continental United States. Even though standard approaches return null or protective effects, our approach uncovers evidence of unobserved spatial confounding, and indicates that limited supermarket access has a harmful effect on cardiovascular mortality.
Citations: 23
An Approach to Efficient Fitting of Univariate and Multivariate Stochastic Volatility Models
Pub Date : 2019-07-19 DOI: 10.13140/RG.2.2.29926.37440
Chen Gong, D. Stoffer
The stochastic volatility model is a popular tool for modeling the volatility of assets. The model is a nonlinear and non-Gaussian state space model, and consequently is difficult to fit. Many approaches, both classical and Bayesian, have been developed that rely on numerically intensive techniques such as quasi-maximum likelihood estimation and Markov chain Monte Carlo (MCMC). Convergence and mixing problems still plague MCMC algorithms when drawing samples sequentially from the posterior distributions. While particle Gibbs methods have been successful when applied to nonlinear or non-Gaussian state space models in general, slow convergence still haunts the technique when applied specifically to stochastic volatility models. We present an approach that couples particle Gibbs with ancestral sampling and joint parameter sampling that ameliorates the slow convergence and mixing problems when fitting both univariate and multivariate stochastic volatility models. We demonstrate the enhanced method on various numerical examples.
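For reference, the univariate model the abstract refers to can be written down and simulated directly. This is a minimal sketch of the canonical stochastic volatility model (parameter values are arbitrary placeholders; it is not the authors' particle Gibbs fitting procedure):

```python
import numpy as np

def simulate_sv(T, mu=-1.0, phi=0.95, sigma=0.2, seed=1):
    """Simulate the canonical univariate stochastic volatility model:
        log-volatility: h_t = mu + phi*(h_{t-1} - mu) + sigma*eta_t
        observation:    y_t = exp(h_t / 2) * eps_t
    with eta_t, eps_t independent standard normals."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    # Draw h_0 from the stationary AR(1) distribution of the log-volatility.
    h[0] = mu + sigma / np.sqrt(1.0 - phi**2) * rng.standard_normal()
    for t in range(1, T):
        h[t] = mu + phi * (h[t - 1] - mu) + sigma * rng.standard_normal()
    y = np.exp(h / 2.0) * rng.standard_normal(T)
    return y, h

y, h = simulate_sv(1000)
```

Because the latent `h` enters the observation equation through `exp(h/2)`, the state space model is nonlinear and non-Gaussian, which is the source of the fitting difficulties the abstract discusses.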
{"title":"An Approach to Efficient Fitting of Univariate and Multivariate Stochastic Volatility Models","authors":"Chen Gong, D. Stoffer","doi":"10.13140/RG.2.2.29926.37440","DOIUrl":"https://doi.org/10.13140/RG.2.2.29926.37440","url":null,"abstract":"The stochastic volatility model is a popular tool for modeling the volatility of assets. The model is a nonlinear and non-Gaussian state space model, and consequently is difficult to fit. Many approaches, both classical and Bayesian, have been developed that rely on numerically intensive techniques such as quasi-maximum likelihood estimation and Markov chain Monte Carlo (MCMC). Convergence and mixing problems still plague MCMC algorithms when drawing samples sequentially from the posterior distributions. While particle Gibbs methods have been successful when applied to nonlinear or non-Gaussian state space models in general, slow convergence still haunts the technique when applied specifically to stochastic volatility models. We present an approach that couples particle Gibbs with ancestral sampling and joint parameter sampling that ameliorates the slow convergence and mixing problems when fitting both univariate and multivariate stochastic volatility models. We demonstrate the enhanced method on various numerical examples.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133657659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Random family method: Confirming inter-generational relations by restricted re-sampling
Pub Date : 2019-07-05 DOI: 10.31219/osf.io/h5fqn
T. Usuzaki, M. Chiba, S. Hotta
Randomness is one of the important key concepts of statistics. In epidemiology or medical science, we investigate our hypotheses and interpret results through this statistical randomness. We hypothesized by imposing some conditions to this randomness, interpretation of our result may be changed. In this article, we introduced the restricted re-sampling method to confirm inter-generational relations and presented an example.
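The abstract does not spell out the re-sampling procedure. One natural reading of a "restricted" re-sampling that preserves inter-generational relations is to re-sample whole families rather than individuals, so that parent-child pairings are never broken. The toy sketch below illustrates that idea under that assumption; the function name and data layout are hypothetical, and the paper's exact method may differ.

```python
import random

def family_resample(families, n_families, seed=0):
    """Restricted re-sampling: draw whole families with replacement, so the
    parent-child pairing inside each family stays intact.  Illustrative toy
    version only."""
    rng = random.Random(seed)
    return [rng.choice(families) for _ in range(n_families)]

# Toy data: each family is a (parent_trait, child_trait) pair.
families = [(1.0, 1.2), (2.0, 2.1), (0.5, 0.4), (3.0, 2.8)]
resampled = family_resample(families, 100)
```

An unrestricted re-sample of individuals would destroy the within-family pairing; restricting the draw to whole families is what lets the inter-generational relation survive into each replicate.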
{"title":"Random family method: Confirming inter-generational relations by restricted re-sampling","authors":"T. Usuzaki, M. Chiba, S. Hotta","doi":"10.31219/osf.io/h5fqn","DOIUrl":"https://doi.org/10.31219/osf.io/h5fqn","url":null,"abstract":"Randomness is one of the important key concepts of statistics. In epidemiology or medical science, we investigate our hypotheses and interpret results through this statistical randomness. We hypothesized by imposing some conditions to this randomness, interpretation of our result may be changed. In this article, we introduced the restricted re-sampling method to confirm inter-generational relations and presented an example.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"201 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121000523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Journal: arXiv: Methodology