
arXiv: Methodology — Latest Publications

A Family-based Graphical Approach for Testing Hierarchically Ordered Families of Hypotheses
Pub Date : 2018-12-01 DOI: 10.13140/RG.2.2.23109.29929
Z. Qiu, Li Yu, Wenge Guo
In clinical trial applications, tested hypotheses are often grouped into multiple hierarchically ordered families. To test such structured hypotheses, various gatekeeping strategies have been developed in the literature, such as serial gatekeeping, parallel gatekeeping, and tree-structured gatekeeping. However, these strategies are often either non-intuitive or insufficiently flexible when addressing increasingly complex logical relationships among families of hypotheses. To overcome this issue, we develop in this paper a new family-based graphical approach, which can easily derive and visualize different gatekeeping strategies. In the proposed approach, a directed, weighted graph represents the generated gatekeeping strategy, where each node corresponds to a family of hypotheses, and two simple updating rules govern the critical value of each family and the transition coefficient between any two families. Theoretically, we show that the proposed graphical approach strongly controls the overall familywise error rate at a pre-specified level. Through case studies and a real clinical example, we demonstrate the simplicity and flexibility of the proposed approach.
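The updating rules themselves are not spelled out in the abstract. As orientation, here is a minimal sketch of the node-removal rules from the standard graphical approach to multiple testing (Bretz et al.), which family-based variants adapt by letting each node represent a family of hypotheses rather than a single hypothesis. The function name, the example graph, and the condition under which a node is removed are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def graphical_update(alpha, G, j):
    """Propagate the critical value of a removed node j to the remaining nodes.

    alpha : 1-D array of per-node critical values (summing to the overall level)
    G     : transition matrix, G[j, k] = fraction of node j's level passed to node k
    j     : index of the node being removed (e.g., a family whose hypotheses
            were all rejected at its current critical value)
    Returns updated (alpha, G) with node j zeroed out.
    """
    n = len(alpha)
    new_alpha = alpha.copy()
    new_G = np.zeros_like(G)
    for k in range(n):
        if k == j:
            continue
        # Rule 1: node k inherits its share of node j's critical value.
        new_alpha[k] = alpha[k] + alpha[j] * G[j, k]
        for l in range(n):
            if l in (k, j):
                continue
            denom = 1.0 - G[k, j] * G[j, k]
            # Rule 2: reroute transition weights that previously passed through j.
            new_G[k, l] = (G[k, l] + G[k, j] * G[j, l]) / denom if denom > 0 else 0.0
    new_alpha[j] = 0.0
    return new_alpha, new_G

# Illustrative use: two families tested in order, overall level 0.025.
alpha = np.array([0.025, 0.0])          # all of the level starts at family 1
G = np.array([[0.0, 1.0],               # family 1 passes its level on to family 2
              [0.0, 0.0]])
alpha, G = graphical_update(alpha, G, j=0)
print(alpha)                            # family 2 is now tested at level 0.025
```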
Citations: 0
High-dimensional Log-Error-in-Variable Regression with Applications to Microbial Compositional Data Analysis
Pub Date : 2018-11-28 DOI: 10.1093/BIOMET/ASAB020
Pixu Shi, Yuchen Zhou, Anru R. Zhang
In microbiome and genomic studies, regression of compositional data has been a crucial tool for identifying microbial taxa or genes associated with clinical phenotypes. To account for variation in sequencing depth, the classic log-contrast model is often used, with read counts normalized into compositions. However, zero read counts and randomness in the covariates remain critical issues. In this article, we introduce a surprisingly simple, interpretable, and efficient method for compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method corrects for possibly overdispersed sequencing data while avoiding any subjective imputation of zero read counts. We provide theoretical justification with matching upper and lower bounds for the estimation error. We also consider a general log-error-in-variable regression model, with a corresponding estimation method, to accommodate broader situations. The merit of the procedure is illustrated through real data analysis and simulation studies.
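As background, the classic log-contrast model referenced above can be written as follows (a textbook formulation with generic notation, not the paper's). The error-in-variable aspect enters because the compositions must be formed from noisy, possibly zero, read counts rather than being observed directly.

```latex
% Classic log-contrast model for a composition x_i = (x_{i1}, \dots, x_{ip}),
% with x_{ij} > 0 and \sum_j x_{ij} = 1 (generic notation, not the paper's):
y_i = \sum_{j=1}^{p} \beta_j \log x_{ij} + \varepsilon_i ,
\qquad \text{subject to } \sum_{j=1}^{p} \beta_j = 0 .
% In the log-error-in-variable setting, only noisy read counts W_{ij} are observed,
% so \log x_{ij} must be replaced by a count-based surrogate measured with error.
```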
Citations: 10
Spectral Deconfounding via Perturbed Sparse Linear Models
Pub Date : 2018-11-13 DOI: 10.3929/ETHZ-B-000459190
Domagoj Ćevid, Peter Bühlmann, N. Meinshausen
Standard high-dimensional regression methods assume that the underlying coefficient vector is sparse. This might not hold in some cases, in particular in the presence of hidden confounding variables. Such hidden confounding can be represented as a high-dimensional linear model in which the sparse coefficient vector is perturbed. For this model, we develop and investigate a class of methods based on running the Lasso on preprocessed data. The preprocessing step applies certain spectral transformations that change the singular values of the design matrix. We show that, under some assumptions, one can achieve the optimal $\ell_1$-error rate for estimating the underlying sparse coefficient vector. Our theory also covers the Lava estimator (Chernozhukov et al. [2017]) for a special model class. The performance of the method is illustrated on simulated data and a genomic dataset.
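The abstract does not pin down the spectral transformation. Below is a minimal sketch assuming the "trim" transform, which caps the singular values of the design matrix at their median before a Lasso fit; the cap, the penalty level, and the simulated confounding setup are illustrative choices, not the paper's recommended settings.

```python
import numpy as np
from sklearn.linear_model import Lasso

def trim_transform(X, y):
    """Cap the singular values of X at their median and apply the same linear
    map to y, so that a subsequent Lasso fit is less dominated by the dense
    confounding directions (illustrative 'trim' transform)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    tau = np.median(d)
    scale = np.minimum(1.0, tau / d)        # shrink only the large singular values
    F = U @ np.diag(scale) @ U.T            # spectral transformation F
    return F @ X, F @ y

rng = np.random.default_rng(0)
n, p = 200, 500
H = rng.normal(size=(n, 2))                  # hidden confounders
X = rng.normal(size=(n, p)) + H @ rng.normal(size=(2, p))
beta = np.zeros(p)
beta[:5] = 1.0                               # sparse signal
y = X @ beta + H @ np.array([2.0, -1.0]) + rng.normal(size=n)

X_t, y_t = trim_transform(X, y)
fit = Lasso(alpha=0.1).fit(X_t, y_t)         # Lasso on the preprocessed data
print(np.round(fit.coef_[:8], 2))
```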
Citations: 32
Handbook of Mixture Analysis
Pub Date : 2018-11-01 DOI: 10.1201/9780429055911
I. C. Gormley, Sylvia Frühwirth-Schnatter
Mixtures of experts models provide a framework in which covariates may be included in mixture models. This is achieved by modelling the parameters of the mixture model as functions of the concomitant covariates. Given their mixture model foundation, mixtures of experts models possess a diverse range of analytic uses, from clustering observations to capturing parameter heterogeneity in cross-sectional data. This chapter focuses on delineating the mixture of experts modelling framework and demonstrates the utility and flexibility of mixtures of experts models as an analytic tool.
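For concreteness, a generic mixture-of-experts density takes the following form (standard textbook notation with a multinomial-logistic gating network; the chapter's notation may differ).

```latex
% Mixture of experts: both the mixing weights (gating network) and the
% component densities may depend on the covariates x_i.
f(y_i \mid x_i) \;=\; \sum_{g=1}^{G} \pi_g(x_i)\, f_g\!\bigl(y_i \mid \theta_g(x_i)\bigr),
\qquad
\pi_g(x_i) \;=\; \frac{\exp(x_i^\top \gamma_g)}{\sum_{h=1}^{G} \exp(x_i^\top \gamma_h)} .
```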
Citations: 12
Fast Exact Bayesian Inference for Sparse Signals in the Normal Sequence Model.
Pub Date : 2018-10-25 DOI: 10.1214/20-ba1227
T. Erven, Botond Szabó
We consider exact algorithms for Bayesian inference with model selection priors (including spike-and-slab priors) in the sparse normal sequence model. Because the best existing exact algorithm becomes numerically unstable for sample sizes over n=500, much attention has been given to alternative approaches such as approximate algorithms (Gibbs sampling, variational Bayes, etc.), shrinkage priors (e.g., the Horseshoe prior and the Spike-and-Slab LASSO), or empirical Bayesian methods. However, by introducing algorithmic ideas from online sequential prediction, we show that exact calculations are feasible for much larger sample sizes: for general model selection priors we reach n=25000, and for certain spike-and-slab priors we can easily reach n=100000. We further prove a de Finetti-like result for finite sample sizes that characterizes exactly which model selection priors can be expressed as spike-and-slab priors. Finally, the computational speed and numerical accuracy of the proposed methods are demonstrated in experiments on simulated data and on a prostate cancer data set. In our experimental evaluation we compute guaranteed bounds on the numerical accuracy of all new algorithms, which show that the proposed methods are numerically reliable, whereas an alternative based on long division is not.
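The underlying setting can be summarized as follows (standard notation for the sparse normal sequence model with a spike-and-slab prior; the general model selection priors treated in the paper are broader than this special case).

```latex
% Sparse normal sequence model:
X_i = \theta_i + \varepsilon_i, \qquad \varepsilon_i \stackrel{\mathrm{iid}}{\sim} N(0, 1),
\qquad i = 1, \dots, n .
% Spike-and-slab prior: each coordinate is exactly zero with probability 1 - \alpha
% and is otherwise drawn from a slab distribution G:
\theta_i \stackrel{\mathrm{iid}}{\sim} (1 - \alpha)\,\delta_0 + \alpha\, G .
```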
Citations: 7
Inverse Problems and Data Assimilation
Pub Date : 2018-10-15 DOI: 10.1017/9781009414319
D. Sanz-Alonso, A. Stuart, Armeen Taeb
This concise introduction provides an entry point to the world of inverse problems and data assimilation for advanced undergraduates and beginning graduate students in the mathematical sciences. It will also appeal to researchers in science and engineering who are interested in the systematic underpinnings of methodologies widely used in their disciplines. The authors examine inverse problems and data assimilation in turn, before exploring the use of data assimilation methods to solve generic inverse problems by introducing an artificial algorithmic time. Topics covered include maximum a posteriori estimation, (stochastic) gradient descent, variational Bayes, Monte Carlo, importance sampling and Markov chain Monte Carlo for inverse problems; and 3DVAR, 4DVAR, extended and ensemble Kalman filters, and particle filters for data assimilation. The book contains a wealth of examples and exercises, and can be used to accompany courses as well as for self-study.
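As a pointer to the kind of formulation the book systematizes, the canonical Bayesian inverse problem and its MAP estimate can be written as follows (a generic textbook formulation, not an excerpt from the book).

```latex
% Canonical inverse problem: recover u from data y = G(u) + \eta, \eta \sim N(0, \Gamma).
% With a Gaussian prior u \sim N(m_0, C_0), the MAP estimate minimizes a
% regularized data misfit:
\hat{u}_{\mathrm{MAP}} \;=\; \arg\min_{u}\;
\tfrac{1}{2}\bigl\| \Gamma^{-1/2}\bigl(y - G(u)\bigr) \bigr\|^2
\;+\; \tfrac{1}{2}\bigl\| C_0^{-1/2}(u - m_0) \bigr\|^2 .
```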
Citations: 25
Sum decomposition of divergence into three divergences
Pub Date : 2018-10-03 DOI: 10.31219/osf.io/dvcbt
T. Nishiyama
Divergence functions play a key role in measuring the discrepancy between two points in machine learning, statistics, and signal processing. Well-known divergences include the Bregman divergences, the Jensen divergences, and the f-divergences. In this paper, we show that the symmetric Bregman divergence can be decomposed into the sum of two types of Jensen divergences and the Bregman divergence. Furthermore, applying this result, we show that another sum decomposition of divergence is possible which explicitly includes f-divergences.
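For reference, the divergences involved are defined as follows (standard definitions; the paper's exact decomposition identities are not reproduced here).

```latex
% Bregman divergence generated by a strictly convex, differentiable F:
B_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle ,
% its symmetrized version:
B_F(p, q) + B_F(q, p) = \langle \nabla F(p) - \nabla F(q),\, p - q \rangle ,
% and the skew Jensen divergence with parameter \alpha \in (0, 1):
J_F^{\alpha}(p, q) = \alpha F(p) + (1 - \alpha) F(q) - F\bigl(\alpha p + (1 - \alpha) q\bigr) .
```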
Citations: 1
On Semiparametric Instrumental Variable Estimation of Average Treatment Effects through Data Fusion
Pub Date : 2018-10-01 DOI: 10.5705/ss.202020.0081
Baoluo Sun, Wang Miao
Suppose one is interested in estimating causal effects in the presence of potentially unmeasured confounding with the aid of a valid instrumental variable. This paper investigates the problem of making inferences about the average treatment effect when data are fused from two separate sources, one of which contains information on the treatment and the other contains information on the outcome, while values for the instrument and a vector of baseline covariates are recorded in both. We provide a general set of sufficient conditions under which the average treatment effect is nonparametrically identified from the observed data law induced by data fusion, even when the data are from two heterogeneous populations, and derive the efficiency bound for estimating this causal parameter. For inference, we develop both parametric and semiparametric methods, including a multiply robust and locally efficient estimator that is consistent even under partial misspecification of the observed data model. We illustrate the methods through simulations and an application on public housing projects.
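In generic potential-outcome notation (a restatement of the setting described above, not the paper's own notation), the target parameter and the fused data structure are:

```latex
% Treatment D \in \{0,1\}, outcome Y, instrument Z, baseline covariates X;
% the target is the average treatment effect
\Delta \;=\; E\bigl[\, Y(1) - Y(0) \,\bigr].
% Data fusion: one sample records (Z, X, D) but not Y, the other records
% (Z, X, Y) but not D; identification proceeds through the instrument Z and
% the covariates X measured in both samples.
```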
Citations: 6
New $L^2$-type exponentiality tests.
Pub Date : 2018-09-20 DOI: 10.2436/20.8080.02.78
Marija Cuparić, Bojana Milošević, Marko Obradović
We introduce new consistent and scale-free goodness-of-fit tests for the exponential distribution based on the Puri-Rubin characterization. For the construction of the test statistics we employ the weighted $L^2$ distance between $V$-empirical Laplace transforms of random variables that appear in the characterization. The resulting test statistics are degenerate V-statistics with estimated parameters. We compare our tests, in terms of Bahadur efficiency, to the likelihood ratio test, as well as to some recent characterization-based goodness-of-fit tests for the exponential distribution. We also compare the powers of our tests to those of some recent and classical exponentiality tests. Under both criteria, our tests are shown to be strong and to outperform most of their competitors.
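The main ingredients can be sketched as follows. The characterization is standard; the exact weight function and normalization used in the paper may differ from this illustration.

```latex
% Puri-Rubin characterization: for i.i.d. non-negative X_1, X_2,
% X follows an exponential distribution if and only if
|X_1 - X_2| \;\stackrel{d}{=}\; \min(X_1, X_2) .
% A weighted L^2-type statistic compares the V-empirical Laplace transforms
% of the two sides, e.g.
M_n = \int_0^\infty \Bigl( \tfrac{1}{n^2} \sum_{i,j} e^{-t |X_i - X_j|}
      \;-\; \tfrac{1}{n^2} \sum_{i,j} e^{-t \min(X_i, X_j)} \Bigr)^{2} w(t)\, dt ,
% where w(t) is a suitable weight function and the data are first rescaled
% by the sample mean to make the test scale-free.
```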
Citations: 18
Wasserstein Gradients for the Temporal Evolution of Probability Distributions
Pub Date : 2018-09-10 DOI: 10.1214/21-EJS1883
Yaqing Chen, H. Müller
Many studies have been conducted on flows of probability measures, often in terms of gradient flows. We introduce here a novel approach for modeling the instantaneous evolution of empirically observed distribution flows over time, with a data-analytic focus that has not yet been explored. The proposed model describes the observed flow of distributions on the one-dimensional Euclidean space $\mathbb{R}$ over time based on the Wasserstein distance, utilizing derivatives of optimal transport maps over time. The resulting time dynamics of optimal transport maps are illustrated with time-varying distribution data, including yearly income distributions, the evolution of mortality over calendar years, and age-dependent height distributions of children from the longitudinal Zürich growth study.
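On the real line the relevant objects have closed forms (standard facts, stated in generic notation rather than the paper's): the optimal transport map between two distributions is a composition of their distribution and quantile functions, and the instantaneous evolution can be summarized through its derivative in time.

```latex
% For distributions \mu_s, \mu_t on \mathbb{R} with cdfs F_s, F_t, the
% 2-Wasserstein distance and the optimal transport map from \mu_s to \mu_t are
W_2^2(\mu_s, \mu_t) = \int_0^1 \bigl( F_s^{-1}(u) - F_t^{-1}(u) \bigr)^2 \, du ,
\qquad
T_{s \to t} = F_t^{-1} \circ F_s .
% The instantaneous (Wasserstein) evolution of an observed flow t \mapsto \mu_t
% can then be described through derivatives of t \mapsto T_{s \to t} in t.
```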
Citations: 1