
Latest publications in arXiv - MATH - Statistics Theory

Reconstructions of piece-wise continuous and discrete functions using moments
Pub Date : 2023-12-07 DOI: arxiv-2312.04462
Robert Mnatsakanov, Rafik Aramyan, Farhad Jafari
The problem of recovering a moment-determinate multivariate function $f$ via its moment sequence is studied. Under mild conditions on $f$, the point-wise and $L_1$-rates of convergence for the proposed constructions are established. The cases where $f$ is the indicator function of a set, and represents a discrete probability mass function are also investigated. Calculations of the approximants and simulation studies are conducted to graphically illustrate the behavior of the approximations in several simple examples. Analytical and simulated errors of proposed approximations are recorded in Tables 1-3.
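The authors' specific moment-based constructions are not reproduced here. As a rough, generic illustration of recovering a function from finitely many power moments, the sketch below uses a truncated Legendre expansion on $[-1,1]$ (an assumed stand-in technique; `reconstruct_from_moments` and the uniform-density check are illustrative, not from the paper).

```python
import numpy as np
from numpy.polynomial import legendre as L

def reconstruct_from_moments(moments, xs):
    """Approximate f on [-1, 1] from its power moments m_j = int x^j f(x) dx
    via a truncated Legendre expansion: f ~ sum_k c_k P_k with
    c_k = (2k+1)/2 * sum_j a_{kj} m_j, where P_k(x) = sum_j a_{kj} x^j."""
    n = len(moments)
    coeffs = np.zeros(n)
    for k in range(n):
        e_k = np.zeros(k + 1)
        e_k[k] = 1.0
        p_k = L.leg2poly(e_k)                      # power coefficients of P_k
        coeffs[k] = (2 * k + 1) / 2 * np.dot(p_k, moments[: k + 1])
    return L.legval(xs, coeffs)

# Sanity check with the uniform density f = 1/2 on [-1, 1]:
moments = [(1 - (-1) ** (j + 1)) / (2 * (j + 1)) for j in range(8)]
print(reconstruct_from_moments(moments, np.linspace(-1, 1, 5)))  # ~0.5 everywhere
```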
Citations: 0
E-values, Multiple Testing and Beyond
Pub Date : 2023-12-05 DOI: arxiv-2312.02905
Guanxun Li, Xianyang Zhang
We discover a connection between the Benjamini-Hochberg (BH) procedure and the recently proposed e-BH procedure [Wang and Ramdas, 2022] with a suitably defined set of e-values. This insight extends to a generalized version of the BH procedure and the model-free multiple testing procedure in Barber and Candès [2015] (BC) with a general form of rejection rules. The connection provides an effective way of developing new multiple testing procedures by aggregating or assembling e-values resulting from the BH and BC procedures and their use in different subsets of the data. In particular, we propose new multiple testing methodologies in three applications, including a hybrid approach that integrates the BH and BC procedures, a multiple testing procedure aimed at ensuring a new notion of fairness by controlling both the group-wise and overall false discovery rates (FDR), and a structure adaptive multiple testing procedure that can incorporate external covariate information to boost detection power. One notable feature of the proposed methods is that we use a data-dependent approach for assigning weights to e-values, significantly enhancing the efficiency of the resulting e-BH procedure. The construction of the weights is non-trivial and is motivated by the leave-one-out analysis for the BH and BC procedures. In theory, we prove that the proposed e-BH procedures with data-dependent weights in the three applications ensure finite sample FDR control. Furthermore, we demonstrate the efficiency of the proposed methods through numerical studies in the three applications.
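For context, here is a minimal sketch of the base (unweighted) e-BH rule of Wang and Ramdas [2022] that the paper builds on: reject the $k^*$ hypotheses with the largest e-values, where $k^* = \max\{k : k\,e_{(k)}/m \ge 1/\alpha\}$. The paper's data-dependent weights and BH/BC aggregation are not reproduced; `e_bh` is an illustrative helper name.

```python
import numpy as np

def e_bh(e_values, alpha=0.05):
    """Base e-BH: with e-values sorted decreasingly, reject the k* largest,
    where k* = max{k : k * e_(k) / m >= 1/alpha}; return rejected indices."""
    e = np.asarray(e_values, dtype=float)
    m = len(e)
    order = np.argsort(-e)                     # indices by decreasing e-value
    ks = np.arange(1, m + 1)
    passes = ks * e[order] / m >= 1.0 / alpha
    if not passes.any():
        return np.array([], dtype=int)
    return np.sort(order[: ks[passes].max()])

print(e_bh([40.0, 1.2, 25.0, 0.5, 60.0], alpha=0.1))   # rejects indices 0, 2, 4
```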
Citations: 0
Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space
Pub Date : 2023-12-05 DOI: arxiv-2312.02849
Yiheng Jiang, Sinho Chewi, Aram-Alexandre Pooladian
We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$ by a product measure $\pi^\star$. When $\pi$ is strongly log-concave and log-smooth, we provide (1) approximation rates certifying that $\pi^\star$ is close to the minimizer $\pi^\star_\diamond$ of the KL divergence over a \emph{polyhedral} set $\mathcal{P}_\diamond$, and (2) an algorithm for minimizing $\text{KL}(\cdot\,\|\,\pi)$ over $\mathcal{P}_\diamond$ with accelerated complexity $O(\sqrt{\kappa}\,\log(\kappa d/\varepsilon^2))$, where $\kappa$ is the condition number of $\pi$.
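The polyhedral first-order scheme itself is not sketched here. As a point of reference for what mean-field variational inference targets, the snippet below uses the classical closed form of the KL-optimal product approximation of a Gaussian target $\pi = N(\mu, \Sigma)$, whose $i$-th factor is $N(\mu_i, 1/(\Sigma^{-1})_{ii})$; this is standard background, not the paper's algorithm, and the function name is ours.

```python
import numpy as np

def mean_field_gaussian(mu, Sigma):
    """Closed-form KL-optimal product (mean-field) approximation of N(mu, Sigma):
    the i-th factor is N(mu_i, 1/(Sigma^{-1})_{ii}).  Means are exact, while the
    factor variances understate the true marginal variances."""
    Lambda = np.linalg.inv(Sigma)
    return np.asarray(mu, dtype=float), 1.0 / np.diag(Lambda)

Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
means, variances = mean_field_gaussian(np.zeros(2), Sigma)
print(means, variances)   # factor variances 0.36 vs. marginal variances 1.0
```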
Citations: 0
A Characterization of Optimal Prediction Measures via $\ell_1$ Minimization
Pub Date : 2023-12-05 DOI: arxiv-2312.03091
Len Bos
Suppose that $K \subset \mathbb{C}$ is compact and that $z_0 \in \mathbb{C} \backslash K$ is an external point. An optimal prediction measure for regression by polynomials of degree at most $n$ is one for which the variance of the prediction at $z_0$ is as small as possible. Hoel and Levine \cite{HL} have considered the case of $K=[-1,1]$ and $z_0 = x_0 \in \mathbb{R} \backslash [-1,1]$, where they show that the support of the optimal measure is the $n+1$ extreme points of the Chebyshev polynomial $T_n(x)$, and they characterize the optimal weights in terms of absolute values of fundamental interpolating Lagrange polynomials. More recently, \cite{BLO} has given the equivalence of the optimal prediction problem with that of finding polynomials of extremal growth. They also study in detail the case of $K=[-1,1]$ and $z_0 = ia \in i\mathbb{R}$, purely imaginary. In this work we generalize the Hoel-Levine formula to the general case when the support of the optimal measure is a finite set and give a formula for the optimal weights in terms of an $\ell_1$ minimization problem.
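As a numerical illustration of the Hoel-Levine characterization quoted above (support at the $n+1$ extreme points of $T_n$, weights proportional to the absolute values of the fundamental Lagrange polynomials at $x_0$), here is a short sketch; normalizing the weights to sum to one is our reading of that characterization, and the helper name is ours.

```python
import numpy as np

def hoel_levine_design(n, x0):
    """Support: the n+1 extreme points of T_n on [-1, 1], x_j = cos(j*pi/n).
    Weights: proportional to |l_j(x0)|, the fundamental Lagrange polynomials
    evaluated at the external prediction point x0."""
    nodes = np.cos(np.pi * np.arange(n + 1) / n)
    ell = np.empty(n + 1)
    for j in range(n + 1):
        others = np.delete(nodes, j)
        ell[j] = np.prod((x0 - others) / (nodes[j] - others))
    weights = np.abs(ell) / np.abs(ell).sum()
    return nodes, weights

nodes, weights = hoel_levine_design(n=4, x0=1.5)
print(np.round(nodes, 3), np.round(weights, 3))
```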
Citations: 0
Maximum likelihood thresholds of Gaussian graphical models and graphical lasso
Pub Date : 2023-12-05 DOI: arxiv-2312.03145
Daniel Irving Bernstein, Hayden Outlaw
Associated to each graph G is a Gaussian graphical model. Such models are often used in high-dimensional settings, i.e. where there are relatively few data points compared to the number of variables. The maximum likelihood threshold of a graph is the minimum number of data points required to fit the corresponding graphical model using maximum likelihood estimation. Graphical lasso is a method for selecting and fitting a graphical model. In this project, we ask: when graphical lasso is used to select and fit a graphical model on n data points, how likely is it that n is greater than or equal to the maximum likelihood threshold of the corresponding graph? Our results are a series of computational experiments.
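To make the selection step concrete, here is a minimal sketch that fits graphical lasso with scikit-learn (an assumed stand-in implementation) and reads the selected graph off the support of the estimated precision matrix; computing the maximum likelihood threshold of that graph, the other half of the experiment, is not shown.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))            # n = 30 data points, 10 variables

model = GraphicalLasso(alpha=0.2).fit(X)     # L1-penalized precision estimate
precision = model.precision_
edges = [(i, j) for i in range(10) for j in range(i + 1, 10)
         if abs(precision[i, j]) > 1e-8]     # nonzero entries = selected edges
print(f"graphical lasso selected {len(edges)} edges")
```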
Citations: 0
Central limit theorem for the average closure coefficient
Pub Date : 2023-12-05 DOI: arxiv-2312.03142
Mingao Yuan
Many real-world networks exhibit the phenomenon of edge clustering, which is typically measured by the average clustering coefficient. Recently, an alternative measure, the average closure coefficient, is proposed to quantify local clustering. It is shown that the average closure coefficient possesses a number of useful properties and can capture complementary information missed by the classical average clustering coefficient. In this paper, we study the asymptotic distribution of the average closure coefficient of a heterogeneous Erdős–Rényi random graph. We prove that the standardized average closure coefficient converges in distribution to the standard normal distribution. In the Erdős–Rényi random graph, the variance of the average closure coefficient exhibits the same phase transition phenomenon as the average clustering coefficient.
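For concreteness, the sketch below computes the average closure coefficient of a simulated Erdős–Rényi graph, assuming the usual definition of the local closure coefficient $H_i = 2T_i / W_i$ (with $T_i$ the number of triangles containing $i$ and $W_i$ the number of length-2 paths ending at $i$); the helper name is ours, and nodes with $W_i = 0$ are simply skipped here.

```python
import networkx as nx

def average_closure_coefficient(G):
    """Average of H_i = 2*T_i / W_i over nodes with W_i > 0, where T_i counts
    triangles containing i and W_i = sum_{j in N(i)} (deg(j) - 1) counts the
    length-2 paths with an endpoint at i."""
    tri = nx.triangles(G)
    deg = dict(G.degree())
    vals = []
    for i in G.nodes:
        W = sum(deg[j] - 1 for j in G.neighbors(i))
        if W > 0:
            vals.append(2 * tri[i] / W)
    return sum(vals) / len(vals)

G = nx.erdos_renyi_graph(500, 0.05, seed=1)
print(average_closure_coefficient(G))        # roughly the edge probability 0.05
```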
Citations: 0
Asymptotic Theory of the Best-Choice Rerandomization using the Mahalanobis Distance
Pub Date : 2023-12-05 DOI: arxiv-2312.02513
Yuhao Wang, Xinran Li
Rerandomization, a design that utilizes pretreatment covariates and improves their balance between different treatment groups, has received attention recently in both theory and practice. There are at least two types of rerandomization that are used in practice: the first rerandomizes the treatment assignment until covariate imbalance is below a prespecified threshold; the second randomizes the treatment assignment multiple times and chooses the one with the best covariate balance. In this paper we will consider the second type of rerandomization, namely the best-choice rerandomization, whose theory and inference are still lacking in the literature. In particular, we will focus on the best-choice rerandomization that uses the Mahalanobis distance to measure covariate imbalance, which is one of the most commonly used imbalance measures for multivariate covariates and is invariant to affine transformations of covariates. We will study the large-sample repeated sampling properties of the best-choice rerandomization, allowing both the number of covariates and the number of tried complete randomizations to increase with the sample size. We show that the asymptotic distribution of the difference-in-means estimator is more concentrated around the true average treatment effect under rerandomization than under complete randomization, and propose large-sample accurate confidence intervals for rerandomization that are shorter than those for the completely randomized experiment. We further demonstrate that, with a moderate number of covariates and with the number of tried randomizations increasing polynomially with the sample size, the best-choice rerandomization can achieve the ideally optimal precision that one can expect even with perfectly balanced covariates. The developed theory and methods for rerandomization are also illustrated using real field experiments.
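A minimal sketch of the best-choice rerandomization described above: draw many complete randomizations and keep the assignment whose covariate mean difference has the smallest Mahalanobis distance. The sample covariance of the covariates is used here as the scaling matrix (a constant factor relative to the usual definition, which does not change which draw is chosen); names and sizes are illustrative.

```python
import numpy as np

def best_choice_rerandomization(X, n_treat, n_draws=1000, seed=0):
    """Among n_draws complete randomizations of n units (n_treat treated),
    return the assignment minimizing the Mahalanobis distance of the
    treated-vs-control covariate mean difference."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    best_assign, best_M = None, np.inf
    for _ in range(n_draws):
        assign = np.zeros(n, dtype=bool)
        assign[rng.choice(n, size=n_treat, replace=False)] = True
        diff = X[assign].mean(axis=0) - X[~assign].mean(axis=0)
        M = diff @ S_inv @ diff
        if M < best_M:
            best_M, best_assign = M, assign
    return best_assign, best_M

X = np.random.default_rng(1).standard_normal((100, 4))   # 100 units, 4 covariates
assign, M = best_choice_rerandomization(X, n_treat=50, n_draws=500)
print(assign.sum(), M)    # 50 treated units, small residual imbalance
```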
Citations: 0
A Bayesian neural network approach to Multi-fidelity surrogate modelling
Pub Date : 2023-12-05 DOI: arxiv-2312.02575
Baptiste Kerleguer (DAM/DIF, CMAP), Claire Cannamela (DAM/DIF), Josselin Garnier (CMAP)
This paper deals with surrogate modelling of a computer code output in a hierarchical multi-fidelity context, i.e., when the output can be evaluated at different levels of accuracy and computational cost. Using observations of the output at low- and high-fidelity levels, we propose a method that combines Gaussian process (GP) regression and Bayesian neural network (BNN), in a method called GPBNN. The low-fidelity output is treated as a single-fidelity code using classical GP regression. The high-fidelity output is approximated by a BNN that incorporates, in addition to the high-fidelity observations, well-chosen realisations of the low-fidelity output emulator. The predictive uncertainty of the final surrogate model is then quantified by a complete characterisation of the uncertainties of the different models and their interaction. GPBNN is compared with most of the multi-fidelity regression methods that allow the prediction uncertainty to be quantified.
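The sketch below only illustrates the hierarchical structure (a GP fitted on the cheap low-fidelity code, whose prediction is fed as an extra input to a second model trained on a few high-fidelity runs). A plain MLPRegressor stands in for the BNN, so the predictive-uncertainty part of GPBNN is not reproduced, and the toy code outputs are invented for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
f_lo = lambda x: np.sin(8 * x)                    # cheap, biased low-fidelity code
f_hi = lambda x: np.sin(8 * x) + 0.3 * x ** 2     # expensive high-fidelity code

X_lo = rng.uniform(0, 1, (60, 1)); y_lo = f_lo(X_lo).ravel()   # many cheap runs
X_hi = rng.uniform(0, 1, (12, 1)); y_hi = f_hi(X_hi).ravel()   # few expensive runs

gp_lo = GaussianProcessRegressor().fit(X_lo, y_lo)             # low-fidelity emulator
feat_hi = np.hstack([X_hi, gp_lo.predict(X_hi)[:, None]])      # x plus GP prediction
nn_hi = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000,
                     random_state=0).fit(feat_hi, y_hi)

X_new = np.linspace(0, 1, 5)[:, None]
feat_new = np.hstack([X_new, gp_lo.predict(X_new)[:, None]])
print(nn_hi.predict(feat_new))                                 # high-fidelity surrogate
```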
Citations: 0
Near-Optimal Mean Estimation with Unknown, Heteroskedastic Variances
Pub Date : 2023-12-05 DOI: arxiv-2312.02417
Spencer Compton, Gregory Valiant
Given data drawn from a collection of Gaussian variables with a common mean but different and unknown variances, what is the best algorithm for estimating their common mean? We present an intuitive and efficient algorithm for this task. As different closed-form guarantees can be hard to compare, the Subset-of-Signals model serves as a benchmark for heteroskedastic mean estimation: given $n$ Gaussian variables with an unknown subset of $m$ variables having variance bounded by 1, what is the optimal estimation error as a function of $n$ and $m$? Our algorithm resolves this open question up to logarithmic factors, improving upon the previous best known estimation error by polynomial factors when $m = n^c$ for all $0 < c < 1$. Of particular note, we obtain error $o(1)$ with $m = \tilde{O}(n^{1/4})$ variance-bounded samples, whereas previous work required $m = \tilde{\Omega}(n^{1/2})$. Finally, we show that in the multi-dimensional setting, even for $d=2$, our techniques enable rates comparable to knowing the variance of each sample.
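The paper's estimator is not reproduced here. The snippet below only sets up a Subset-of-Signals-style simulation and compares the naive sample mean with the inverse-variance-weighted oracle that the last sentence alludes to (the oracle uses the variances, which the actual problem does not observe); all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, mu = 10_000, 100, 1.0
# Subset-of-Signals style data: m samples with variance <= 1, the rest very noisy.
sigma = np.concatenate([np.full(m, 1.0), np.full(n - m, 50.0)])
x = rng.normal(mu, sigma)

naive = x.mean()                                   # ignores heteroskedasticity
oracle = np.average(x, weights=1.0 / sigma ** 2)   # requires knowing each variance
print(f"naive mean: {naive:+.3f}   known-variance oracle: {oracle:+.3f}")
```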
Citations: 0
Robust parameter estimation of the log-logistic distribution based on density power divergence estimators
Pub Date : 2023-12-05 DOI: arxiv-2312.02662
A. Felipe, M. Jaenada, P. Miranda, L. Pardo
Robust inferential methods based on divergence measures have shown an appealing trade-off between efficiency and robustness in many different statistical models. In this paper, minimum density power divergence estimators (MDPDEs) for the scale and shape parameters of the log-logistic distribution are considered. The log-logistic is a versatile distribution modeling lifetime data which is commonly adopted in survival analysis and reliability engineering studies when the hazard rate is initially increasing but then decreases after some point. Further, it is shown that the classical estimators based on maximum likelihood (MLE) are included as a particular case of the MDPDE family. Moreover, the corresponding influence function of the MDPDE is obtained, and its boundedness is proved, thus leading to robust estimators. A simulation study is carried out to illustrate the slight loss in efficiency of MDPDE with respect to MLE and, besides, the considerable gain in robustness.
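A minimal sketch of a minimum density power divergence fit for the log-logistic distribution (scipy's `fisk`), assuming the standard Basu et al. form of the empirical DPD objective with tuning parameter $\alpha$: minimize $\int f_\theta^{1+\alpha}\,dx - \frac{1+\alpha}{\alpha}\,\frac{1}{n}\sum_i f_\theta(x_i)^\alpha$. The exact objective, parametrization, and tuning used in the paper may differ, and the outlier example is invented.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad
from scipy.optimize import minimize

def dpd_objective(params, x, alpha=0.5):
    """Empirical density power divergence objective for the log-logistic
    (Fisk) distribution with shape c and scale s (Basu et al. form)."""
    c, s = params
    if c <= 0.5 or s <= 0:            # keep f**(1+alpha) integrable near 0
        return np.inf
    f = lambda t: stats.fisk.pdf(t, c, scale=s)
    term1 = quad(lambda t: f(t) ** (1 + alpha), 0, np.inf)[0]
    term2 = (1 + alpha) / alpha * np.mean(f(x) ** alpha)
    return term1 - term2

x = stats.fisk.rvs(2.5, scale=1.0, size=200, random_state=0)
x[:5] = 50.0                          # contaminate with a few gross outliers
fit = minimize(dpd_objective, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
print(fit.x)                          # DPD estimate of (shape, scale)
```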
Citations: 0