The problem of recovering a moment-determinate multivariate function $f$ via its moment sequence is studied. Under mild conditions on $f$, the point-wise and $L_1$-rates of convergence for the proposed constructions are established. The cases where $f$ is the indicator function of a set or represents a discrete probability mass function are also investigated. Calculations of the approximants and simulation studies are conducted to graphically illustrate the behavior of the approximations in several simple examples. Analytical and simulated errors of the proposed approximations are recorded in Tables 1-3.
{"title":"Reconstructions of piece-wise continuous and discrete functions using moments","authors":"Robert Mnatsakanov, Rafik Aramyan, Farhad Jafari","doi":"arxiv-2312.04462","DOIUrl":"https://doi.org/arxiv-2312.04462","url":null,"abstract":"The problem of recovering a moment-determinate multivariate function $f$ via\u0000its moment sequence is studied. Under mild conditions on $f$, the point-wise\u0000and $L_1$-rates of convergence for the proposed constructions are established.\u0000The cases where $f$ is the indicator function of a set, and represents a\u0000discrete probability mass function are also investigated. Calculations of the\u0000approximants and simulation studies are conducted to graphically illustrate the\u0000behavior of the approximations in several simple examples. Analytical and\u0000simulated errors of proposed approximations are recorded in Tables 1-3.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138553608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We discover a connection between the Benjamini-Hochberg (BH) procedure and the recently proposed e-BH procedure [Wang and Ramdas, 2022] with a suitably defined set of e-values. This insight extends to a generalized version of the BH procedure and the model-free multiple testing procedure in Barber and Candès [2015] (BC) with a general form of rejection rules. The connection provides an effective way of developing new multiple testing procedures by aggregating or assembling e-values resulting from the BH and BC procedures and their use in different subsets of the data. In particular, we propose new multiple testing methodologies in three applications, including a hybrid approach that integrates the BH and BC procedures, a multiple testing procedure aimed at ensuring a new notion of fairness by controlling both the group-wise and overall false discovery rates (FDR), and a structure-adaptive multiple testing procedure that can incorporate external covariate information to boost detection power. One notable feature of the proposed methods is that we use a data-dependent approach for assigning weights to e-values, significantly enhancing the efficiency of the resulting e-BH procedure. The construction of the weights is non-trivial and is motivated by the leave-one-out analysis for the BH and BC procedures. In theory, we prove that the proposed e-BH procedures with data-dependent weights in the three applications ensure finite-sample FDR control. Furthermore, we demonstrate the efficiency of the proposed methods through numerical studies in the three applications.
{"title":"E-values, Multiple Testing and Beyond","authors":"Guanxun Li, Xianyang Zhang","doi":"arxiv-2312.02905","DOIUrl":"https://doi.org/arxiv-2312.02905","url":null,"abstract":"We discover a connection between the Benjamini-Hochberg (BH) procedure and\u0000the recently proposed e-BH procedure [Wang and Ramdas, 2022] with a suitably\u0000defined set of e-values. This insight extends to a generalized version of the\u0000BH procedure and the model-free multiple testing procedure in Barber and\u0000Cand`es [2015] (BC) with a general form of rejection rules. The connection\u0000provides an effective way of developing new multiple testing procedures by\u0000aggregating or assembling e-values resulting from the BH and BC procedures and\u0000their use in different subsets of the data. In particular, we propose new\u0000multiple testing methodologies in three applications, including a hybrid\u0000approach that integrates the BH and BC procedures, a multiple testing procedure\u0000aimed at ensuring a new notion of fairness by controlling both the group-wise\u0000and overall false discovery rates (FDR), and a structure adaptive multiple\u0000testing procedure that can incorporate external covariate information to boost\u0000detection power. One notable feature of the proposed methods is that we use a\u0000data-dependent approach for assigning weights to e-values, significantly\u0000enhancing the efficiency of the resulting e-BH procedure. The construction of\u0000the weights is non-trivial and is motivated by the leave-one-out analysis for\u0000the BH and BC procedures. In theory, we prove that the proposed e-BH procedures\u0000with data-dependent weights in the three applications ensure finite sample FDR\u0000control. Furthermore, we demonstrate the efficiency of the proposed methods\u0000through numerical studies in the three applications.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"93 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$ by a product measure $\pi^\star$. When $\pi$ is strongly log-concave and log-smooth, we provide (1) approximation rates certifying that $\pi^\star$ is close to the minimizer $\pi^\star_\diamond$ of the KL divergence over a \emph{polyhedral} set $\mathcal{P}_\diamond$, and (2) an algorithm for minimizing $\text{KL}(\cdot|\pi)$ over $\mathcal{P}_\diamond$ with accelerated complexity $O(\sqrt{\kappa}\log(\kappa d/\varepsilon^2))$, where $\kappa$ is the condition number of $\pi$.
{"title":"Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space","authors":"Yiheng Jiang, Sinho Chewi, Aram-Alexandre Pooladian","doi":"arxiv-2312.02849","DOIUrl":"https://doi.org/arxiv-2312.02849","url":null,"abstract":"We develop a theory of finite-dimensional polyhedral subsets over the\u0000Wasserstein space and optimization of functionals over them via first-order\u0000methods. Our main application is to the problem of mean-field variational\u0000inference, which seeks to approximate a distribution $pi$ over $mathbb{R}^d$\u0000by a product measure $pi^star$. When $pi$ is strongly log-concave and\u0000log-smooth, we provide (1) approximation rates certifying that $pi^star$ is\u0000close to the minimizer $pi^star_diamond$ of the KL divergence over a\u0000emph{polyhedral} set $mathcal{P}_diamond$, and (2) an algorithm for\u0000minimizing $text{KL}(cdot|pi)$ over $mathcal{P}_diamond$ with accelerated\u0000complexity $O(sqrt kappa log(kappa d/varepsilon^2))$, where $kappa$ is\u0000the condition number of $pi$.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"86 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suppose that $K\subset\mathbb{C}$ is compact and that $z_0\in\mathbb{C}\backslash K$ is an external point. An optimal prediction measure for regression by polynomials of degree at most $n$ is one for which the variance of the prediction at $z_0$ is as small as possible. Hoel and Levine (\cite{HL}) considered the case of $K=[-1,1]$ and $z_0=x_0\in \mathbb{R}\backslash [-1,1]$, where they show that the support of the optimal measure is the $n+1$ extreme points of the Chebyshev polynomial $T_n(x)$ and characterize the optimal weights in terms of the absolute values of the fundamental interpolating Lagrange polynomials. More recently, \cite{BLO} established the equivalence of the optimal prediction problem with that of finding polynomials of extremal growth. They also study in detail the case of $K=[-1,1]$ and $z_0=ia\in i\mathbb{R}$, purely imaginary. In this work we generalize the Hoel-Levine formula to the general case when the support of the optimal measure is a finite set and give a formula for the optimal weights in terms of an $\ell_1$ minimization problem.
{"title":"A Characterization of Optimal Prediction Measures via $ell_1$ Minimization","authors":"Len Bos","doi":"arxiv-2312.03091","DOIUrl":"https://doi.org/arxiv-2312.03091","url":null,"abstract":"Suppose that $KsubsetC$ is compact and that $z_0inCbackslash K$ is an\u0000external point. An optimal prediction measure for regression by polynomials of\u0000degree at most $n,$ is one for which the variance of the prediction at $z_0$ is\u0000as small as possible. Hoel and Levine (cite{HL}) have considered the case of\u0000$K=[-1,1]$ and $z_0=x_0in Rbackslash [-1,1],$ where they show that the\u0000support of the optimal measure is the $n+1$ extremme points of the Chebyshev\u0000polynomial $T_n(x)$ and characterizing the optimal weights in terms of absolute\u0000values of fundamental interpolating Lagrange polynomials. More recently,\u0000cite{BLO} has given the equivalence of the optimal prediction problem with\u0000that of finding polynomials of extremal growth. They also study in detail the\u0000case of $K=[-1,1]$ and $z_0=iain iR,$ purely imaginary. In this work we\u0000generalize the Hoel-Levine formula to the general case when the support of the\u0000optimal measure is a finite set and give a formula for the optimal weights in\u0000terms of a $ell_1$ minimization problem.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138546622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Associated to each graph G is a Gaussian graphical model. Such models are often used in high-dimensional settings, i.e. where there are relatively few data points compared to the number of variables. The maximum likelihood threshold of a graph is the minimum number of data points required to fit the corresponding graphical model using maximum likelihood estimation. Graphical lasso is a method for selecting and fitting a graphical model. In this project, we ask: when graphical lasso is used to select and fit a graphical model on n data points, how likely is it that n is greater than or equal to the maximum likelihood threshold of the corresponding graph? Our results are a series of computational experiments.
{"title":"Maximum likelihood thresholds of Gaussian graphical models and graphical lasso","authors":"Daniel Irving Bernstein, Hayden Outlaw","doi":"arxiv-2312.03145","DOIUrl":"https://doi.org/arxiv-2312.03145","url":null,"abstract":"Associated to each graph G is a Gaussian graphical model. Such models are\u0000often used in high-dimensional settings, i.e. where there are relatively few\u0000data points compared to the number of variables. The maximum likelihood\u0000threshold of a graph is the minimum number of data points required to fit the\u0000corresponding graphical model using maximum likelihood estimation. Graphical\u0000lasso is a method for selecting and fitting a graphical model. In this project,\u0000we ask: when graphical lasso is used to select and fit a graphical model on n\u0000data points, how likely is it that n is greater than or equal to the maximum\u0000likelihood threshold of the corresponding graph? Our results are a series of\u0000computational experiments.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138546940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many real-world networks exhibit the phenomenon of edge clustering, which is typically measured by the average clustering coefficient. Recently, an alternative measure, the average closure coefficient, has been proposed to quantify local clustering. It has been shown that the average closure coefficient possesses a number of useful properties and can capture complementary information missed by the classical average clustering coefficient. In this paper, we study the asymptotic distribution of the average closure coefficient of a heterogeneous Erdős–Rényi random graph. We prove that the standardized average closure coefficient converges in distribution to the standard normal distribution. In the Erdős–Rényi random graph, the variance of the average closure coefficient exhibits the same phase transition phenomenon as the average clustering coefficient.
{"title":"Central limit theorem for the average closure coefficient","authors":"Mingao Yuan","doi":"arxiv-2312.03142","DOIUrl":"https://doi.org/arxiv-2312.03142","url":null,"abstract":"Many real-world networks exhibit the phenomenon of edge clustering, which is\u0000typically measured by the average clustering coefficient. Recently, an\u0000alternative measure, the average closure coefficient, is proposed to quantify\u0000local clustering. It is shown that the average closure coefficient possesses a\u0000number of useful properties and can capture complementary information missed by\u0000the classical average clustering coefficient. In this paper, we study the\u0000asymptotic distribution of the average closure coefficient of a heterogeneous\u0000Erd\"{o}s-R'{e}nyi random graph. We prove that the standardized average\u0000closure coefficient converges in distribution to the standard normal\u0000distribution. In the Erd\"{o}s-R'{e}nyi random graph, the variance of the\u0000average closure coefficient exhibits the same phase transition phenomenon as\u0000the average clustering coefficient.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138546790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rerandomization, a design that utilizes pretreatment covariates and improves their balance between different treatment groups, has received attention recently in both theory and practice. There are at least two types of rerandomization that are used in practice: the first rerandomizes the treatment assignment until covariate imbalance is below a prespecified threshold; the second randomizes the treatment assignment multiple times and chooses the one with the best covariate balance. In this paper we will consider the second type of rerandomization, namely the best-choice rerandomization, whose theory and inference are still lacking in the literature. In particular, we will focus on the best-choice rerandomization that uses the Mahalanobis distance to measure covariate imbalance, which is one of the most commonly used imbalance measures for multivariate covariates and is invariant to affine transformations of the covariates. We will study the large-sample repeated-sampling properties of the best-choice rerandomization, allowing both the number of covariates and the number of tried complete randomizations to increase with the sample size. We show that the asymptotic distribution of the difference-in-means estimator is more concentrated around the true average treatment effect under rerandomization than under complete randomization, and propose large-sample accurate confidence intervals for rerandomization that are shorter than those for the completely randomized experiment. We further demonstrate that, with a moderate number of covariates and with the number of tried randomizations increasing polynomially with the sample size, the best-choice rerandomization can achieve the ideally optimal precision that one can expect even with perfectly balanced covariates. The developed theory and methods for rerandomization are also illustrated using real field experiments.
{"title":"Asymptotic Theory of the Best-Choice Rerandomization using the Mahalanobis Distance","authors":"Yuhao Wang, Xinran Li","doi":"arxiv-2312.02513","DOIUrl":"https://doi.org/arxiv-2312.02513","url":null,"abstract":"Rerandomization, a design that utilizes pretreatment covariates and improves\u0000their balance between different treatment groups, has received attention\u0000recently in both theory and practice. There are at least two types of\u0000rerandomization that are used in practice: the first rerandomizes the treatment\u0000assignment until covariate imbalance is below a prespecified threshold; the\u0000second randomizes the treatment assignment multiple times and chooses the one\u0000with the best covariate balance. In this paper we will consider the second type\u0000of rerandomization, namely the best-choice rerandomization, whose theory and\u0000inference are still lacking in the literature. In particular, we will focus on\u0000the best-choice rerandomization that uses the Mahalanobis distance to measure\u0000covariate imbalance, which is one of the most commonly used imbalance measure\u0000for multivariate covariates and is invariant to affine transformations of\u0000covariates. We will study the large-sample repeatedly sampling properties of\u0000the best-choice rerandomization, allowing both the number of covariates and the\u0000number of tried complete randomizations to increase with the sample size. We\u0000show that the asymptotic distribution of the difference-in-means estimator is\u0000more concentrated around the true average treatment effect under\u0000rerandomization than under the complete randomization, and propose large-sample\u0000accurate confidence intervals for rerandomization that are shorter than that\u0000for the completely randomized experiment. We further demonstrate that, with\u0000moderate number of covariates and with the number of tried randomizations\u0000increasing polynomially with the sample size, the best-choice rerandomization\u0000can achieve the ideally optimal precision that one can expect even with\u0000perfectly balanced covariates. The developed theory and methods for\u0000rerandomization are also illustrated using real field experiments.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"84 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper deals with surrogate modelling of a computer code output in a hierarchical multi-fidelity context, i.e., when the output can be evaluated at different levels of accuracy and computational cost. Using observations of the output at low- and high-fidelity levels, we propose a method that combines Gaussian process (GP) regression and a Bayesian neural network (BNN), in a method called GPBNN. The low-fidelity output is treated as a single-fidelity code using classical GP regression. The high-fidelity output is approximated by a BNN that incorporates, in addition to the high-fidelity observations, well-chosen realisations of the low-fidelity output emulator. The predictive uncertainty of the final surrogate model is then quantified by a complete characterisation of the uncertainties of the different models and their interaction. GPBNN is compared with most of the multi-fidelity regression methods that allow quantification of the prediction uncertainty.
{"title":"A Bayesian neural network approach to Multi-fidelity surrogate modelling","authors":"Baptiste KerleguerDAM/DIF, CMAP, Claire CannamelaDAM/DIF, Josselin GarnierCMAP","doi":"arxiv-2312.02575","DOIUrl":"https://doi.org/arxiv-2312.02575","url":null,"abstract":"This paper deals with surrogate modelling of a computer code output in a\u0000hierarchical multi-fidelity context, i.e., when the output can be evaluated at\u0000different levels of accuracy and computational cost. Using observations of the\u0000output at low- and high-fidelity levels, we propose a method that combines\u0000Gaussian process (GP) regression and Bayesian neural network (BNN), in a method\u0000called GPBNN. The low-fidelity output is treated as a single-fidelity code\u0000using classical GP regression. The high-fidelity output is approximated by a\u0000BNN that incorporates, in addition to the high-fidelity observations,\u0000well-chosen realisations of the low-fidelity output emulator. The predictive\u0000uncertainty of the final surrogate model is then quantified by a complete\u0000characterisation of the uncertainties of the different models and their\u0000interaction. GPBNN is compared with most of the multi-fidelity regression\u0000methods allowing to quantify the prediction uncertainty.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"93 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given data drawn from a collection of Gaussian variables with a common mean but different and unknown variances, what is the best algorithm for estimating their common mean? We present an intuitive and efficient algorithm for this task. As different closed-form guarantees can be hard to compare, the Subset-of-Signals model serves as a benchmark for heteroskedastic mean estimation: given $n$ Gaussian variables with an unknown subset of $m$ variables having variance bounded by 1, what is the optimal estimation error as a function of $n$ and $m$? Our algorithm resolves this open question up to logarithmic factors, improving upon the previous best known estimation error by polynomial factors when $m = n^c$ for all $0 < c < 1$.
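A small simulation of the Subset-of-Signals benchmark described above, with invented parameters; it only contrasts the plain sample mean with an oracle that knows which $m$ points have unit variance, and is not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, mu = 10_000, 100, 0.0
# An unknown subset of m variables has variance <= 1; the rest are much noisier.
sigma = np.where(np.arange(n) < m, 1.0, 100.0)
x = rng.normal(mu, sigma)

err_mean = abs(x.mean() - mu)        # plain sample mean, hurt by the noisy points
err_oracle = abs(x[:m].mean() - mu)  # oracle average of the m low-variance points
print(f"sample mean error: {err_mean:.3f}, oracle error: {err_oracle:.3f}")
```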