Phase transition and higher order analysis of L_q regularization under dependence
Hanwen Huang, Peng Zeng, Qinglong Yang
Pub Date: 2024-02-20 | eCollection Date: 2024-03-01 | DOI: 10.1093/imaiai/iaae005
We study the problem of estimating a [Formula: see text]-sparse signal [Formula: see text] from a set of noisy observations [Formula: see text] under the model [Formula: see text], where [Formula: see text] is the measurement matrix whose rows are drawn from the distribution [Formula: see text]. We consider the class of [Formula: see text]-regularized least squares (LQLS) estimators given by the formulation [Formula: see text], where [Formula: see text] [Formula: see text] denotes the [Formula: see text]-norm. In the setting [Formula: see text] with fixed [Formula: see text] and [Formula: see text], we derive the asymptotic risk of [Formula: see text] for an arbitrary covariance matrix [Formula: see text], which generalizes the existing results for standard Gaussian design, i.e. [Formula: see text]. The results are derived using the (non-rigorous) replica method. We perform a higher-order analysis of LQLS in the small-error regime, in which the first dominant term can be used to determine the phase transition behavior of LQLS. Our results show that the first dominant term does not depend on the covariance structure of [Formula: see text] in the cases [Formula: see text] and [Formula: see text], which indicates that correlations among predictors affect the phase transition curve only in the case [Formula: see text], a.k.a. the LASSO. To study the influence of the covariance structure of [Formula: see text] on the performance of LQLS in the cases [Formula: see text] and [Formula: see text], we derive explicit formulas for the second dominant term in the small-error expansion of the asymptotic risk. Extensive computational experiments confirm that our analytical predictions are consistent with numerical results.
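The LQLS family is easiest to see in its q = 1 instance. Below is a minimal sketch of that case, the LASSO, solved by proximal gradient descent (ISTA); the synthetic data, step size and regularization level are illustrative assumptions, not the paper's setup, and the design is the standard Gaussian one (identity covariance) that the paper generalizes.

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t * ||.||_1: entrywise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lqls_q1(X, y, lam, n_iter=500):
    """argmin_b 0.5 * ||y - X b||_2^2 + lam * ||b||_1, via ISTA."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b = soft_threshold(b - X.T @ (X @ b - y) / L, lam / L)
    return b

rng = np.random.default_rng(0)
n, p, k = 100, 200, 10                        # k-sparse signal, p > n
beta0 = np.zeros(p); beta0[:k] = 1.0
X = rng.standard_normal((n, p)) / np.sqrt(n)  # rows with identity covariance
y = X @ beta0 + 0.1 * rng.standard_normal(n)
beta_hat = lqls_q1(X, y, lam=0.05)
print("estimation error:", np.linalg.norm(beta_hat - beta0))
```

For q strictly between 1 and 2 the penalty is differentiable away from zero and the proximal step can be replaced by a per-coordinate scalar solve; the paper's risk analysis covers the general LQLS family.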
{"title":"Phase transition and higher order analysis of <i>L<sub>q</sub></i> regularization under dependence.","authors":"Hanwen Huang, Peng Zeng, Qinglong Yang","doi":"10.1093/imaiai/iaae005","DOIUrl":"10.1093/imaiai/iaae005","url":null,"abstract":"<p><p>We study the problem of estimating a [Formula: see text]-sparse signal [Formula: see text] from a set of noisy observations [Formula: see text] under the model [Formula: see text], where [Formula: see text] is the measurement matrix the row of which is drawn from distribution [Formula: see text]. We consider the class of [Formula: see text]-regularized least squares (LQLS) given by the formulation [Formula: see text], where [Formula: see text] [Formula: see text] denotes the [Formula: see text]-norm. In the setting [Formula: see text] with fixed [Formula: see text] and [Formula: see text], we derive the asymptotic risk of [Formula: see text] for arbitrary covariance matrix [Formula: see text] that generalizes the existing results for standard Gaussian design, i.e. [Formula: see text]. The results were derived from the non-rigorous replica method. We perform a higher-order analysis for LQLS in the small-error regime in which the first dominant term can be used to determine the phase transition behavior of LQLS. Our results show that the first dominant term does not depend on the covariance structure of [Formula: see text] in the cases [Formula: see text] and [Formula: see text] which indicates that the correlations among predictors only affect the phase transition curve in the case [Formula: see text] a.k.a. LASSO. To study the influence of the covariance structure of [Formula: see text] on the performance of LQLS in the cases [Formula: see text] and [Formula: see text], we derive the explicit formulas for the second dominant term in the expansion of the asymptotic risk in terms of small error. Extensive computational experiments confirm that our analytical predictions are consistent with numerical results.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10878746/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139933465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On statistical inference with high-dimensional sparse CCA
Nilanjana Laha, Nathan Huey, Brent Coull, Rajarshi Mukherjee
Pub Date: 2023-11-17 | DOI: 10.1093/imaiai/iaad040
We consider asymptotically exact inference on the leading canonical correlation directions and strengths between two high-dimensional vectors under sparsity restrictions. Our main contribution is a novel representation of the canonical correlation analysis (CCA) problem, based on which one can operationalize a one-step bias correction on reasonable initial estimators. Our analytic results are adaptive over suitable structural restrictions on the high-dimensional nuisance parameters, which in this set-up correspond to the covariance matrices of the variables of interest. We supplement the theoretical guarantees behind our procedures with extensive numerical studies.
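To make "one-step bias correction" concrete, here is a minimal generic sketch under assumptions not in the abstract: a single Newton step on the estimating equations of a toy logistic regression, starting from a crude initial fit. The CCA-specific representation and nuisance handling of the paper are not reproduced here.

```python
import numpy as np

def one_step_correction(theta0, score, hessian):
    """One Newton step on the estimating equations: theta0 - H^{-1} s."""
    return theta0 - np.linalg.solve(hessian(theta0), score(theta0))

rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.standard_normal((n, p))
theta_true = np.array([1.0, -0.5, 0.25])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ theta_true)))

def score(th):                       # gradient of the logistic log-likelihood
    return X.T @ (y - 1 / (1 + np.exp(-X @ th)))

def hessian(th):                     # Hessian of the logistic log-likelihood
    mu = 1 / (1 + np.exp(-X @ th))
    return -(X * (mu * (1 - mu))[:, None]).T @ X

# crude initial estimator (linear fit rescaled), then one corrective step
theta_init = np.linalg.lstsq(X, y - 0.5, rcond=None)[0] * 4
theta_one_step = one_step_correction(theta_init, score, hessian)
print("initial:", theta_init, "one-step:", theta_one_step)
```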
{"title":"On statistical inference with high-dimensional sparse CCA.","authors":"Nilanjana Laha, Nathan Huey, Brent Coull, Rajarshi Mukherjee","doi":"10.1093/imaiai/iaad040","DOIUrl":"10.1093/imaiai/iaad040","url":null,"abstract":"<p><p>We consider asymptotically exact inference on the leading canonical correlation directions and strengths between two high-dimensional vectors under sparsity restrictions. In this regard, our main contribution is developing a novel representation of the Canonical Correlation Analysis problem, based on which one can operationalize a one-step bias correction on reasonable initial estimators. Our analytic results in this regard are adaptive over suitable structural restrictions of the high-dimensional nuisance parameters, which, in this set-up, correspond to the covariance matrices of the variables of interest. We further supplement the theoretical guarantees behind our procedures with extensive numerical studies.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2023-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138048165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Black-box tests for algorithmic stability
Byol Kim, Rina Foygel Barber
Pub Date: 2023-10-14 | eCollection Date: 2023-12-01 | DOI: 10.1093/imaiai/iaad039
Algorithmic stability is a concept from learning theory that expresses the degree to which changes to the input data (e.g. removal of a single data point) may affect the outputs of a regression algorithm. Knowing an algorithm's stability properties is often useful for many downstream applications; for example, stability is known to lead to desirable generalization properties and predictive inference guarantees. However, many modern algorithms currently used in practice are too complex for a theoretical analysis of their stability properties, and thus we can only attempt to establish these properties through an empirical exploration of the algorithm's behaviour on various datasets. In this work, we lay out a formal statistical framework for this kind of black-box testing, without any assumptions on the algorithm or the data distribution, and establish fundamental bounds on the ability of any black-box test to identify algorithmic stability.
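As an illustration of what such an empirical exploration can look like, the sketch below perturbs a black-box algorithm by deleting one training point, records how often a prediction moves by more than a tolerance, and runs a binomial test on the resulting count. The tolerance, trial count and null rate are illustrative choices, not the paper's exact procedure or bounds.

```python
import numpy as np
from scipy import stats

def stability_trials(fit_predict, data, x_test, eps, n_trials, rng):
    """Count trials where deleting one point moves the prediction by > eps."""
    base = fit_predict(data)(x_test)
    flips = 0
    for _ in range(n_trials):
        i = rng.integers(len(data))
        loo = fit_predict(np.delete(data, i, axis=0))(x_test)
        flips += abs(base - loo) > eps
    return flips

rng = np.random.default_rng(2)
data = rng.standard_normal((200, 2))      # columns: (x, y)

def fit_predict(d):                       # toy black box: 1-nearest neighbour
    return lambda x: d[np.argmin(np.abs(d[:, 0] - x)), 1]

flips = stability_trials(fit_predict, data, x_test=0.3, eps=0.1,
                         n_trials=100, rng=rng)
# one-sided test of H0: the instability rate is at least 20%
print(stats.binomtest(int(flips), 100, 0.2, alternative="less"))
```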
{"title":"Black-box tests for algorithmic stability.","authors":"Byol Kim, Rina Foygel Barber","doi":"10.1093/imaiai/iaad039","DOIUrl":"10.1093/imaiai/iaad039","url":null,"abstract":"<p><p>Algorithmic stability is a concept from learning theory that expresses the degree to which changes to the input data (e.g. removal of a single data point) may affect the outputs of a regression algorithm. Knowing an algorithm's stability properties is often useful for many downstream applications-for example, stability is known to lead to desirable generalization properties and predictive inference guarantees. However, many modern algorithms currently used in practice are too complex for a theoretical analysis of their stability properties, and thus we can only attempt to establish these properties through an empirical exploration of the algorithm's behaviour on various datasets. In this work, we lay out a formal statistical framework for this kind of <i>black-box testing</i> without any assumptions on the algorithm or the data distribution, and establish fundamental bounds on the ability of any black-box test to identify algorithmic stability.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10576650/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41239705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian denoising of structured sources and its implications on learning-based denoising
Wenda Zhou, Joachim Wabnig, Shirin Jalali
Pub Date: 2023-09-19 | DOI: 10.1093/imaiai/iaad036
Denoising a stationary process $(X_{i})_{i\in\mathbb{Z}}$ corrupted by additive white Gaussian noise $(Z_{i})_{i\in\mathbb{Z}}$ is a classic, well-studied and fundamental problem in information theory and statistical signal processing. However, finding theoretically founded, computationally efficient denoising methods applicable to general sources is still an open problem. In the Bayesian set-up where the source distribution is known, a minimum mean square error (MMSE) denoiser estimates $X^{n}$ from noisy measurements $Y^{n}$ as $\hat{X}^{n}=\mathrm{E}[X^{n}\mid Y^{n}]$. However, for general sources, computing $\mathrm{E}[X^{n}\mid Y^{n}]$ is computationally very challenging, if not infeasible. In this paper, starting from a Bayesian set-up, a novel denoising method, the quantized maximum a posteriori (Q-MAP) denoiser, is proposed and its asymptotic performance is analysed. Both for memoryless sources and for structured first-order Markov sources, it is shown that, asymptotically, as the noise variance $\sigma_{z}^{2}$ converges to zero, $\frac{1}{\sigma_{z}^{2}}\mathrm{E}[(X_{i}-\hat{X}^{\mathrm{QMAP}}_{i})^{2}]$ converges to the information dimension of the source. For the studied memoryless sources, this limit is known to be optimal. A key advantage of the Q-MAP denoiser over an MMSE denoiser is that it makes explicit which properties of the source distribution are used in denoising. This property leads to a new learning-based denoising approach that is applicable to generic structured sources. Using the ImageNet database for training, initial simulation results exploring the performance of such a learning-based denoiser in image denoising are presented.
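For a finite-alphabet memoryless source, the MMSE denoiser $\mathrm{E}[X\mid Y]$ that the paper takes as its Bayesian baseline has a simple closed form. The sketch below implements it for an illustrative sparse binary source under Gaussian noise; the alphabet, prior and noise level are assumptions, and the Q-MAP construction itself is not reproduced.

```python
import numpy as np

def mmse_denoise(y, alphabet, prior, sigma_z):
    """E[X | Y = y] for X ~ prior on a finite alphabet, Y = X + N(0, sigma_z^2)."""
    # posterior weights: prior(x) * Gaussian likelihood of y given x
    w = prior * np.exp(-(y[:, None] - alphabet[None, :]) ** 2 / (2 * sigma_z**2))
    w /= w.sum(axis=1, keepdims=True)
    return w @ alphabet

rng = np.random.default_rng(3)
alphabet = np.array([0.0, 1.0])            # sparse binary source
prior = np.array([0.9, 0.1])
x = rng.choice(alphabet, size=10_000, p=prior)
sigma_z = 0.3
y = x + sigma_z * rng.standard_normal(x.size)
x_hat = mmse_denoise(y, alphabet, prior, sigma_z)
# per-symbol MSE normalized by the noise variance
print("normalized MSE:", np.mean((x - x_hat) ** 2) / sigma_z**2)
```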
{"title":"Bayesian denoising of structured sources and its implications on learning-based denoising","authors":"Wenda Zhou, Joachim Wabnig, Shirin Jalali","doi":"10.1093/imaiai/iaad036","DOIUrl":"https://doi.org/10.1093/imaiai/iaad036","url":null,"abstract":"Abstract Denoising a stationary process $(X_{i})_{i in mathbb{Z}}$ corrupted by additive white Gaussian noise $(Z_{i})_{i in mathbb{Z}}$ is a classic, well-studied and fundamental problem in information theory and statistical signal processing. However, finding theoretically founded computationally efficient denoising methods applicable to general sources is still an open problem. In the Bayesian set-up where the source distribution is known, a minimum mean square error (MMSE) denoiser estimates $X^{n}$ from noisy measurements $Y^{n}$ as $hat{X}^{n}=mathrm{E}[X^{n}|Y^{n}]$. However, for general sources, computing $mathrm{E}[X^{n}|Y^{n}]$ is computationally very challenging, if not infeasible. In this paper, starting from a Bayesian set-up, a novel denoising method, namely, quantized maximum a posteriori (Q-MAP) denoiser is proposed and its asymptotic performance is analysed. Both for memoryless sources, and for structured first-order Markov sources, it is shown that, asymptotically, as $sigma _{z}^{2} $ (noise variance) converges to zero, ${1over sigma _{z}^{2}} mathrm{E}[(X_{i}-hat{X}^{mathrm{QMAP}}_{i})^{2}]$ converges to the information dimension of the source. For the studied memoryless sources, this limit is known to be optimal. A key advantage of the Q-MAP denoiser, unlike an MMSE denoiser, is that it highlights the key properties of the source distribution that are to be used in its denoising. This key property leads to a new learning-based denoising approach that is applicable to generic structured sources. Using ImageNet database for training, initial simulation results exploring the performance of such a learning-based denoiser in image denoising are presented.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135060543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Near-optimal estimation of linear functionals with log-concave observation errors
Simon Foucart, Grigoris Paouris
Pub Date: 2023-09-19 | DOI: 10.1093/imaiai/iaad038
This note addresses the question of optimally estimating a linear functional of an object acquired through linear observations corrupted by random noise, where optimality pertains to a worst-case setting tied to a symmetric, convex and closed model set containing the object. It complements the article 'Statistical Estimation and Optimal Recovery' published in the Annals of Statistics in 1994. There, Donoho showed (among other things) that, for Gaussian noise, linear maps provide near-optimal estimation schemes relative to a performance measure relevant in Statistical Estimation. Here, we advocate for a different performance measure, arguably more relevant in Optimal Recovery. We show that, relative to this new measure, linear maps still provide near-optimal estimation schemes even if the noise is merely log-concave. Our arguments, which make a connection to the deterministic noise situation and bypass properties specific to the Gaussian case, offer an alternative to parts of Donoho's proof.
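To make the worst-case setting concrete, consider the special case where the model set is the Euclidean unit ball and the noise is deterministically bounded by eps (the deterministic-noise situation the authors connect to). For a linear estimate <c, y> of <a, x> from y = Ax + e, the worst-case error is ||a - A^T c||_2 + eps * ||c||_2 by Cauchy-Schwarz, with equality attainable, so a near-optimal linear map can be found numerically. This is a hedged illustration, not the paper's construction.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
m, d, eps = 5, 10, 0.1
A = rng.standard_normal((m, d))           # observation map
a = rng.standard_normal(d)                # functional a: x -> <a, x>

def worst_case_error(c):
    """sup over ||x|| <= 1, ||e|| <= eps of |<a, x> - <c, A x + e>|."""
    return np.linalg.norm(a - A.T @ c) + eps * np.linalg.norm(c)

res = minimize(worst_case_error, x0=np.zeros(m), method="Nelder-Mead",
               options={"maxiter": 20000, "fatol": 1e-10, "xatol": 1e-10})
print("worst-case error of the best linear map found:", res.fun)
```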
{"title":"Near-optimal estimation of linear functionals with log-concave observation errors","authors":"Simon Foucart, Grigoris Paouris","doi":"10.1093/imaiai/iaad038","DOIUrl":"https://doi.org/10.1093/imaiai/iaad038","url":null,"abstract":"Abstract This note addresses the question of optimally estimating a linear functional of an object acquired through linear observations corrupted by random noise, where optimality pertains to a worst-case setting tied to a symmetric, convex and closed model set containing the object. It complements the article ‘Statistical Estimation and Optimal Recovery’ published in the Annals of Statistics in 1994. There, Donoho showed (among other things) that, for Gaussian noise, linear maps provide near-optimal estimation schemes relatively to a performance measure relevant in Statistical Estimation. Here, we advocate for a different performance measure arguably more relevant in Optimal Recovery. We show that, relatively to this new measure, linear maps still provide near-optimal estimation schemes even if the noise is merely log-concave. Our arguments, which make a connection to the deterministic noise situation and bypass properties specific to the Gaussian case, offer an alternative to parts of Donoho’s proof.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135010731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph-based approximate message passing iterations
Cédric Gerbelot, Raphaël Berthier
Pub Date: 2023-09-18 | DOI: 10.1093/imaiai/iaad020
Approximate message passing (AMP) algorithms have become an important element of high-dimensional statistical inference, mostly due to their adaptability and concentration properties, captured by the state evolution (SE) equations. This is demonstrated by the growing number of new iterations proposed for increasingly complex problems, ranging from multi-layer inference to low-rank matrix estimation with elaborate priors. In this paper, we address the following questions: is there a structure underlying all AMP iterations that unifies them in a common framework? Can we use such a structure to give a modular proof of the state evolution equations, adaptable to new AMP iterations without reproducing the full argument each time? We propose an answer to both questions, showing that AMP instances can be generically indexed by an oriented graph. This enables a unified interpretation of these iterations, independent of the problem they solve, and a way of composing them arbitrarily. We then show that all AMP iterations indexed by such a graph verify rigorous SE equations, extending the reach of previous proofs and proving a number of recent heuristic derivations of those equations. Our proof naturally includes non-separable functions, and we show how existing refinements, such as spatial coupling or matrix-valued variables, can be combined with our framework.
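A concrete AMP instance helps fix ideas: the sketch below runs the classic symmetric rank-one iteration, with its Onsager memory term, on a spiked matrix. The tanh denoiser (matched to a +/-1 prior, with simplified calibration) and the problem sizes are illustrative assumptions; the paper's graph-indexed framework covers far more general compositions of such steps.

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 2000, 2.0
v = rng.choice([-1.0, 1.0], size=n)                 # planted +/-1 spike
W = rng.standard_normal((n, n)); W = (W + W.T) / np.sqrt(2 * n)
Y = (lam / n) * np.outer(v, v) + W                  # spiked matrix observation

f = np.tanh                                         # denoiser for a +/-1 prior
fprime = lambda u: 1.0 - np.tanh(u) ** 2
x = rng.standard_normal(n)                          # random initialization
m_old = np.zeros(n)
for t in range(25):
    m = f(x)
    b = np.mean(fprime(x))                          # Onsager coefficient
    x = Y @ m - b * m_old                           # AMP step with memory term
    m_old = m
print("overlap with the spike:", abs(f(x) @ v) / n)
```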
{"title":"Graph-based approximate message passing iterations","authors":"Cédric Gerbelot, Raphaël Berthier","doi":"10.1093/imaiai/iaad020","DOIUrl":"https://doi.org/10.1093/imaiai/iaad020","url":null,"abstract":"Abstract Approximate message passing (AMP) algorithms have become an important element of high-dimensional statistical inference, mostly due to their adaptability and concentration properties, the state evolution (SE) equations. This is demonstrated by the growing number of new iterations proposed for increasingly complex problems, ranging from multi-layer inference to low-rank matrix estimation with elaborate priors. In this paper, we address the following questions: is there a structure underlying all AMP iterations that unifies them in a common framework? Can we use such a structure to give a modular proof of state evolution equations, adaptable to new AMP iterations without reproducing each time the full argument? We propose an answer to both questions, showing that AMP instances can be generically indexed by an oriented graph. This enables to give a unified interpretation of these iterations, independent from the problem they solve, and a way of composing them arbitrarily. We then show that all AMP iterations indexed by such a graph verify rigorous SE equations, extending the reach of previous proofs and proving a number of recent heuristic derivations of those equations. Our proof naturally includes non-separable functions and we show how existing refinements, such as spatial coupling or matrix-valued variables, can be combined with our framework.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135110705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spectral deconvolution of matrix models: the additive case
Pierre Tarrago
Pub Date: 2023-09-18 | DOI: 10.1093/imaiai/iaad037
We implement a complex analytic method to build an estimator of the spectrum of a matrix perturbed by the addition of a random matrix noise in the free probabilistic regime. This method, previously introduced by Arizmendi, Tarrago and Vargas, involves two steps: the first step consists of a fixed-point method to compute the Stieltjes transform of the desired distribution in a certain domain, and the second step is a classical deconvolution by a Cauchy distribution, whose parameter depends on the intensity of the noise. The method thus reduces the spectral deconvolution problem to a classical one. We provide explicit bounds for the mean squared error of the first step under the assumption that the distribution of the noise is unitarily invariant. In the case where the unknown measure is sparse or close to a distribution with a sufficiently smooth density, we prove that the resulting estimator converges to the measure in the $1$-Wasserstein distance at speed $O(1/\sqrt{N})$, where $N$ is the dimension of the matrix.
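The fixed-point step can be illustrated in the forward direction. For an atomic measure mu corrupted by semicircular noise of variance sigma^2 (free additive convolution with a semicircle), the Cauchy transform of the noisy spectrum solves the subordination fixed point $G(z) = G_{\mu}(z - \sigma^{2} G(z))$, which a plain Picard iteration evaluates for z away from the real axis. The atoms and noise level below are illustrative; the paper's estimator inverts this map before the Cauchy deconvolution step.

```python
import numpy as np

def cauchy_mu(z, atoms):
    """Cauchy transform G_mu(z) = mean(1 / (z - a_i)) of an atomic measure."""
    return np.mean(1.0 / (z - atoms))

def cauchy_noisy(z, atoms, sigma2, n_iter=200):
    """Picard iteration for G(z) = G_mu(z - sigma2 * G(z))."""
    g = cauchy_mu(z, atoms)
    for _ in range(n_iter):
        g = cauchy_mu(z - sigma2 * g, atoms)
    return g

atoms = np.array([-1.0, 0.0, 2.0])        # sparse spectrum of the clean matrix
z = 0.5 + 1.0j                            # point in the upper half-plane
print(cauchy_noisy(z, atoms, sigma2=0.25))
```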
{"title":"Spectral deconvolution of matrix models: the additive case","authors":"Pierre Tarrago","doi":"10.1093/imaiai/iaad037","DOIUrl":"https://doi.org/10.1093/imaiai/iaad037","url":null,"abstract":"Abstract We implement a complex analytic method to build an estimator of the spectrum of a matrix perturbed by the addition of a random matrix noise in the free probabilistic regime. This method, which has been previously introduced by Arizmendi, Tarrago and Vargas, involves two steps: the first step consists in a fixed point method to compute the Stieltjes transform of the desired distribution in a certain domain, and the second step is a classical deconvolution by a Cauchy distribution, whose parameter depends on the intensity of the noise. This method thus reduces the spectral deconvolution problem to a classical one. We provide explicit bounds for the mean squared error of the first step under the assumption that the distribution of the noise is unitary invariant. In the case where the unknown measure is sparse or close to a distribution with a density with enough smoothness, we prove that the resulting estimator converges to the measure in the $1$-Wasserstein distance at speed $O(1/sqrt{N})$, where $N$ is the dimension of the matrix.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135109334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-dimensional asymptotics of Langevin dynamics in spiked matrix models
Tengyuan Liang, Subhabrata Sen, Pragya Sur
Pub Date: 2023-09-18 | DOI: 10.1093/imaiai/iaad042
We study Langevin dynamics for recovering the planted signal in the spiked matrix model. We provide a 'path-wise' characterization of the overlap between the output of the Langevin algorithm and the planted signal. This overlap is characterized in terms of a self-consistent system of integro-differential equations, usually referred to as the Crisanti–Horner–Sommers–Cugliandolo–Kurchan equations in the spin glass literature. As a second contribution, we derive an explicit formula for the limiting overlap in terms of the signal-to-noise ratio and the injected noise in the diffusion. This uncovers a sharp phase transition: in one regime, the limiting overlap is strictly positive, while in the other, the injected noise overcomes the signal and the limiting overlap is zero.
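A minimal simulation of the dynamics: discretized (projected) Langevin ascent on the spiked landscape, tracking the overlap with the planted signal. The step size, temperature, Euler-Maruyama discretization and renormalization onto the sphere are illustrative choices, not the paper's exact dynamics; sweeping the temperature against the signal-to-noise ratio is how one would probe the phase transition numerically.

```python
import numpy as np

rng = np.random.default_rng(5)
n, lam, temp, dt = 500, 3.0, 0.5, 0.01
v = rng.standard_normal(n); v /= np.linalg.norm(v)   # planted unit signal
W = rng.standard_normal((n, n)); W = (W + W.T) / np.sqrt(2 * n)
Y = lam * np.outer(v, v) + W                         # spiked matrix

x = rng.standard_normal(n); x /= np.linalg.norm(x)   # random start on the sphere
for t in range(2000):
    grad = Y @ x                                     # gradient of x^T Y x / 2
    x = x + dt * grad + np.sqrt(2 * temp * dt) * rng.standard_normal(n)
    x /= np.linalg.norm(x)                           # project back to the sphere
print("overlap with the signal:", abs(x @ v))
```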
{"title":"High-dimensional asymptotics of Langevin dynamics in spiked matrix models","authors":"Tengyuan Liang, Subhabrata Sen, Pragya Sur","doi":"10.1093/imaiai/iaad042","DOIUrl":"https://doi.org/10.1093/imaiai/iaad042","url":null,"abstract":"Abstract We study Langevin dynamics for recovering the planted signal in the spiked matrix model. We provide a ‘path-wise’ characterization of the overlap between the output of the Langevin algorithm and the planted signal. This overlap is characterized in terms of a self-consistent system of integro-differential equations, usually referred to as the Crisanti–Horner–Sommers–Cugliandolo–Kurchan equations in the spin glass literature. As a second contribution, we derive an explicit formula for the limiting overlap in terms of the signal-to-noise ratio and the injected noise in the diffusion. This uncovers a sharp phase transition—in one regime, the limiting overlap is strictly positive, while in the other, the injected noise overcomes the signal, and the limiting overlap is zero.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135256563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-marginal Gromov–Wasserstein transport and barycentres
Florian Beier, Robert Beinert, Gabriele Steidl
Pub Date: 2023-09-18 | DOI: 10.1093/imaiai/iaad041
Gromov–Wasserstein (GW) distances are combinations of Gromov–Hausdorff and Wasserstein distances that allow the comparison of two different metric measure spaces (mm-spaces). Due to their invariance under measure- and distance-preserving transformations, they are well suited for many applications in graph and shape analysis. In this paper, we introduce the concept of multi-marginal GW transport between a set of mm-spaces, as well as its regularized and unbalanced versions. As a special case, we discuss multi-marginal fused variants, which combine the structure information of an mm-space with label information from an additional label space. To tackle the new formulations numerically, we consider the bi-convex relaxation of the multi-marginal GW problem, which is tight in the balanced case if the cost function is conditionally negative definite. The relaxed model can be solved by alternating minimization, where each step can be performed by a multi-marginal Sinkhorn scheme. We show relations of our multi-marginal GW problem to (unbalanced, fused) GW barycentres and present various numerical results, which indicate the potential of the concept.
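The Sinkhorn building block is easiest to see in the classical two-marginal case: linearize the GW objective at the current plan, then solve the resulting entropic OT problem by Sinkhorn iterations, as sketched below. The point clouds, marginals and regularization strength are illustrative, and the paper's multi-marginal scheme replaces the inner solver with a multi-marginal Sinkhorn step.

```python
import numpy as np

def sinkhorn(C, mu, nu, eps, n_iter=200):
    """Entropic OT plan for cost C and marginals mu, nu."""
    K = np.exp(-(C - C.min()) / eps)      # constant shift: same optimal plan
    u, v = np.ones_like(mu), np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    return u[:, None] * K * v[None, :]

def entropic_gw(C1, C2, mu, nu, eps=0.5, n_outer=50):
    pi = np.outer(mu, nu)                 # independent coupling to start
    for _ in range(n_outer):
        cost = -2.0 * C1 @ pi @ C2        # linearized squared-loss GW cost
        pi = sinkhorn(cost, mu, nu, eps)  # (marginal-dependent constants dropped)
    return pi

rng = np.random.default_rng(6)
x, y = rng.standard_normal((8, 2)), rng.standard_normal((10, 2))
C1 = np.linalg.norm(x[:, None] - x[None, :], axis=-1)   # pairwise distances
C2 = np.linalg.norm(y[:, None] - y[None, :], axis=-1)
mu, nu = np.full(8, 1 / 8), np.full(10, 1 / 10)
pi = entropic_gw(C1, C2, mu, nu)
print(pi.sum(axis=1) - mu, pi.sum())      # marginal check
```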
{"title":"Multi-marginal Gromov–Wasserstein transport and barycentres","authors":"Florian Beier, Robert Beinert, Gabriele Steidl","doi":"10.1093/imaiai/iaad041","DOIUrl":"https://doi.org/10.1093/imaiai/iaad041","url":null,"abstract":"Abstract Gromov–Wasserstein (GW) distances are combinations of Gromov–Hausdorff and Wasserstein distances that allow the comparison of two different metric measure spaces (mm-spaces). Due to their invariance under measure- and distance-preserving transformations, they are well suited for many applications in graph and shape analysis. In this paper, we introduce the concept of multi-marginal GW transport between a set of mm-spaces as well as its regularized and unbalanced versions. As a special case, we discuss multi-marginal fused variants, which combine the structure information of an mm-space with label information from an additional label space. To tackle the new formulations numerically, we consider the bi-convex relaxation of the multi-marginal GW problem, which is tight in the balanced case if the cost function is conditionally negative definite. The relaxed model can be solved by an alternating minimization, where each step can be performed by a multi-marginal Sinkhorn scheme. We show relations of our multi-marginal GW problem to (unbalanced, fused) GW barycentres and present various numerical results, which indicate the potential of the concept.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135257493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Out-of-sample error estimation for M-estimators with convex penalty
Pierre C Bellec
Pub Date: 2023-09-18 | DOI: 10.1093/imaiai/iaad031
A generic out-of-sample error estimate is proposed for $M$-estimators regularized with a convex penalty in high-dimensional linear regression, where $(\boldsymbol{X},\boldsymbol{y})$ is observed and the dimension $p$ and sample size $n$ are of the same order. The out-of-sample error estimate enjoys a relative error of order $n^{-1/2}$ in a linear model with Gaussian covariates and independent noise, either non-asymptotically when $p/n\le \gamma$ or asymptotically in the high-dimensional asymptotic regime $p/n\to \gamma^{\prime}\in (0,\infty)$. General differentiable loss functions $\rho$ are allowed, provided that the derivative of the loss is 1-Lipschitz; this includes the least-squares loss as well as robust losses such as the Huber loss and its smoothed versions. The validity of the out-of-sample error estimate holds either under a strong convexity assumption, or for the L1-penalized Huber M-estimator and the Lasso under a sparsity assumption and a bound on the number of contaminated observations. For the square loss and in the absence of corruption in the response, the results additionally yield $n^{-1/2}$-consistent estimates of the noise variance and of the generalization error. This generalizes, to arbitrary convex penalty and arbitrary covariance, estimates that were previously known for the Lasso.
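For flavor, here is a classical degrees-of-freedom-based plug-in of the same general kind for the Lasso, where the degrees of freedom equal the number of selected variables: a GCV-style correction inflates the training residual. This is a hedged stand-in for illustration only; the paper's estimator is different and comes with the $n^{-1/2}$ guarantees described above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
n, p = 200, 300
beta = np.zeros(p); beta[:10] = 1.0
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

model = Lasso(alpha=0.1).fit(X, y)
df = np.count_nonzero(model.coef_)              # Lasso degrees of freedom
resid = y - model.predict(X)
gcv = (resid @ resid / n) / (1 - df / n) ** 2   # GCV-style adjusted risk
print(f"train MSE {resid @ resid / n:.3f}, GCV {gcv:.3f}, df {df}")
```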
{"title":"Out-of-sample error estimation for M-estimators with convex penalty","authors":"Pierre C Bellec","doi":"10.1093/imaiai/iaad031","DOIUrl":"https://doi.org/10.1093/imaiai/iaad031","url":null,"abstract":"Abstract A generic out-of-sample error estimate is proposed for $M$-estimators regularized with a convex penalty in high-dimensional linear regression where $(boldsymbol{X},boldsymbol{y})$ is observed and the dimension $p$ and sample size $n$ are of the same order. The out-of-sample error estimate enjoys a relative error of order $n^{-1/2}$ in a linear model with Gaussian covariates and independent noise, either non-asymptotically when $p/nle gamma $ or asymptotically in the high-dimensional asymptotic regime $p/nto gamma ^{prime}in (0,infty )$. General differentiable loss functions $rho $ are allowed provided that the derivative of the loss is 1-Lipschitz; this includes the least-squares loss as well as robust losses such as the Huber loss and its smoothed versions. The validity of the out-of-sample error estimate holds either under a strong convexity assumption, or for the L1-penalized Huber M-estimator and the Lasso under a sparsity assumption and a bound on the number of contaminated observations. For the square loss and in the absence of corruption in the response, the results additionally yield $n^{-1/2}$-consistent estimates of the noise variance and of the generalization error. This generalizes, to arbitrary convex penalty and arbitrary covariance, estimates that were previously known for the Lasso.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135258177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}