Optimal variable clustering for high-dimensional matrix valued data
Pub Date: 2025-03-12 | eCollection Date: 2025-03-01 | DOI: 10.1093/imaiai/iaaf001
Inbeom Lee, Siyi Deng, Yang Ning
Matrix-valued data have become increasingly prevalent in many applications. Most of the existing clustering methods for this type of data are tailored to the mean model and do not account for the dependence structure of the features, which can be very informative, especially in high-dimensional settings or when mean information is not available. To extract the information from the dependence structure for clustering, we propose a new latent variable model for the features arranged in matrix form, with some unknown membership matrices representing the clusters for the rows and columns. Under this model, we further propose a class of hierarchical clustering algorithms using the difference of a weighted covariance matrix as the dissimilarity measure. Theoretically, we show that under mild conditions, our algorithm attains clustering consistency in the high-dimensional setting. While this consistency result holds for our algorithm with a broad class of weighted covariance matrices, the conditions for this result depend on the choice of the weight. To investigate how the weight affects the theoretical performance of our algorithm, we establish the minimax lower bound for clustering under our latent variable model in terms of a cluster separation metric. Given these results, we identify the optimal weight in the sense that using this weight guarantees our algorithm to be minimax rate-optimal. The practical implementation of our algorithm with the optimal weight is also discussed. Simulation studies show that our algorithm performs better than existing methods in terms of the adjusted Rand index (ARI). The method is applied to a genomic dataset and yields meaningful interpretations.
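Since the abstract only outlines the procedure, here is a minimal, hypothetical illustration of the general idea of dependence-based variable clustering: hierarchical clustering of the columns of a data matrix from a correlation-derived dissimilarity, scored with the ARI. This is not the authors' weighted-covariance algorithm for matrix-valued data; the data and all parameters are simulated for the example.

```python
# Hypothetical illustration only: hierarchical clustering of variables from a
# correlation-derived dissimilarity, scored with the adjusted Rand index (ARI).
# This is NOT the authors' weighted-covariance algorithm; data are simulated.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))                       # two latent factors
X = np.hstack([z[:, [0]] + 0.5 * rng.normal(size=(500, 10)),
               z[:, [1]] + 0.5 * rng.normal(size=(500, 10))])  # 2 clusters of 10 variables

C = np.corrcoef(X, rowvar=False)                    # feature-by-feature correlation
D = 1.0 - np.abs(C)                                 # weakly correlated = far apart
np.fill_diagonal(D, 0.0)
labels = fcluster(linkage(squareform(D, checks=False), method="average"),
                  t=2, criterion="maxclust")
print("ARI:", adjusted_rand_score(np.repeat([0, 1], 10), labels))
```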
{"title":"Optimal variable clustering for high-dimensional matrix valued data.","authors":"Inbeom Lee, Siyi Deng, Yang Ning","doi":"10.1093/imaiai/iaaf001","DOIUrl":"10.1093/imaiai/iaaf001","url":null,"abstract":"<p><p>Matrix valued data has become increasingly prevalent in many applications. Most of the existing clustering methods for this type of data are tailored to the mean model and do not account for the dependence structure of the features, which can be very informative, especially in high-dimensional settings or when mean information is not available. To extract the information from the dependence structure for clustering, we propose a new latent variable model for the features arranged in matrix form, with some unknown membership matrices representing the clusters for the rows and columns. Under this model, we further propose a class of hierarchical clustering algorithms using the difference of a weighted covariance matrix as the dissimilarity measure. Theoretically, we show that under mild conditions, our algorithm attains clustering consistency in the high-dimensional setting. While this consistency result holds for our algorithm with a broad class of weighted covariance matrices, the conditions for this result depend on the choice of the weight. To investigate how the weight affects the theoretical performance of our algorithm, we establish the minimax lower bound for clustering under our latent variable model in terms of some cluster separation metric. Given these results, we identify the optimal weight in the sense that using this weight guarantees our algorithm to be minimax rate-optimal. The practical implementation of our algorithm with the optimal weight is also discussed. Simulation studies show that our algorithm performs better than existing methods in terms of the adjusted Rand index (ARI). The method is applied to a genomic dataset and yields meaningful interpretations.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"14 1","pages":"iaaf001"},"PeriodicalIF":1.4,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11899537/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143626352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Dyson equalizer: adaptive noise stabilization for low-rank signal detection and recovery
Pub Date: 2025-01-16 | eCollection Date: 2025-03-01 | DOI: 10.1093/imaiai/iaae036
Boris Landa, Yuval Kluger
Detecting and recovering a low-rank signal in a noisy data matrix is a fundamental task in data analysis. Typically, this task is addressed by inspecting and manipulating the spectrum of the observed data, e.g. thresholding the singular values of the data matrix at a certain critical level. This approach is well established in the case of homoskedastic noise, where the noise variance is identical across the entries. However, in numerous applications, the noise can be heteroskedastic, where the noise characteristics may vary considerably across the rows and columns of the data. In this scenario, the spectral behaviour of the noise can differ significantly from the homoskedastic case, posing various challenges for signal detection and recovery. To address these challenges, we develop an adaptive normalization procedure that equalizes the average noise variance across the rows and columns of a given data matrix. Our proposed procedure is data-driven and fully automatic, supporting a broad range of noise distributions, variance patterns and signal structures. Our approach relies on random matrix theory results that describe the resolvent of the noise via the so-called Dyson equation. By leveraging this relation, we can accurately infer the noise level in each row and each column directly from the resolvent of the data. We establish that in many cases, our normalization enforces the standard spectral behaviour of homoskedastic noise, the Marchenko-Pastur (MP) law, allowing for simple and reliable detection of signal components. Furthermore, we demonstrate that our approach can substantially improve signal recovery in heteroskedastic settings by manipulating the spectrum after normalization. Lastly, we apply our method to single-cell RNA sequencing and spatial transcriptomics data, showcasing accurate fits to the MP law after normalization.
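As a rough, assumed stand-in for the paper's Dyson-equation procedure, the sketch below naively equalizes average row and column power by alternating rescaling, then thresholds singular values at the Marchenko-Pastur bulk edge sqrt(m) + sqrt(n) for unit-variance noise. All sizes, signal strengths and noise patterns are invented.

```python
# Crude stand-in for the paper's normalization (assumed toy model, alternating
# rescaling instead of the Dyson-equation estimate): equalize average row/column
# power, then threshold singular values at the MP bulk edge sqrt(m) + sqrt(n).
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 300, 1000, 3
low_rank = 10.0 * rng.normal(size=(m, r)) @ rng.normal(size=(r, n)) / np.sqrt(n)
row_s = rng.uniform(0.5, 2.0, size=(m, 1))          # heteroskedastic row scales
col_s = rng.uniform(0.5, 2.0, size=(1, n))          # ... and column scales
Y = low_rank + row_s * col_s * rng.normal(size=(m, n))

A = Y.copy()
for _ in range(30):                                 # alternate row/column scaling
    A /= np.sqrt((A ** 2).mean(axis=1, keepdims=True))
    A /= np.sqrt((A ** 2).mean(axis=0, keepdims=True))

s = np.linalg.svd(A, compute_uv=False)
edge = np.sqrt(m) + np.sqrt(n)                      # bulk edge for unit-variance noise
print("singular values above the bulk edge:", int((s > edge).sum()))
```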
{"title":"The Dyson equalizer: adaptive noise stabilization for low-rank signal detection and recovery.","authors":"Boris Landa, Yuval Kluger","doi":"10.1093/imaiai/iaae036","DOIUrl":"10.1093/imaiai/iaae036","url":null,"abstract":"<p><p>Detecting and recovering a low-rank signal in a noisy data matrix is a fundamental task in data analysis. Typically, this task is addressed by inspecting and manipulating the spectrum of the observed data, e.g. thresholding the singular values of the data matrix at a certain critical level. This approach is well established in the case of homoskedastic noise, where the noise variance is identical across the entries. However, in numerous applications, the noise can be heteroskedastic, where the noise characteristics may vary considerably across the rows and columns of the data. In this scenario, the spectral behaviour of the noise can differ significantly from the homoskedastic case, posing various challenges for signal detection and recovery. To address these challenges, we develop an adaptive normalization procedure that equalizes the average noise variance across the rows and columns of a given data matrix. Our proposed procedure is data-driven and fully automatic, supporting a broad range of noise distributions, variance patterns and signal structures. Our approach relies on random matrix theory results that describe the resolvent of the noise via the so-called Dyson equation. By leveraging this relation, we can accurately infer the noise level in each row and each column directly from the resolvent of the data. We establish that in many cases, our normalization enforces the standard spectral behaviour of homoskedastic noise-the Marchenko-Pastur (MP) law, allowing for simple and reliable detection of signal components. Furthermore, we demonstrate that our approach can substantially improve signal recovery in heteroskedastic settings by manipulating the spectrum after normalization. Lastly, we apply our method to single-cell RNA sequencing and spatial transcriptomics data, showcasing accurate fits to the MP law after normalization.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"14 1","pages":"iaae036"},"PeriodicalIF":1.6,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735832/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143013811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bi-stochastically normalized graph Laplacian: convergence to manifold Laplacian and robustness to outlier noise
Pub Date: 2024-09-20 | eCollection Date: 2024-12-01 | DOI: 10.1093/imaiai/iaae026
Xiuyuan Cheng, Boris Landa
Bi-stochastic normalization provides an alternative normalization of graph Laplacians in graph-based data analysis and can be computed efficiently by Sinkhorn-Knopp (SK) iterations. This paper proves the convergence of the bi-stochastically normalized graph Laplacian to the manifold (weighted) Laplacian with rates, when [Formula: see text] data points are i.i.d. sampled from a general [Formula: see text]-dimensional manifold embedded in a possibly high-dimensional space. Under a certain joint limit of [Formula: see text] and the kernel bandwidth [Formula: see text], the point-wise convergence rate of the graph Laplacian operator (under the 2-norm) is proved to be [Formula: see text] at finite large [Formula: see text] up to log factors, achieved at the scaling [Formula: see text]. When the manifold data are corrupted by outlier noise, we theoretically prove point-wise consistency of the graph Laplacian that matches the rate for clean manifold data, plus an additional term proportional to the boundedness of the inner products of the noise vectors among themselves and with the data vectors. Motivated by our analysis, which suggests that an approximate rather than exact bi-stochastic normalization achieves the same consistency rate, we propose an approximate and constrained matrix scaling problem that can be solved by SK iterations with early termination. Numerical experiments support our theoretical results and show the robustness of the bi-stochastically normalized graph Laplacian to high-dimensional outlier noise.
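A minimal sketch of the computational ingredient named in the abstract: symmetric Sinkhorn-Knopp scaling of a Gaussian kernel matrix with early termination once the scaled kernel is approximately bi-stochastic. The kernel, bandwidth and tolerance below are illustrative assumptions, not the paper's constrained formulation.

```python
# Minimal sketch of the SK step named in the abstract: symmetric Sinkhorn-Knopp
# scaling of a Gaussian kernel matrix with early termination. Kernel, bandwidth
# and tolerance are illustrative assumptions, not the paper's formulation.
import numpy as np

def sinkhorn_knopp(K, tol=1e-6, max_iter=500):
    """Find d > 0 so that diag(d) @ K @ diag(d) is (nearly) bi-stochastic."""
    d = np.ones(K.shape[0])
    for _ in range(max_iter):
        d = np.sqrt(d / (K @ d))                    # damped symmetric update
        W = d[:, None] * K * d[None, :]
        if np.abs(W.sum(axis=1) - 1.0).max() < tol: # early termination
            return W
    return W

rng = np.random.default_rng(2)
t = rng.uniform(0, 2 * np.pi, size=400)             # noisy circle: a 1-d manifold in R^3
x = np.c_[np.cos(t), np.sin(t), 0.1 * rng.normal(size=400)]
sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / (2 * 0.3 ** 2))                    # Gaussian kernel, bandwidth 0.3
W = sinkhorn_knopp(K)
L = np.eye(len(x)) - W                              # bi-stochastically normalized Laplacian
print("max row-sum deviation:", np.abs(W.sum(axis=1) - 1.0).max())
```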
{"title":"Bi-stochastically normalized graph Laplacian: convergence to manifold Laplacian and robustness to outlier noise.","authors":"Xiuyuan Cheng, Boris Landa","doi":"10.1093/imaiai/iaae026","DOIUrl":"10.1093/imaiai/iaae026","url":null,"abstract":"<p><p>Bi-stochastic normalization provides an alternative normalization of graph Laplacians in graph-based data analysis and can be computed efficiently by Sinkhorn-Knopp (SK) iterations. This paper proves the convergence of bi-stochastically normalized graph Laplacian to manifold (weighted-)Laplacian with rates, when [Formula: see text] data points are i.i.d. sampled from a general [Formula: see text]-dimensional manifold embedded in a possibly high-dimensional space. Under certain joint limit of [Formula: see text] and kernel bandwidth [Formula: see text], the point-wise convergence rate of the graph Laplacian operator (under 2-norm) is proved to be [Formula: see text] at finite large [Formula: see text] up to log factors, achieved at the scaling of [Formula: see text]. When the manifold data are corrupted by outlier noise, we theoretically prove the graph Laplacian point-wise consistency which matches the rate for clean manifold data plus an additional term proportional to the boundedness of the inner-products of the noise vectors among themselves and with data vectors. Motivated by our analysis, which suggests that not exact bi-stochastic normalization but an approximate one will achieve the same consistency rate, we propose an approximate and constrained matrix scaling problem that can be solved by SK iterations with early termination. Numerical experiments support our theoretical results and show the robustness of bi-stochastically normalized graph Laplacian to high-dimensional outlier noise.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"13 4","pages":"iaae026"},"PeriodicalIF":1.6,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11415053/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142298149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phase transition and higher order analysis of L_q regularization under dependence
Pub Date: 2024-02-20 | eCollection Date: 2024-03-01 | DOI: 10.1093/imaiai/iaae005
Hanwen Huang, Peng Zeng, Qinglong Yang
We study the problem of estimating a [Formula: see text]-sparse signal [Formula: see text] from a set of noisy observations [Formula: see text] under the model [Formula: see text], where [Formula: see text] is the measurement matrix whose rows are drawn from the distribution [Formula: see text]. We consider the class of [Formula: see text]-regularized least squares (LQLS) given by the formulation [Formula: see text], where [Formula: see text] [Formula: see text] denotes the [Formula: see text]-norm. In the setting [Formula: see text] with fixed [Formula: see text] and [Formula: see text], we derive the asymptotic risk of [Formula: see text] for an arbitrary covariance matrix [Formula: see text], generalizing the existing results for the standard Gaussian design, i.e. [Formula: see text]. The results are derived using the non-rigorous replica method. We perform a higher-order analysis for LQLS in the small-error regime, in which the first dominant term can be used to determine the phase transition behavior of LQLS. Our results show that the first dominant term does not depend on the covariance structure of [Formula: see text] in the cases [Formula: see text] and [Formula: see text], which indicates that the correlations among predictors only affect the phase transition curve in the case [Formula: see text], a.k.a. LASSO. To study the influence of the covariance structure of [Formula: see text] on the performance of LQLS in the cases [Formula: see text] and [Formula: see text], we derive explicit formulas for the second dominant term in the expansion of the asymptotic risk in terms of small error. Extensive computational experiments confirm that our analytical predictions are consistent with numerical results.
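For concreteness, the sketch below instantiates the LQLS estimator in the special case q = 1 (LASSO) and solves it by proximal gradient descent (ISTA); the design, sparsity level and penalty are invented for illustration and are not the paper's settings.

```python
# Concrete instance of the LQLS estimator for q = 1 (LASSO), solved by proximal
# gradient descent (ISTA). Design, sparsity and penalty level are invented.
import numpy as np

def ista_lasso(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - A b||_2^2 + lam * ||b||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / Lipschitz constant
    b = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = b - step * (A.T @ (A @ b - y))          # gradient step on the quadratic
        b = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return b

rng = np.random.default_rng(3)
n, p, k = 200, 400, 10
beta = np.zeros(p); beta[:k] = 1.0                  # k-sparse signal
A = rng.normal(size=(n, p)) / np.sqrt(n)            # i.i.d. rows; Sigma = I here
y = A @ beta + 0.1 * rng.normal(size=n)
print("estimation error:", np.linalg.norm(ista_lasso(A, y, lam=0.01) - beta))
```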
{"title":"Phase transition and higher order analysis of <i>L<sub>q</sub></i> regularization under dependence.","authors":"Hanwen Huang, Peng Zeng, Qinglong Yang","doi":"10.1093/imaiai/iaae005","DOIUrl":"10.1093/imaiai/iaae005","url":null,"abstract":"<p><p>We study the problem of estimating a [Formula: see text]-sparse signal [Formula: see text] from a set of noisy observations [Formula: see text] under the model [Formula: see text], where [Formula: see text] is the measurement matrix the row of which is drawn from distribution [Formula: see text]. We consider the class of [Formula: see text]-regularized least squares (LQLS) given by the formulation [Formula: see text], where [Formula: see text] [Formula: see text] denotes the [Formula: see text]-norm. In the setting [Formula: see text] with fixed [Formula: see text] and [Formula: see text], we derive the asymptotic risk of [Formula: see text] for arbitrary covariance matrix [Formula: see text] that generalizes the existing results for standard Gaussian design, i.e. [Formula: see text]. The results were derived from the non-rigorous replica method. We perform a higher-order analysis for LQLS in the small-error regime in which the first dominant term can be used to determine the phase transition behavior of LQLS. Our results show that the first dominant term does not depend on the covariance structure of [Formula: see text] in the cases [Formula: see text] and [Formula: see text] which indicates that the correlations among predictors only affect the phase transition curve in the case [Formula: see text] a.k.a. LASSO. To study the influence of the covariance structure of [Formula: see text] on the performance of LQLS in the cases [Formula: see text] and [Formula: see text], we derive the explicit formulas for the second dominant term in the expansion of the asymptotic risk in terms of small error. Extensive computational experiments confirm that our analytical predictions are consistent with numerical results.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"13 1","pages":"iaae005"},"PeriodicalIF":1.4,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10878746/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139933465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On statistical inference with high-dimensional sparse CCA
Pub Date: 2023-11-17 | DOI: 10.1093/imaiai/iaad040
Nilanjana Laha, Nathan Huey, Brent Coull, Rajarshi Mukherjee
We consider asymptotically exact inference on the leading canonical correlation directions and strengths between two high-dimensional vectors under sparsity restrictions. In this regard, our main contribution is developing a novel representation of the Canonical Correlation Analysis problem, based on which one can operationalize a one-step bias correction on reasonable initial estimators. Our analytic results in this regard are adaptive over suitable structural restrictions of the high-dimensional nuisance parameters, which, in this set-up, correspond to the covariance matrices of the variables of interest. We further supplement the theoretical guarantees behind our procedures with extensive numerical studies.
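As background only (this is plain CCA, not the authors' one-step debiased procedure), the leading canonical correlation direction and strength can be read off an SVD of the whitened cross-covariance; the dimensions and shared-signal model below are invented.

```python
# Background sketch (plain CCA, not the authors' one-step debiased procedure):
# leading canonical direction and strength from an SVD of the whitened
# cross-covariance. Dimensions and the shared-signal model are invented.
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 1000, 5, 4
x = rng.normal(size=(n, p))
y = 0.8 * x[:, [0]] + rng.normal(size=(n, q))       # one shared component

Sxx = np.cov(x, rowvar=False)
Syy = np.cov(y, rowvar=False)
Sxy = (x - x.mean(0)).T @ (y - y.mean(0)) / (n - 1)
Wx = np.linalg.inv(np.linalg.cholesky(Sxx))         # whitening: Wx Sxx Wx' = I
Wy = np.linalg.inv(np.linalg.cholesky(Syy))
U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy.T)
alpha = Wx.T @ U[:, 0]                              # leading direction for x
beta = Wy.T @ Vt[0]                                 # leading direction for y
print("leading canonical correlation:", s[0])
```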
{"title":"On statistical inference with high-dimensional sparse CCA.","authors":"Nilanjana Laha, Nathan Huey, Brent Coull, Rajarshi Mukherjee","doi":"10.1093/imaiai/iaad040","DOIUrl":"10.1093/imaiai/iaad040","url":null,"abstract":"<p><p>We consider asymptotically exact inference on the leading canonical correlation directions and strengths between two high-dimensional vectors under sparsity restrictions. In this regard, our main contribution is developing a novel representation of the Canonical Correlation Analysis problem, based on which one can operationalize a one-step bias correction on reasonable initial estimators. Our analytic results in this regard are adaptive over suitable structural restrictions of the high-dimensional nuisance parameters, which, in this set-up, correspond to the covariance matrices of the variables of interest. We further supplement the theoretical guarantees behind our procedures with extensive numerical studies.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"12 4","pages":"iaad040"},"PeriodicalIF":16.4,"publicationDate":"2023-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10656287/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138048165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Black-box tests for algorithmic stability
Pub Date: 2023-10-14 | eCollection Date: 2023-12-01 | DOI: 10.1093/imaiai/iaad039
Byol Kim, Rina Foygel Barber
Algorithmic stability is a concept from learning theory that expresses the degree to which changes to the input data (e.g. removal of a single data point) may affect the outputs of a regression algorithm. Knowing an algorithm's stability properties is often useful for many downstream applications; for example, stability is known to lead to desirable generalization properties and predictive inference guarantees. However, many modern algorithms currently used in practice are too complex for a theoretical analysis of their stability properties, and thus we can only attempt to establish these properties through an empirical exploration of the algorithm's behaviour on various datasets. In this work, we lay out a formal statistical framework for this kind of black-box testing without any assumptions on the algorithm or the data distribution, and establish fundamental bounds on the ability of any black-box test to identify algorithmic stability.
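A sketch of the kind of empirical exploration the abstract refers to: treat the fitted algorithm as a black box, delete one training point at a time, and record how much a held-out prediction moves. The model and data below are placeholders; the paper's contribution is bounding what any such test can certify, not this statistic itself.

```python
# Sketch of the empirical exploration the abstract describes: treat the fitted
# algorithm as a black box, delete one training point at a time and record how
# much a prediction moves. Model and data are placeholders for illustration.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 20))
y = X[:, 0] + 0.5 * rng.normal(size=100)
x_test = rng.normal(size=(1, 20))

def fit_predict(X_tr, y_tr):
    return Lasso(alpha=0.1).fit(X_tr, y_tr).predict(x_test)[0]

base = fit_predict(X, y)
changes = [abs(fit_predict(np.delete(X, i, axis=0), np.delete(y, i)) - base)
           for i in range(len(y))]
print("max leave-one-out prediction change:", max(changes))
```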
{"title":"Black-box tests for algorithmic stability.","authors":"Byol Kim, Rina Foygel Barber","doi":"10.1093/imaiai/iaad039","DOIUrl":"10.1093/imaiai/iaad039","url":null,"abstract":"<p><p>Algorithmic stability is a concept from learning theory that expresses the degree to which changes to the input data (e.g. removal of a single data point) may affect the outputs of a regression algorithm. Knowing an algorithm's stability properties is often useful for many downstream applications-for example, stability is known to lead to desirable generalization properties and predictive inference guarantees. However, many modern algorithms currently used in practice are too complex for a theoretical analysis of their stability properties, and thus we can only attempt to establish these properties through an empirical exploration of the algorithm's behaviour on various datasets. In this work, we lay out a formal statistical framework for this kind of <i>black-box testing</i> without any assumptions on the algorithm or the data distribution, and establish fundamental bounds on the ability of any black-box test to identify algorithmic stability.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"12 4","pages":"2690-2719"},"PeriodicalIF":1.4,"publicationDate":"2023-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10576650/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41239705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian denoising of structured sources and its implications on learning-based denoising
Pub Date: 2023-09-19 | DOI: 10.1093/imaiai/iaad036
Wenda Zhou, Joachim Wabnig, Shirin Jalali
Denoising a stationary process $(X_{i})_{i \in \mathbb{Z}}$ corrupted by additive white Gaussian noise $(Z_{i})_{i \in \mathbb{Z}}$ is a classic, well-studied and fundamental problem in information theory and statistical signal processing. However, finding theoretically founded, computationally efficient denoising methods applicable to general sources is still an open problem. In the Bayesian set-up where the source distribution is known, a minimum mean square error (MMSE) denoiser estimates $X^{n}$ from noisy measurements $Y^{n}$ as $\hat{X}^{n}=\mathrm{E}[X^{n}|Y^{n}]$. However, for general sources, computing $\mathrm{E}[X^{n}|Y^{n}]$ is computationally very challenging, if not infeasible. In this paper, starting from a Bayesian set-up, a novel denoising method, namely the quantized maximum a posteriori (Q-MAP) denoiser, is proposed and its asymptotic performance is analysed. Both for memoryless sources and for structured first-order Markov sources, it is shown that, asymptotically, as $\sigma_{z}^{2}$ (the noise variance) converges to zero, $\frac{1}{\sigma_{z}^{2}}\mathrm{E}[(X_{i}-\hat{X}^{\mathrm{QMAP}}_{i})^{2}]$ converges to the information dimension of the source. For the studied memoryless sources, this limit is known to be optimal. A key advantage of the Q-MAP denoiser, unlike an MMSE denoiser, is that it highlights the key properties of the source distribution that are to be used in its denoising. This property leads to a new learning-based denoising approach that is applicable to generic structured sources. Using the ImageNet database for training, initial simulation results exploring the performance of such a learning-based denoiser in image denoising are presented.
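For contrast with the Q-MAP denoiser, here is a minimal sketch of the exact MMSE denoiser $\mathrm{E}[X|Y]$ for a memoryless source over a finite alphabet under additive Gaussian noise, where the conditional expectation has a closed form via Bayes' rule; the alphabet, prior and noise level are invented.

```python
# Contrast sketch: the exact MMSE denoiser E[X|Y] for a memoryless source over
# a finite alphabet under additive Gaussian noise, via Bayes' rule. Alphabet,
# prior and noise level are invented; this is not the Q-MAP denoiser.
import numpy as np

alphabet = np.array([0.0, 1.0])                     # sparse-like binary source
prior = np.array([0.9, 0.1])
sigma = 0.3

rng = np.random.default_rng(6)
x = rng.choice(alphabet, p=prior, size=10_000)
y = x + sigma * rng.normal(size=x.size)

# posterior weight of each symbol a: prior(a) * N(y; a, sigma^2), per sample
w = prior * np.exp(-(y[:, None] - alphabet[None, :]) ** 2 / (2 * sigma ** 2))
x_mmse = (w * alphabet).sum(axis=1) / w.sum(axis=1) # E[X|Y=y]
print("MMSE risk / sigma^2:", np.mean((x - x_mmse) ** 2) / sigma ** 2)
```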
{"title":"Bayesian denoising of structured sources and its implications on learning-based denoising","authors":"Wenda Zhou, Joachim Wabnig, Shirin Jalali","doi":"10.1093/imaiai/iaad036","DOIUrl":"https://doi.org/10.1093/imaiai/iaad036","url":null,"abstract":"Abstract Denoising a stationary process $(X_{i})_{i in mathbb{Z}}$ corrupted by additive white Gaussian noise $(Z_{i})_{i in mathbb{Z}}$ is a classic, well-studied and fundamental problem in information theory and statistical signal processing. However, finding theoretically founded computationally efficient denoising methods applicable to general sources is still an open problem. In the Bayesian set-up where the source distribution is known, a minimum mean square error (MMSE) denoiser estimates $X^{n}$ from noisy measurements $Y^{n}$ as $hat{X}^{n}=mathrm{E}[X^{n}|Y^{n}]$. However, for general sources, computing $mathrm{E}[X^{n}|Y^{n}]$ is computationally very challenging, if not infeasible. In this paper, starting from a Bayesian set-up, a novel denoising method, namely, quantized maximum a posteriori (Q-MAP) denoiser is proposed and its asymptotic performance is analysed. Both for memoryless sources, and for structured first-order Markov sources, it is shown that, asymptotically, as $sigma _{z}^{2} $ (noise variance) converges to zero, ${1over sigma _{z}^{2}} mathrm{E}[(X_{i}-hat{X}^{mathrm{QMAP}}_{i})^{2}]$ converges to the information dimension of the source. For the studied memoryless sources, this limit is known to be optimal. A key advantage of the Q-MAP denoiser, unlike an MMSE denoiser, is that it highlights the key properties of the source distribution that are to be used in its denoising. This key property leads to a new learning-based denoising approach that is applicable to generic structured sources. Using ImageNet database for training, initial simulation results exploring the performance of such a learning-based denoiser in image denoising are presented.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135060543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Near-optimal estimation of linear functionals with log-concave observation errors
Pub Date: 2023-09-19 | DOI: 10.1093/imaiai/iaad038
Simon Foucart, Grigoris Paouris
This note addresses the question of optimally estimating a linear functional of an object acquired through linear observations corrupted by random noise, where optimality pertains to a worst-case setting tied to a symmetric, convex and closed model set containing the object. It complements the article 'Statistical Estimation and Optimal Recovery' published in the Annals of Statistics in 1994. There, Donoho showed (among other things) that, for Gaussian noise, linear maps provide near-optimal estimation schemes relative to a performance measure relevant in Statistical Estimation. Here, we advocate for a different performance measure, arguably more relevant in Optimal Recovery. We show that, relative to this new measure, linear maps still provide near-optimal estimation schemes even if the noise is merely log-concave. Our arguments, which make a connection to the deterministic noise situation and bypass properties specific to the Gaussian case, offer an alternative to parts of Donoho's proof.
{"title":"Near-optimal estimation of linear functionals with log-concave observation errors","authors":"Simon Foucart, Grigoris Paouris","doi":"10.1093/imaiai/iaad038","DOIUrl":"https://doi.org/10.1093/imaiai/iaad038","url":null,"abstract":"Abstract This note addresses the question of optimally estimating a linear functional of an object acquired through linear observations corrupted by random noise, where optimality pertains to a worst-case setting tied to a symmetric, convex and closed model set containing the object. It complements the article ‘Statistical Estimation and Optimal Recovery’ published in the Annals of Statistics in 1994. There, Donoho showed (among other things) that, for Gaussian noise, linear maps provide near-optimal estimation schemes relatively to a performance measure relevant in Statistical Estimation. Here, we advocate for a different performance measure arguably more relevant in Optimal Recovery. We show that, relatively to this new measure, linear maps still provide near-optimal estimation schemes even if the noise is merely log-concave. Our arguments, which make a connection to the deterministic noise situation and bypass properties specific to the Gaussian case, offer an alternative to parts of Donoho’s proof.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135010731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph-based approximate message passing iterations
Pub Date: 2023-09-18 | DOI: 10.1093/imaiai/iaad020
Cédric Gerbelot, Raphaël Berthier
Approximate message passing (AMP) algorithms have become an important element of high-dimensional statistical inference, mostly due to their adaptability and concentration properties, captured by the state evolution (SE) equations. This is demonstrated by the growing number of new iterations proposed for increasingly complex problems, ranging from multi-layer inference to low-rank matrix estimation with elaborate priors. In this paper, we address the following questions: is there a structure underlying all AMP iterations that unifies them in a common framework? Can we use such a structure to give a modular proof of the state evolution equations, adaptable to new AMP iterations without reproducing the full argument each time? We propose an answer to both questions, showing that AMP instances can be generically indexed by an oriented graph. This enables a unified interpretation of these iterations, independent of the problem they solve, and a way of composing them arbitrarily. We then show that all AMP iterations indexed by such a graph verify rigorous SE equations, extending the reach of previous proofs and proving a number of recent heuristic derivations of those equations. Our proof naturally includes non-separable functions, and we show how existing refinements, such as spatial coupling or matrix-valued variables, can be combined with our framework.
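One concrete member of the family the paper unifies, sketched under invented parameters: soft-threshold AMP for the linear model y = Ax + w, including the Onsager correction term that makes the state evolution equations hold. The threshold rule below is a common illustrative choice, not the paper's construction.

```python
# One concrete member of the family the paper unifies: soft-threshold AMP for
# y = A x + w, including the Onsager correction that makes state evolution hold.
# Threshold rule and all parameters are illustrative assumptions.
import numpy as np

def amp(A, y, alpha=1.5, n_iter=30):
    n, p = A.shape
    x, z = np.zeros(p), y.copy()
    for _ in range(n_iter):
        tau = alpha * np.linalg.norm(z) / np.sqrt(n)         # noise-level proxy
        r = x + A.T @ z                                      # effective observation
        x = np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)    # soft-threshold denoiser
        z = y - A @ x + (p / n) * (x != 0).mean() * z        # Onsager correction
    return x

rng = np.random.default_rng(7)
n, p = 250, 500
x0 = np.zeros(p)
x0[rng.choice(p, 25, replace=False)] = rng.normal(size=25)   # sparse truth
A = rng.normal(size=(n, p)) / np.sqrt(n)                     # normalized design
y = A @ x0 + 0.05 * rng.normal(size=n)
print("AMP estimation error:", np.linalg.norm(amp(A, y) - x0))
```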
{"title":"Graph-based approximate message passing iterations","authors":"Cédric Gerbelot, Raphaël Berthier","doi":"10.1093/imaiai/iaad020","DOIUrl":"https://doi.org/10.1093/imaiai/iaad020","url":null,"abstract":"Abstract Approximate message passing (AMP) algorithms have become an important element of high-dimensional statistical inference, mostly due to their adaptability and concentration properties, the state evolution (SE) equations. This is demonstrated by the growing number of new iterations proposed for increasingly complex problems, ranging from multi-layer inference to low-rank matrix estimation with elaborate priors. In this paper, we address the following questions: is there a structure underlying all AMP iterations that unifies them in a common framework? Can we use such a structure to give a modular proof of state evolution equations, adaptable to new AMP iterations without reproducing each time the full argument? We propose an answer to both questions, showing that AMP instances can be generically indexed by an oriented graph. This enables to give a unified interpretation of these iterations, independent from the problem they solve, and a way of composing them arbitrarily. We then show that all AMP iterations indexed by such a graph verify rigorous SE equations, extending the reach of previous proofs and proving a number of recent heuristic derivations of those equations. Our proof naturally includes non-separable functions and we show how existing refinements, such as spatial coupling or matrix-valued variables, can be combined with our framework.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135110705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spectral deconvolution of matrix models: the additive case
Pub Date: 2023-09-18 | DOI: 10.1093/imaiai/iaad037
Pierre Tarrago
We implement a complex analytic method to build an estimator of the spectrum of a matrix perturbed by additive random matrix noise in the free probabilistic regime. This method, previously introduced by Arizmendi, Tarrago and Vargas, involves two steps: the first step is a fixed point method to compute the Stieltjes transform of the desired distribution in a certain domain, and the second step is a classical deconvolution by a Cauchy distribution, whose parameter depends on the intensity of the noise. This method thus reduces the spectral deconvolution problem to a classical one. We provide explicit bounds for the mean squared error of the first step under the assumption that the distribution of the noise is unitarily invariant. In the case where the unknown measure is sparse or close to a distribution with a sufficiently smooth density, we prove that the resulting estimator converges to the measure in the $1$-Wasserstein distance at speed $O(1/\sqrt{N})$, where $N$ is the dimension of the matrix.
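A hedged illustration of why the second step is a classical Cauchy deconvolution: the imaginary part of the empirical Stieltjes transform at height $\eta$ equals $\pi$ times the spectral measure smoothed by a Cauchy kernel of scale $\eta$. The toy model and all sizes below are invented.

```python
# Illustration of why step two is a classical Cauchy deconvolution: the imaginary
# part of the empirical Stieltjes transform at height eta is pi times the spectral
# measure smoothed by a Cauchy kernel of scale eta. Toy model, invented sizes.
import numpy as np

rng = np.random.default_rng(8)
N = 500
H = rng.normal(size=(N, N))
W = (H + H.T) / np.sqrt(2 * N)                      # GOE noise, semicircle bulk
signal = np.diag(np.repeat([-2.0, 2.0], N // 2))    # unknown two-atom spectrum
eigs = np.linalg.eigvalsh(signal + W)

def stieltjes_im(x, eta):
    """Im m(x + i*eta) = pi * (spectral density convolved with Cauchy(eta))(x)."""
    return np.mean(1.0 / (eigs - (x + 1j * eta))).imag

grid = np.linspace(-4.0, 4.0, 9)
print([round(stieltjes_im(v, eta=0.3) / np.pi, 3) for v in grid])
```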
{"title":"Spectral deconvolution of matrix models: the additive case","authors":"Pierre Tarrago","doi":"10.1093/imaiai/iaad037","DOIUrl":"https://doi.org/10.1093/imaiai/iaad037","url":null,"abstract":"Abstract We implement a complex analytic method to build an estimator of the spectrum of a matrix perturbed by the addition of a random matrix noise in the free probabilistic regime. This method, which has been previously introduced by Arizmendi, Tarrago and Vargas, involves two steps: the first step consists in a fixed point method to compute the Stieltjes transform of the desired distribution in a certain domain, and the second step is a classical deconvolution by a Cauchy distribution, whose parameter depends on the intensity of the noise. This method thus reduces the spectral deconvolution problem to a classical one. We provide explicit bounds for the mean squared error of the first step under the assumption that the distribution of the noise is unitary invariant. In the case where the unknown measure is sparse or close to a distribution with a density with enough smoothness, we prove that the resulting estimator converges to the measure in the $1$-Wasserstein distance at speed $O(1/sqrt{N})$, where $N$ is the dimension of the matrix.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135109334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}