Spectral top-down recovery of latent tree models
Pub Date: 2023-08-16 | eCollection Date: 2023-09-01 | DOI: 10.1093/imaiai/iaad032 | Open Access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10431953/pdf/
Yariv Aizenbud, Ariel Jaffe, Meng Wang, Amber Hu, Noah Amsel, Boaz Nadler, Joseph T Chang, Yuval Kluger
Modeling the distribution of high-dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed divide-and-conquer, is to recover the tree structure in two steps. First, separately recover the structure of multiple, possibly random subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop spectral top-down recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a non-random way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler merging procedure of the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.
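The partitioning step described above is a spectral bisection by the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue). Below is a minimal Python sketch of that generic idea; the similarity matrix and all parameter choices are illustrative placeholders, not the specific Laplacian construction analysed in the paper.

```python
import numpy as np

def fiedler_bisection(W):
    """Split nodes into two groups by the sign of the Fiedler vector.

    W : (n, n) symmetric non-negative similarity matrix between observed nodes.
    Returns a boolean mask: True for one side of the partition, False for the other.
    """
    d = W.sum(axis=1)
    L = np.diag(d) - W                    # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]               # eigenvector of the 2nd-smallest eigenvalue
    return fiedler >= 0

# toy example: two loosely connected groups of observed nodes
n = 10
W = np.full((n, n), 0.05)
W[:5, :5] = 0.9
W[5:, 5:] = 0.9
np.fill_diagonal(W, 0.0)
print(fiedler_bisection(W))   # separates nodes 0-4 from nodes 5-9
```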
{"title":"Spectral top-down recovery of latent tree models.","authors":"Yariv Aizenbud, Ariel Jaffe, Meng Wang, Amber Hu, Noah Amsel, Boaz Nadler, Joseph T Chang, Yuval Kluger","doi":"10.1093/imaiai/iaad032","DOIUrl":"10.1093/imaiai/iaad032","url":null,"abstract":"<p><p>Modeling the distribution of high-dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed <i>divide-and-conquer</i>, is to recover the tree structure in two steps. First, separately recover the structure of multiple, possibly random subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop spectral top-down recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a non random way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler merging procedure of the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10431953/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10233338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Minimax optimal regression over Sobolev spaces via Laplacian Eigenmaps on neighbourhood graphs
Pub Date: 2023-04-27 | DOI: 10.1093/imaiai/iaad034
Alden Green, Sivaraman Balakrishnan, Ryan J Tibshirani
In this paper, we study the statistical properties of Principal Components Regression with Laplacian Eigenmaps (PCR-LE), a method for non-parametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting a vector of observed responses $\textbf{Y} = (Y_1,\ldots,Y_n)$ onto a subspace spanned by certain eigenvectors of a neighbourhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s+d)}$) and goodness-of-fit testing ($n^{-4s/(4s+d)}$). We also consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s+m)}$) and testing ($n^{-4s/(4s+m)}$) rates of convergence. Interestingly, these rates are almost always much faster than the known rates of convergence of graph Laplacian eigenvectors to their population-level limits; in other words, for this problem, regression with estimated features appears to be much easier, statistically speaking, than estimating the features themselves. We support these theoretical results with empirical evidence.
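To make the projection step concrete, here is a minimal numpy sketch of a PCR-LE-style fit: build a k-NN graph over the design points, take the leading eigenvectors of its Laplacian and regress the responses onto them. The neighbourhood size, the number of eigenvectors and the symmetrization are illustrative choices, not the tuned quantities analysed in the paper.

```python
import numpy as np

def pcr_le_fit(X, Y, k=10, K=15):
    """Project responses Y onto the first K eigenvectors of a k-NN graph Laplacian."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    A = np.zeros((n, n))
    idx = np.argsort(D, axis=1)[:, 1:k + 1]                    # k nearest neighbours (skip self)
    rows = np.repeat(np.arange(n), k)
    A[rows, idx.ravel()] = 1.0
    A = np.maximum(A, A.T)                                      # symmetrize the adjacency
    L = np.diag(A.sum(axis=1)) - A                              # unnormalized graph Laplacian
    _, V = np.linalg.eigh(L)
    V_K = V[:, :K]                                              # smoothest K eigenvectors
    coef = V_K.T @ Y                                            # projection coefficients (V_K orthonormal)
    return V_K @ coef                                           # fitted values at the design points

# toy example: noisy smooth function of a 2-d random design
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 2))
Y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(200)
Y_hat = pcr_le_fit(X, Y)
print("residual RMSE:", np.sqrt(np.mean((Y_hat - Y) ** 2)))
```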
{"title":"Minimax optimal regression over Sobolev spaces via Laplacian Eigenmaps on neighbourhood graphs","authors":"Alden Green, Sivaraman Balakrishnan, Ryan J Tibshirani","doi":"10.1093/imaiai/iaad034","DOIUrl":"https://doi.org/10.1093/imaiai/iaad034","url":null,"abstract":"Abstract In this paper, we study the statistical properties of Principal Components Regression with Laplacian Eigenmaps (PCR-LE), a method for non-parametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting a vector of observed responses ${textbf Y} = (Y_1,ldots ,Y_n)$ onto a subspace spanned by certain eigenvectors of a neighbourhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s + d)}$) and goodness-of-fit testing ($n^{-4s/(4s + d)}$). We also consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s + m)}$) and testing ($n^{-4s/(4s + m)}$) rates of convergence. Interestingly, these rates are almost always much faster than the known rates of convergence of graph Laplacian eigenvectors to their population-level limits; in other words, for this problem regression with estimated features appears to be much easier, statistically speaking, than estimating the features itself. We support these theoretical results with empirical evidence.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136266565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical characterization of the chordal product determinant of Grassmannian codes
Pub Date: 2023-04-27 | DOI: 10.1093/imaiai/iaad035
Javier Álvarez-Vizoso, Carlos Beltrán, Diego Cuevas, Ignacio Santamaría, Vít Tuček, Gunnar Peters
We consider the chordal product determinant, a measure of the distance between two subspaces of the same dimension. In information theory, one searches for collections of elements in the complex Grassmannian whose pairwise chordal products are as large as possible. We characterize this function from a statistical perspective, which allows us to obtain bounds on the minimal chordal product and the related energy of such collections.
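For concreteness, a standard way to compute a chordal product between two subspaces is via principal angles: if $U$ and $V$ are orthonormal bases of two $k$-dimensional subspaces, the singular values of $U^{H}V$ are the cosines of the principal angles, so $\det(I - V^{H}UU^{H}V) = \prod_i \sin^2\theta_i$. The sketch below evaluates this quantity for random subspaces; it is an illustrative computation under that common definition, not the statistical characterization derived in the paper.

```python
import numpy as np

def chordal_product_det(U, V):
    """det(I - V^H U U^H V) = prod_i sin^2(theta_i) over the principal angles theta_i."""
    M = U.conj().T @ V
    s = np.linalg.svd(M, compute_uv=False)   # singular values = cos(theta_i)
    return np.prod(1.0 - s**2)               # = prod_i sin^2(theta_i)

def random_subspace(n, k, rng):
    """Orthonormal basis of a random k-dimensional subspace of C^n."""
    A = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A)
    return Q

rng = np.random.default_rng(0)
n, k, m = 8, 2, 5
subspaces = [random_subspace(n, k, rng) for _ in range(m)]
pairwise = [chordal_product_det(subspaces[i], subspaces[j])
            for i in range(m) for j in range(i + 1, m)]
print("minimal pairwise chordal product:", min(pairwise))
```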
{"title":"Statistical characterization of the chordal product determinant of Grassmannian codes","authors":"Javier Álvarez-Vizoso, Carlos Beltrán, Diego Cuevas, Ignacio Santamaría, Vít Tuček, Gunnar Peters","doi":"10.1093/imaiai/iaad035","DOIUrl":"https://doi.org/10.1093/imaiai/iaad035","url":null,"abstract":"Abstract We consider the chordal product determinant, a measure of the distance between two subspaces of the same dimension. In information theory, collections of elements in the complex Grassmannian are searched with the property that their pairwise chordal products are as large as possible. We characterize this function from an statistical perspective, which allows us to obtain bounds for the minimal chordal product and related energy of such collections.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136266747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis of the ratio of ℓ1 and ℓ2 norms for signal recovery with partial support information
Pub Date: 2023-04-27 | DOI: 10.1093/imaiai/iaad015
Huanmin Ge, Wengu Chen, Michael K. Ng
The ratio of the $\ell_{1}$ and $\ell_{2}$ norms, denoted $\ell_{1}/\ell_{2}$, has shown prominent performance in promoting sparsity. By adding partial support information to the standard $\ell_{1}/\ell_{2}$ minimization, we introduce a novel model, the weighted $\ell_{1}/\ell_{2}$ minimization, to recover sparse signals from linear measurements. We establish restricted isometry property based conditions for sparse signal recovery via the weighted $\ell_{1}/\ell_{2}$ minimization in both the noiseless and noisy cases, and we show that the proposed conditions are weaker than the analogous conditions for standard $\ell_{1}/\ell_{2}$ minimization when the accuracy of the partial support information is at least $50\%$. Moreover, we develop effective algorithms and illustrate our results via extensive numerical experiments on synthetic data in both noiseless and noisy settings.
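To make the model concrete, the sketch below minimizes a smoothed surrogate of the weighted $\ell_1/\ell_2$ objective subject to $Ax = b$ with a general-purpose solver, giving smaller weights to indices believed to be in the support. The weighting scheme, the smoothing and the use of SLSQP are illustrative simplifications, not the dedicated algorithms developed in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def weighted_l1_over_l2(A, b, weights, eps=1e-8):
    """Minimize sum_i w_i*sqrt(x_i^2+eps) / sqrt(||x||_2^2+eps) subject to A x = b."""
    obj = lambda x: np.sum(weights * np.sqrt(x**2 + eps)) / np.sqrt(np.sum(x**2) + eps)
    cons = {"type": "eq", "fun": lambda x: A @ x - b}
    x0, *_ = np.linalg.lstsq(A, b, rcond=None)        # minimum-norm feasible starting point
    res = minimize(obj, x0, constraints=[cons], method="SLSQP",
                   options={"maxiter": 500, "ftol": 1e-10})
    return res.x

# toy example: 5-sparse signal with partial (and partly wrong) support information
rng = np.random.default_rng(0)
m, n, s = 40, 100, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_true[support] = rng.standard_normal(s)
b = A @ x_true

weights = np.ones(n)
weights[support[:3]] = 0.1                             # trust 3 of the 5 true support indices
weights[np.setdiff1d(np.arange(n), support)[:2]] = 0.1  # plus 2 incorrect guesses
x_hat = weighted_l1_over_l2(A, b, weights)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```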
{"title":"Analysis of the ratio of ℓ1 and ℓ2 norms for signal recovery with partial support information","authors":"Huanmin Ge, Wengu Chen, Michael K. Ng","doi":"10.1093/imaiai/iaad015","DOIUrl":"https://doi.org/10.1093/imaiai/iaad015","url":null,"abstract":"\u0000 The ratio of $ell _{1}$ and $ell _{2}$ norms, denoted as $ell _{1}/ell _{2}$, has presented prominent performance in promoting sparsity. By adding partial support information to the standard $ell _{1}/ell _{2}$ minimization, in this paper, we introduce a novel model, i.e. the weighted $ell _{1}/ell _{2}$ minimization, to recover sparse signals from the linear measurements. The restricted isometry property based conditions for sparse signal recovery in both noiseless and noisy cases through the weighted $ell _{1}/ell _{2}$ minimization are established. And we show that the proposed conditions are weaker than the analogous conditions for standard $ell _{1}/ell _{2}$ minimization when the accuracy of the partial support information is at least $50%$. Moreover, we develop effective algorithms and illustrate our results via extensive numerical experiments on synthetic data in both noiseless and noisy cases.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78091741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Approximately low-rank recovery from noisy and local measurements by convex program
Pub Date: 2023-04-27 | DOI: 10.1093/imaiai/iaad013
Kiryung Lee, Rakshith Sharma Srinivasa, Marius Junge, Justin Romberg
Low-rank matrix models have been universally useful for numerous applications, from classical system identification to more modern matrix completion in signal processing and statistics. The nuclear norm has been employed as a convex surrogate for low-rankness, since it induces low-rank solutions to inverse problems. While the nuclear norm for low-rankness has an excellent analogy with the $\ell_1$ norm for sparsity through the singular value decomposition, other matrix norms also induce low-rankness. In particular, when one interprets a matrix as a linear operator between Banach spaces, various tensor product norms generalize the role of the nuclear norm. We provide a tensor-norm-constrained estimator for the recovery of approximately low-rank matrices from local measurements corrupted with noise. A tensor-norm regularizer is designed to adapt to the local structure. We derive a statistical analysis of the estimator for matrix completion and decentralized sketching by applying Maurey's empirical method to tensor products of Banach spaces. The estimator provides a near-optimal error bound in a minimax sense and admits a polynomial-time algorithm for these applications.
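As background on the nuclear-norm surrogate mentioned above, here is a minimal sketch of its proximal operator (singular value soft-thresholding) inside a simple proximal-gradient matrix-completion loop. It illustrates only the classical nuclear-norm baseline, not the tensor-norm-constrained estimator proposed in the paper, and the step size and threshold are arbitrary illustrative values.

```python
import numpy as np

def svt(Z, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete(Y, mask, tau=2.0, step=1.0, iters=200):
    """Proximal gradient for  min_X 0.5*||P_Omega(X - Y)||_F^2 + tau*||X||_*."""
    X = np.zeros_like(Y)
    for _ in range(iters):
        grad = mask * (X - Y)                 # gradient of the data-fit term
        X = svt(X - step * grad, step * tau)  # nuclear-norm proximal step
    return X

# toy example: rank-2 matrix observed on roughly 50% of its entries
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
mask = rng.random((50, 40)) < 0.5
X_hat = complete(mask * M, mask)
print("relative error:", np.linalg.norm(X_hat - M) / np.linalg.norm(M))
```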
{"title":"Approximately low-rank recovery from noisy and local measurements by convex program","authors":"Kiryung Lee, Rakshith Sharma Srinivasa, Marius Junge, Justin Romberg","doi":"10.1093/imaiai/iaad013","DOIUrl":"https://doi.org/10.1093/imaiai/iaad013","url":null,"abstract":"Abstract Low-rank matrix models have been universally useful for numerous applications, from classical system identification to more modern matrix completion in signal processing and statistics. The nuclear norm has been employed as a convex surrogate of the low-rankness since it induces a low-rank solution to inverse problems. While the nuclear norm for low rankness has an excellent analogy with the $ell _1$ norm for sparsity through the singular value decomposition, other matrix norms also induce low-rankness. Particularly as one interprets a matrix as a linear operator between Banach spaces, various tensor product norms generalize the role of the nuclear norm. We provide a tensor-norm-constrained estimator for the recovery of approximately low-rank matrices from local measurements corrupted with noise. A tensor-norm regularizer is designed to adapt to the local structure. We derive statistical analysis of the estimator over matrix completion and decentralized sketching by applying Maurey’s empirical method to tensor products of Banach spaces. The estimator provides a near-optimal error bound in a minimax sense and admits a polynomial-time algorithm for these applications.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136085521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Universal consistency of Wasserstein k-NN classifier: a negative and some positive results
Pub Date: 2023-04-27 | DOI: 10.1093/imaiai/iaad027
Donlapark Ponnoprat
We study the $k$-nearest neighbour classifier ($k$-NN) of probability measures under the Wasserstein distance. We show that the $k$-NN classifier is not universally consistent on the space of measures supported in $(0,1)$. As any Euclidean ball contains a copy of $(0,1)$, one should not expect to obtain universal consistency without some restriction on the base metric space, or on the Wasserstein space itself. To this end, via the notion of $\sigma$-finite metric dimension, we show that the $k$-NN classifier is universally consistent on spaces of discrete measures (and, more generally, $\sigma$-finite uniformly discrete measures) with rational mass. In addition, by studying the geodesic structures of the Wasserstein spaces for $p=1$ and $p=2$, we show that the $k$-NN classifier is universally consistent on spaces of measures supported on a finite set, the space of Gaussian measures and spaces of measures with finite wavelet series densities.
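A minimal illustration of the classifier itself, for one-dimensional empirical measures where the Wasserstein-1 distance has a closed form available in SciPy: store labelled sample sets, compute distances from a query measure and take a majority vote among the $k$ closest. The data, the choice of $k$ and the 1-D restriction are illustrative only; the paper's consistency results concern the general setting.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def knn_wasserstein_predict(train_measures, train_labels, query, k=3):
    """k-NN classification of a 1-D empirical measure under the W1 distance."""
    dists = np.array([wasserstein_distance(query, m) for m in train_measures])
    nearest = np.argsort(dists)[:k]
    votes = np.array(train_labels)[nearest]
    return np.bincount(votes).argmax()

# toy example: class 0 measures centred near 0, class 1 measures centred near 2
rng = np.random.default_rng(0)
train_measures = [rng.normal(0.0, 1.0, size=30) for _ in range(20)] + \
                 [rng.normal(2.0, 1.0, size=30) for _ in range(20)]
train_labels = [0] * 20 + [1] * 20
query = rng.normal(2.0, 1.0, size=30)
print("predicted class:", knn_wasserstein_predict(train_measures, train_labels, query))
```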
{"title":"Universal consistency of Wasserstein k-NN classifier: a negative and some positive results","authors":"Donlapark Ponnoprat","doi":"10.1093/imaiai/iaad027","DOIUrl":"https://doi.org/10.1093/imaiai/iaad027","url":null,"abstract":"\u0000 We study the $k$-nearest neighbour classifier ($k$-NN) of probability measures under the Wasserstein distance. We show that the $k$-NN classifier is not universally consistent on the space of measures supported in $(0,1)$. As any Euclidean ball contains a copy of $(0,1)$, one should not expect to obtain universal consistency without some restriction on the base metric space, or the Wasserstein space itself. To this end, via the notion of $sigma $-finite metric dimension, we show that the $k$-NN classifier is universally consistent on spaces of discrete measures (and more generally, $sigma $-finite uniformly discrete measures) with rational mass. In addition, by studying the geodesic structures of the Wasserstein spaces for $p=1$ and $p=2$, we show that the $k$-NN classifier is universally consistent on spaces of measures supported on a finite set, the space of Gaussian measures and spaces of measures with finite wavelet series densities.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80275020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multivariate super-resolution without separation
Pub Date: 2023-04-27 | DOI: 10.1093/imaiai/iaad024
Bakytzhan Kurmanbek, Elina Robeva
In this paper, we study the high-dimensional super-resolution imaging problem. Here, we are given an image of a number of point sources of light whose locations and intensities are unknown. The image is pixelized and is blurred by a known point-spread function arising from the imaging device. We encode the unknown point sources and their intensities via a non-negative measure and propose a convex optimization program to find it. Assuming the device's point-spread function is componentwise decomposable, we show that the optimal solution is the true measure in the noiseless case, and that it approximates the true measure well in the noisy case with respect to the generalized Wasserstein distance. Our main assumption is that the components of the point-spread function form a Tchebychev system ($T$-system) in the noiseless case and a $T^{*}$-system in the noisy case, mild conditions that are satisfied by Gaussian point-spread functions. Our work generalizes to all dimensions the work of [14], where the same analysis is carried out in two dimensions. We also extend the results of [27] to the high-dimensional case when the point-spread function decomposes.
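As a toy illustration of the general idea (not the paper's measure-valued convex program), one can discretize the candidate source locations on a fine grid, form the pixelized Gaussian blur operator and fit non-negative intensities by non-negative least squares. The grid, the PSF width and the NNLS formulation below are illustrative simplifications.

```python
import numpy as np
from scipy.optimize import nnls

# 1-D toy: two point sources blurred by a Gaussian PSF and observed on coarse pixels
grid = np.linspace(0.0, 1.0, 201)      # candidate source locations
pixels = np.linspace(0.0, 1.0, 40)     # pixel centres of the observed image
sigma = 0.05                           # known PSF width (illustrative)

# forward operator: pixel response to a unit source at each grid location
A = np.exp(-(pixels[:, None] - grid[None, :])**2 / (2 * sigma**2))

true_locs, true_amps = np.array([0.30, 0.62]), np.array([1.0, 0.7])
y = np.exp(-(pixels[:, None] - true_locs[None, :])**2 / (2 * sigma**2)) @ true_amps
y += 0.01 * np.random.default_rng(0).standard_normal(y.shape)   # measurement noise

x, _ = nnls(A, y)                       # non-negative intensities on the grid
support = np.flatnonzero(x > 0.05 * x.max())
print("estimated source locations:", grid[support])
```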
{"title":"Multivariate super-resolution without separation","authors":"Bakytzhan Kurmanbek, Elina Robeva","doi":"10.1093/imaiai/iaad024","DOIUrl":"https://doi.org/10.1093/imaiai/iaad024","url":null,"abstract":"Abstract In this paper, we study the high-dimensional super-resolution imaging problem. Here, we are given an image of a number of point sources of light whose locations and intensities are unknown. The image is pixelized and is blurred by a known point-spread function arising from the imaging device. We encode the unknown point sources and their intensities via a non-negative measure and we propose a convex optimization program to find it. Assuming the device’s point-spread function is componentwise decomposable, we show that the optimal solution is the true measure in the noiseless case, and it approximates the true measure well in the noisy case with respect to the generalized Wasserstein distance. Our main assumption is that the components of the point-spread function form a Tchebychev system ($T$-system) in the noiseless case and a $T^{*}$-system in the noisy case, mild conditions that are satisfied by Gaussian point-spread functions. Our work is a generalization to all dimensions of the work [14] where the same analysis is carried out in two dimensions. We also extend results in [27] to the high-dimensional case when the point-spread function decomposes.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136266723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A manifold two-sample test study: integral probability metric with neural networks
Pub Date: 2023-04-27 | DOI: 10.1093/imaiai/iaad018
Jie Wang, Minshuo Chen, Tuo Zhao, Wenjing Liao, Yao Xie
Two-sample tests aim to determine whether two collections of observations follow the same distribution. We propose two-sample tests based on integral probability metrics (IPMs) for high-dimensional samples supported on a low-dimensional manifold. We characterize the properties of the proposed tests with respect to the number of samples $n$ and the structure of the manifold with intrinsic dimension $d$. When an atlas is given, we propose a two-step test to identify the difference between general distributions, which achieves a type-II risk of order $n^{-1/\max\{d,2\}}$. When an atlas is not given, we propose the Hölder IPM test, which applies to data distributions with $(s,\beta)$-Hölder densities and achieves a type-II risk of order $n^{-(s+\beta)/d}$. To mitigate the heavy computational burden of evaluating the Hölder IPM, we approximate the Hölder function class using neural networks. Based on the approximation theory of neural networks, we show that the neural network IPM test has a type-II risk of order $n^{-(s+\beta)/d}$, the same order as that of the Hölder IPM test. Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension rather than the data dimension.
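As background on IPM-based testing in the simplest setting: when the function class is the 1-Lipschitz ball in one dimension, the IPM equals the Wasserstein-1 distance (Kantorovich-Rubinstein duality), and a permutation test can calibrate the statistic. The sketch below implements that 1-D Lipschitz-IPM test; it is an illustrative analogue, not the Hölder or neural-network IPM tests analysed in the paper.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def ipm_permutation_test(x, y, n_perm=500, seed=0):
    """Two-sample test with the 1-Lipschitz IPM (= W1 in 1-D) and a permutation null."""
    rng = np.random.default_rng(seed)
    stat = wasserstein_distance(x, y)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if wasserstein_distance(perm[:len(x)], perm[len(x):]) >= stat:
            count += 1
    return stat, (count + 1) / (n_perm + 1)     # permutation p-value

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=200)
y = rng.normal(0.3, 1.0, size=200)              # small mean shift between the samples
stat, pval = ipm_permutation_test(x, y)
print(f"W1 statistic = {stat:.3f}, permutation p-value = {pval:.3f}")
```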
{"title":"A manifold two-sample test study: integral probability metric with neural networks","authors":"Jie Wang, Minshuo Chen, Tuo Zhao, Wenjing Liao, Yao Xie","doi":"10.1093/imaiai/iaad018","DOIUrl":"https://doi.org/10.1093/imaiai/iaad018","url":null,"abstract":"Abstract Two-sample tests are important areas aiming to determine whether two collections of observations follow the same distribution or not. We propose two-sample tests based on integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold. We characterize the properties of proposed tests with respect to the number of samples $n$ and the structure of the manifold with intrinsic dimension $d$. When an atlas is given, we propose a two-step test to identify the difference between general distributions, which achieves the type-II risk in the order of $n^{-1/max {d,2}}$. When an atlas is not given, we propose Hölder IPM test that applies for data distributions with $(s,beta )$-Hölder densities, which achieves the type-II risk in the order of $n^{-(s+beta )/d}$. To mitigate the heavy computation burden of evaluating the Hölder IPM, we approximate the Hölder function class using neural networks. Based on the approximation theory of neural networks, we show that the neural network IPM test has the type-II risk in the order of $n^{-(s+beta )/d}$, which is in the same order of the type-II risk as the Hölder IPM test. Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136266917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast and provable tensor robust principal component analysis via scaled gradient descent
Pub Date: 2023-04-27 | DOI: 10.1093/imaiai/iaad019
Harry Dong, Tian Tong, Cong Ma, Yuejie Chi
An increasing number of data science and machine learning problems rely on computation with tensors, which capture the multi-way relationships and interactions in data better than matrices. When tapping into this critical advantage, a key challenge is to develop computationally efficient and provably correct algorithms for extracting useful information from tensor data that are simultaneously robust to corruptions and ill-conditioning. This paper tackles tensor robust principal component analysis (RPCA), which aims to recover a low-rank tensor from observations contaminated by sparse corruptions, under the Tucker decomposition. To minimize the computation and memory footprints, we propose to directly recover the low-dimensional tensor factors—starting from a tailored spectral initialization—via scaled gradient descent (ScaledGD), coupled with an iteration-varying thresholding operation to adaptively remove the impact of corruptions. Theoretically, we establish that the proposed algorithm converges linearly to the true low-rank tensor at a constant rate that is independent of its condition number, as long as the level of corruptions is not too large. Empirically, we demonstrate that the proposed algorithm achieves better and more scalable performance than state-of-the-art tensor RPCA algorithms through synthetic experiments and real-world applications.
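To convey the flavour of the update rule in the simplest (matrix, rank-$r$) analogue: alternate a hard-thresholding step that estimates the sparse corruption with scaled gradient steps on the two factors, where the preconditioning by $(R^\top R)^{-1}$ and $(L^\top L)^{-1}$ is what removes the dependence on the condition number. Everything below (the thresholding rule, step size, initialization and the matrix setting itself) is an illustrative toy, not the paper's Tucker-format algorithm or its tailored spectral initialization.

```python
import numpy as np

def hard_threshold(M, frac):
    """Keep only the largest-magnitude entries (a fraction `frac` of all entries)."""
    k = int(frac * M.size)
    if k == 0:
        return np.zeros_like(M)
    cutoff = np.partition(np.abs(M).ravel(), -k)[-k]
    return M * (np.abs(M) >= cutoff)

def scaled_gd_rpca(Y, r, alpha=0.1, eta=0.5, iters=100):
    """Toy matrix RPCA: Y ~ L R^T + S with S sparse, via ScaledGD-style factor updates."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    L = U[:, :r] * np.sqrt(s[:r])                          # naive spectral initialization
    R = Vt[:r].T * np.sqrt(s[:r])
    for _ in range(iters):
        S = hard_threshold(Y - L @ R.T, alpha)             # sparse-corruption estimate
        E = L @ R.T + S - Y                                 # residual
        L_new = L - eta * E @ R @ np.linalg.inv(R.T @ R)    # scaled (preconditioned) step on L
        R_new = R - eta * E.T @ L @ np.linalg.inv(L.T @ L)  # scaled step on R
        L, R = L_new, R_new
    return L @ R.T, S

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 50))   # rank-2 ground truth
S_true = np.zeros_like(X)
idx = rng.random(X.shape) < 0.05                                   # 5% gross corruptions
S_true[idx] = 10 * rng.standard_normal(idx.sum())
X_hat, _ = scaled_gd_rpca(X + S_true, r=2)
print("relative error:", np.linalg.norm(X_hat - X) / np.linalg.norm(X))
```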
{"title":"Fast and provable tensor robust principal component analysis via scaled gradient descent","authors":"Harry Dong, Tian Tong, Cong Ma, Yuejie Chi","doi":"10.1093/imaiai/iaad019","DOIUrl":"https://doi.org/10.1093/imaiai/iaad019","url":null,"abstract":"Abstract An increasing number of data science and machine learning problems rely on computation with tensors, which better capture the multi-way relationships and interactions of data than matrices. When tapping into this critical advantage, a key challenge is to develop computationally efficient and provably correct algorithms for extracting useful information from tensor data that are simultaneously robust to corruptions and ill-conditioning. This paper tackles tensor robust principal component analysis (RPCA), which aims to recover a low-rank tensor from its observations contaminated by sparse corruptions, under the Tucker decomposition. To minimize the computation and memory footprints, we propose to directly recover the low-dimensional tensor factors—starting from a tailored spectral initialization—via scaled gradient descent (ScaledGD), coupled with an iteration-varying thresholding operation to adaptively remove the impact of corruptions. Theoretically, we establish that the proposed algorithm converges linearly to the true low-rank tensor at a constant rate that is independent with its condition number, as long as the level of corruptions is not too large. Empirically, we demonstrate that the proposed algorithm achieves better and more scalable performance than state-of-the-art tensor RPCA algorithms through synthetic experiments and real-world applications.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136267080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lossy compression of general random variables
Pub Date: 2023-04-27 | DOI: 10.1093/imaiai/iaac035
Erwin Riegler, Helmut Bölcskei, Günther Koliander
This paper is concerned with the lossy compression of general random variables, specifically with rate-distortion theory and the quantization of random variables taking values in general measurable spaces such as manifolds and fractal sets. Manifold structures are prevalent in data science, e.g. in compressed sensing, machine learning, image processing and handwritten digit recognition. Fractal sets find application in image compression and in the modeling of Ethernet traffic. Our main contributions are bounds on the rate-distortion function and the quantization error. These bounds are very general and essentially only require the existence of reference measures satisfying certain regularity conditions in terms of small ball probabilities. To illustrate the wide applicability of our results, we particularize them to random variables taking values in (i) manifolds, namely hyperspheres and Grassmannians, and (ii) self-similar sets characterized by iterated function systems satisfying the weak separation property.
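As a small numerical illustration of quantization on a manifold: sample points uniformly on a hypersphere, build a $2^{R}$-point codebook with k-means for a few rates $R$ and measure the resulting mean squared distortion. The codebook construction and distortion measure are simple illustrative choices, not the reference-measure machinery used to prove the bounds in the paper.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
d, n = 3, 5000
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # uniform samples on the unit sphere S^{d-1}

for rate in [2, 4, 6]:                              # a rate of R bits allows 2^R codewords
    codebook, labels = kmeans2(X, 2 ** rate, minit="points")
    distortion = np.mean(np.sum((X - codebook[labels]) ** 2, axis=1))
    print(f"rate {rate} bits: mean squared distortion {distortion:.4f}")
```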
{"title":"Lossy compression of general random variables","authors":"Erwin Riegler, Helmut Bölcskei, Günther Koliander","doi":"10.1093/imaiai/iaac035","DOIUrl":"https://doi.org/10.1093/imaiai/iaac035","url":null,"abstract":"Abstract This paper is concerned with the lossy compression of general random variables, specifically with rate-distortion theory and quantization of random variables taking values in general measurable spaces such as, e.g. manifolds and fractal sets. Manifold structures are prevalent in data science, e.g. in compressed sensing, machine learning, image processing and handwritten digit recognition. Fractal sets find application in image compression and in the modeling of Ethernet traffic. Our main contributions are bounds on the rate-distortion function and the quantization error. These bounds are very general and essentially only require the existence of reference measures satisfying certain regularity conditions in terms of small ball probabilities. To illustrate the wide applicability of our results, we particularize them to random variables taking values in (i) manifolds, namely, hyperspheres and Grassmannians and (ii) self-similar sets characterized by iterated function systems satisfying the weak separation property.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136085522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}