{"title":"Convergence of a Constrained Vector Extrapolation Scheme","authors":"Mathieu Barré, Adrien B. Taylor, A. d’Aspremont","doi":"10.1137/21m1428030","DOIUrl":"https://doi.org/10.1137/21m1428030","url":null,"abstract":"","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"22 1","pages":"979-1002"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72852432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Archimbaud, Z. Drmač, K. Nordhausen, Una Radojicic, A. Ruiz-Gazen. "Numerical Considerations and a New Implementation for Invariant Coordinate Selection." SIAM Journal on Mathematics of Data Science. Published 2022-07-05. DOI: 10.1137/22M1498759
Invariant Coordinate Selection (ICS) is a multivariate data transformation and a dimension reduction method that can be useful in many different contexts. It can be used for outlier detection or cluster identification, and can be seen as an independent component or a non-Gaussian component analysis method. The usual implementation of ICS is based on a joint diagonalization of two scatter matrices, and may be numerically unstable in some ill-conditioned situations. We focus on one-step M-scatter matrices and propose a new implementation of ICS based on a pivoted QR factorization of the centered data set. This factorization avoids the direct computation of the scatter matrices and their inverse and brings numerical stability to the algorithm. Furthermore, the row and column pivoting leads to a rank revealing procedure that allows computation of ICS when the scatter matrices are not full rank. Several artificial and real data sets illustrate the interest of using the new implementation compared to the original one.
Hongwei Tan, Subhadip Mukherjee, Junqi Tang, C. Schonlieb. "Data-Driven Mirror Descent with Input-Convex Neural Networks." SIAM Journal on Mathematics of Data Science. Published 2022-06-14. DOI: 10.1137/22m1508613
Learning-to-optimize is an emerging framework that seeks to speed up the solution of certain optimization problems by leveraging training data. Learned optimization solvers have been shown to outperform classical optimization algorithms in terms of convergence speed, especially for convex problems. Many existing data-driven optimization methods are based on parameterizing the update step and learning the optimal parameters (typically scalars) from the available data. We propose a novel functional parameterization approach for learned convex optimization solvers based on the classical mirror descent (MD) algorithm. Specifically, we seek to learn the optimal Bregman distance in MD by modeling the underlying convex function using an input-convex neural network (ICNN). The parameters of the ICNN are learned by minimizing the target objective function evaluated at the MD iterate after a predetermined number of iterations. The inverse of the mirror map is modeled approximately using another neural network, as the exact inverse is intractable to compute. We derive convergence rate bounds for the proposed learned mirror descent (LMD) approach with an approximate inverse mirror map and perform extensive numerical evaluation on various convex problems such as image inpainting, denoising, learning a two-class support vector machine (SVM) classifier and a multi-class linear classifier on fixed features.
Markus Böck, C. Heitzinger. "Speedy Categorical Distributional Reinforcement Learning and Complexity Analysis." SIAM Journal on Mathematics of Data Science, pp. 675-693. Published 2022-06-01. DOI: 10.1137/20m1364436
In distributional reinforcement learning, the entire distribution of the return is modeled instead of just the expected return. The approach using categorical distributions as the approximation method is well known in Q-learning, and convergence results have been established in the tabular case. In this work, speedy Q-learning is extended to categorical distributions, a finite-time analysis is performed, and probably approximately correct bounds in terms of the Cramér distance are established. It is shown that, also in the distributional case, the new update rule yields faster policy evaluation than the standard Q-learning update, and that the sample complexity is essentially the same as that of the value-based algorithmic counterpart. Without the need for more state-action-reward samples, one gains significantly more information about the return with categorical distributions. Even though the results do not easily extend to the case of policy control, a slight modification of the update rule yields promising numerical results.
Howard Heaton, Samy Wu Fung, A. Lin, S. Osher, W. Yin. "Wasserstein-Based Projections with Applications to Inverse Problems." SIAM Journal on Mathematics of Data Science, pp. 581-603. Published 2022-05-05. DOI: 10.1137/20m1376790
Inverse problems consist of recovering a signal from a collection of noisy measurements. These are typically cast as optimization problems, with classic approaches using a data fidelity term and an analytic regularizer that stabilizes recovery. Recent Plug-and-Play (PnP) works propose replacing the operator for analytic regularization in optimization methods by a data-driven denoiser. These schemes obtain state-of-the-art results, but at the cost of limited theoretical guarantees. To bridge this gap, we present a new algorithm that takes samples from the manifold of true data as input and outputs an approximation of the projection operator onto this manifold. Under standard assumptions, we prove this algorithm generates a learned operator, called a Wasserstein-based projection (WP), that approximates the true projection with high probability. Thus, WPs can be inserted into optimization methods in the same manner as PnP, but now with theoretical guarantees. The provided numerical examples show that WPs obtain state-of-the-art results for unsupervised PnP signal recovery.
Philip S. Chodrow, Nicole Eikmeier, Jamie Haddock. "Nonbacktracking spectral clustering of nonuniform hypergraphs." SIAM Journal on Mathematics of Data Science, pp. 251-279. Published 2022-04-27. DOI: 10.48550/arXiv.2204.13586
Spectral methods offer a tractable, global framework for clustering in graphs via eigenvector computations on graph matrices. Hypergraph data, in which entities interact on edges of arbitrary size, poses challenges for matrix representations and therefore for spectral clustering. We study spectral clustering for nonuniform hypergraphs based on the hypergraph nonbacktracking operator. After reviewing the definition of this operator and its basic properties, we prove a theorem of Ihara-Bass type which allows eigenpair computations to take place on a smaller matrix, often enabling faster computation. We then propose an alternating algorithm for inference in a hypergraph stochastic blockmodel via linearized belief propagation, which involves a spectral clustering step that again uses nonbacktracking operators. We provide proofs related to this algorithm that both formalize and extend several previous results. We pose several conjectures about the limits of spectral methods and detectability in hypergraph stochastic blockmodels in general, supporting these with in-expectation analysis of the eigenpairs of our studied operators. We perform experiments on real and synthetic data that demonstrate the benefits of hypergraph methods over graph-based ones when interactions of different sizes carry different information about cluster structure.
E. Barrio, Alberto González-Sanz, Jean-Michel Loubes, Jonathan Niles-Weed. "An improved central limit theorem and fast convergence rates for entropic transportation costs." SIAM Journal on Mathematics of Data Science, pp. 639-669. Published 2022-04-19. DOI: 10.48550/arXiv.2204.09105
We prove a central limit theorem for the entropic transportation cost between subgaussian probability measures, centered at the population cost. This is the first result which allows for asymptotically valid inference for entropic optimal transport between measures which are not necessarily discrete. In the compactly supported case, we complement these results with new, faster convergence rates for the expected entropic transportation cost between empirical measures. Our proof is based on strengthening convergence results for dual solutions to the entropic optimal transport problem.
Gilles Mordant, A. Munk. "Statistical Analysis of Random Objects Via Metric Measure Laplacians." SIAM Journal on Mathematics of Data Science. Published 2022-04-13. DOI: 10.1137/22m1491022
In this paper, we consider a certain convolutional Laplacian for metric measure spaces and investigate its potential for the statistical analysis of complex objects. The spectrum of that Laplacian serves as a signature of the space under consideration and the eigenvectors provide the principal directions of the shape, its harmonics. These concepts are used to assess the similarity of objects or understand their most important features in a principled way which is illustrated in various examples. Adopting a statistical point of view, we define a mean spectral measure and its empirical counterpart. The corresponding limiting process of interest is derived and statistical applications are discussed.
Sebastian Neumayer, Alexis Goujon, Pakshal Bohra, M. Unser. "Approximation of Lipschitz Functions using Deep Spline Neural Networks." SIAM Journal on Mathematics of Data Science, pp. 306-322. Published 2022-04-13. DOI: 10.48550/arXiv.2204.06233
Lipschitz-constrained neural networks have many applications in machine learning. Since designing and training expressive Lipschitz-constrained networks is very challenging, there is a need for improved methods and a better theoretical understanding. Unfortunately, it turns out that ReLU networks have provable disadvantages in this setting. Hence, we propose to use learnable spline activation functions with at least 3 linear regions instead. We prove that this choice is optimal among all component-wise 1-Lipschitz activation functions in the sense that no other weight-constrained architecture can approximate a larger class of functions. Additionally, this choice is at least as expressive as the recently introduced non-component-wise Groupsort activation function for spectral-norm-constrained weights. Previously published numerical results support our theoretical findings.
L. Saul. "A Nonlinear Matrix Decomposition for Mining the Zeros of Sparse Data." SIAM Journal on Mathematics of Data Science, pp. 431-463. Published 2022-04-07. DOI: 10.1137/21m1405769
We describe a simple iterative solution to a widely recurring problem in multivariate data analysis: given a sparse nonnegative matrix X, how to estimate a low-rank matrix Θ such that X ≈ f(Θ), where f is an elementwise nonlinearity? We develop a latent variable model for this problem and consider those sparsifying nonlinearities, popular in neural networks, that map all negative values to zero. The model seeks to explain the variability of sparse high-dimensional data in terms of a smaller number of degrees of freedom. We show that exact inference in this model is tractable and derive an expectation-maximization (EM) algorithm to estimate the low-rank matrix Θ. Notably, we do not parameterize Θ as a product of smaller matrices to be alternately optimized; instead, we estimate Θ directly via the singular value decomposition of matrices that are repeatedly inferred (at each iteration of the EM algorithm) from the model’s posterior distribution. We use the model to analyze large sparse matrices that arise from data sets of binary, grayscale, and color images. In all of these cases, we find that the model discovers much lower-rank decompositions than purely linear approaches.