Fast transforms correspond to factorizations of the form $\mathbf{Z} = \mathbf{X}^{(1)} \ldots \mathbf{X}^{(J)}$, where each factor $\mathbf{X}^{(\ell)}$ is sparse and possibly structured. This paper investigates essential uniqueness of such factorizations, i.e., uniqueness up to unavoidable scaling ambiguities. Our main contribution is to prove that any $N \times N$ matrix having the so-called butterfly structure admits an essentially unique factorization into $J$ butterfly factors (where $N = 2^{J}$), and that the factors can be recovered by a hierarchical factorization method, which consists in recursively factorizing the considered matrix into two factors. This hierarchical identifiability property relies on a simple identifiability condition in the two-layer and fixed-support setting. This approach contrasts with existing ones that fit the product of butterfly factors to a given matrix via gradient descent. The proposed method can be applied in particular to retrieve the factorization of the Hadamard or the discrete Fourier transform matrices of size $N = 2^{J}$. Computing such factorizations costs $\mathcal{O}(N^{2})$, which is of the order of dense matrix-vector multiplication, while the obtained factorizations enable fast $\mathcal{O}(N \log N)$ matrix-vector multiplications and have the potential to be applied to compress deep neural networks.
{"title":"Efficient Identification of Butterfly Sparse Matrix Factorizations","authors":"Léon Zheng, E. Riccietti, R. Gribonval","doi":"10.1137/22m1488727","DOIUrl":"https://doi.org/10.1137/22m1488727","url":null,"abstract":"Fast transforms correspond to factorizations of the form $mathbf{Z} = mathbf{X}^{(1)} ldots mathbf{X}^{(J)}$, where each factor $ mathbf{X}^{(ell)}$ is sparse and possibly structured. This paper investigates essential uniqueness of such factorizations, i.e., uniqueness up to unavoidable scaling ambiguities. Our main contribution is to prove that any $N times N$ matrix having the so-called butterfly structure admits an essentially unique factorization into $J$ butterfly factors (where $N = 2^{J}$), and that the factors can be recovered by a hierarchical factorization method, which consists in recursively factorizing the considered matrix into two factors. This hierarchical identifiability property relies on a simple identifiability condition in the two-layer and fixed-support setting. This approach contrasts with existing ones that fit the product of butterfly factors to a given matrix via gradient descent. The proposed method can be applied in particular to retrieve the factorization of the Hadamard or the discrete Fourier transform matrices of size $N=2^J$. Computing such factorizations costs $mathcal{O}(N^{2})$, which is of the order of dense matrix-vector multiplication, while the obtained factorizations enable fast $mathcal{O}(N log N)$ matrix-vector multiplications and have the potential to be applied to compress deep neural networks.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"50 1","pages":"22-49"},"PeriodicalIF":0.0,"publicationDate":"2021-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75817474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emerging applications in multi-agent environments, such as the internet-of-things, networked sensing, autonomous systems, and federated learning, call for decentralized algorithms for finite-sum optimization that are resource-efficient in terms of both computation and communication. In this paper, we consider the prototypical setting where the agents work collaboratively to minimize the sum of local loss functions by only communicating with their neighbors over a predetermined network topology. We develop a new algorithm, called DEcentralized STochastic REcurSive gradient methodS (DESTRESS), for nonconvex finite-sum optimization, which matches the optimal incremental first-order oracle (IFO) complexity of centralized algorithms for finding first-order stationary points, while maintaining communication efficiency. Detailed theoretical and numerical comparisons corroborate that the resource efficiency of DESTRESS improves upon prior decentralized algorithms over a wide range of parameter regimes. DESTRESS leverages several key algorithm design ideas, including randomly activated stochastic recursive gradient updates with mini-batches for local computation and gradient tracking with extra mixing (i.e., multiple gossiping rounds) for per-iteration communication, together with careful choices of hyperparameters and new analysis frameworks, to provably achieve a desirable computation-communication trade-off.
{"title":"DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization","authors":"Boyue Li, Zhize Li, Yuejie Chi","doi":"10.1137/21m1450677","DOIUrl":"https://doi.org/10.1137/21m1450677","url":null,"abstract":"Emerging applications in multi-agent environments such as internet-of-things, networked sensing, autonomous systems and federated learning, call for decentralized algorithms for finite-sum optimizations that are resource-efficient in terms of both computation and communication. In this paper, we consider the prototypical setting where the agents work collaboratively to minimize the sum of local loss functions by only communicating with their neighbors over a predetermined network topology. We develop a new algorithm, called DEcentralized STochastic REcurSive gradient methodS (DESTRESS) for nonconvex finite-sum optimization, which matches the optimal incremental first-order oracle (IFO) complexity of centralized algorithms for finding first-order stationary points, while maintaining communication efficiency. Detailed theoretical and numerical comparisons corroborate that the resource efficiencies of DESTRESS improve upon prior decentralized algorithms over a wide range of parameter regimes. DESTRESS leverages several key algorithm design ideas including randomly activated stochastic recursive gradient updates with mini-batches for local computation, gradient tracking with extra mixing (i.e., multiple gossiping rounds) for per-iteration communication, together with careful choices of hyper-parameters and new analysis frameworks to provably achieve a desirable computation-communication trade-off.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"6 1","pages":"1031-1051"},"PeriodicalIF":0.0,"publicationDate":"2021-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84079619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sum-of-norms clustering is a convex optimization problem whose solution can be used for the clustering of multivariate data. We propose and study a localized version of this method, and show in particular that it can separate arbitrarily close balls in the stochastic ball model. More precisely, we prove a quantitative bound on the error incurred in the clustering of disjoint connected sets. Our bound is expressed in terms of the number of datapoints and the localization length of the functional.
{"title":"Local versions of sum-of-norms clustering","authors":"Alexander Dunlap, J. Mourrat","doi":"10.1137/21m1448732","DOIUrl":"https://doi.org/10.1137/21m1448732","url":null,"abstract":". Sum-of-norms clustering is a convex optimization problem whose solution can be used for the clustering of multivariate data. We propose and study a localized version of this method, and show in particular that it can separate arbitrarily close balls in the stochastic ball model. More precisely, we prove a quantitative bound on the error incurred in the clustering of disjoint connected sets. Our bound is expressed in terms of the number of datapoints and the localization length of the functional.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"11 1","pages":"1250-1271"},"PeriodicalIF":0.0,"publicationDate":"2021-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90784914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The paper establishes a strong correspondence between two important clustering approaches that emerged in the 1970s: clustering by level sets or cluster tree as proposed by Hartigan, and clustering by gradient lines or gradient flow as proposed by Fukunaga and Hostetler. We do so by showing that we can move up the cluster tree by following the gradient ascent flow.
{"title":"Moving Up the Cluster Tree with the Gradient Flow","authors":"E. Arias-Castro, Wanli Qiao","doi":"10.1137/22m1469869","DOIUrl":"https://doi.org/10.1137/22m1469869","url":null,"abstract":"The paper establishes a strong correspondence between two important clustering approaches that emerged in the 1970's: clustering by level sets or cluster tree as proposed by Hartigan and clustering by gradient lines or gradient flow as proposed by Fukunaga and Hostetler. We do so by showing that we can move up the cluster tree by following the gradient ascent flow.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48287290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We develop a method for analyzing spatial and spatiotemporal anomalies in geospatial data using topological data analysis (TDA). To do this, we use persistent homology (PH), which allows one to algorithmically detect geometric voids in a data set and quantify the persistence of such voids. We construct an efficient filtered simplicial complex (FSC) such that the voids in our FSC are in one-to-one correspondence with the anomalies. Our approach goes beyond simply identifying anomalies; it also encodes information about the relationships between anomalies. We use vineyards, which one can interpret as time-varying persistence diagrams (an approach for visualizing PH), to track how the locations of the anomalies change with time. We conduct two case studies using spatially heterogeneous COVID-19 data. First, we examine vaccination rates in New York City by zip code at a single point in time. Second, we study a year-long data set of COVID-19 case rates in neighborhoods of the city of Los Angeles.
{"title":"Analysis of Spatial and Spatiotemporal Anomalies Using Persistent Homology: Case Studies with COVID-19 Data","authors":"Abigail Hickok, D. Needell, M. A. Porter","doi":"10.1137/21m1435033","DOIUrl":"https://doi.org/10.1137/21m1435033","url":null,"abstract":"We develop a method for analyzing spatial and spatiotemporal anomalies in geospatial data using topological data analysis (TDA). To do this, we use persistent homology (PH), which allows one to algorithmically detect geometric voids in a data set and quantify the persistence of such voids. We construct an efficient filtered simplicial complex (FSC) such that the voids in our FSC are in one-to-one correspondence with the anomalies. Our approach goes beyond simply identifying anomalies;it also encodes information about the relationships between anomalies. We use vineyards, which one can interpret as time-varying persistence diagrams (which are an approach for visualizing PH), to track how the locations of the anomalies change with time. We conduct two case studies using spatially heterogeneous COVID-19 data. First, we examine vaccination rates in New York City by zip code at a single point in time. Second, we study a year-long data set of COVID-19 case rates in neighborhoods of the city of Los Angeles.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46879222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We prove minimax optimal learning rates for kernel ridge regression and, respectively, support vector machines based on a data-dependent partition of the input space, where the dependence on the dimension of the input space is replaced by the fractal dimension of the support of the data-generating distribution. We further show that these optimal rates can be achieved by a training-validation procedure without any prior knowledge of this intrinsic dimension of the data. Finally, we conduct extensive experiments which demonstrate that the considered learning methods are able to generalize from a dataset that is nontrivially embedded in a much higher-dimensional space just as well as from the original dataset.
{"title":"Intrinsic Dimension Adaptive Partitioning for Kernel Methods","authors":"Thomas Hamm, Ingo Steinwart","doi":"10.1137/21m1435690","DOIUrl":"https://doi.org/10.1137/21m1435690","url":null,"abstract":"We prove minimax optimal learning rates for kernel ridge regression, resp. support vector machines based on a data dependent partition of the input space, where the dependence of the dimension of the input space is replaced by the fractal dimension of the support of the data generating distribution. We further show that these optimal rates can be achieved by a training validation procedure without any prior knowledge on this intrinsic dimension of the data. Finally, we conduct extensive experiments which demonstrate that our considered learning methods are actually able to generalize from a dataset that is non-trivially embedded in a much higher dimensional space just as well as from the original dataset.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45933232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we consider a class of nonsmooth nonconvex optimization problems whose objective is the sum of a block relative smooth function and a proper and lower semicontinuous block separable function. Although the analysis of block proximal gradient (BPG) methods for the class of block $L$-smooth functions has been successfully extended to Bregman BPG methods that deal with the class of block relative smooth functions, accelerated Bregman BPG methods are scarce and challenging to design. Taking our inspiration from Nesterov-type acceleration and the majorization-minimization scheme, we propose a block alternating Bregman Majorization-Minimization framework with Extrapolation (BMME). We prove subsequential convergence of BMME to a first-order stationary point under mild assumptions, and study its global convergence under stronger conditions. We illustrate the effectiveness of BMME on the penalized orthogonal nonnegative matrix factorization problem.
{"title":"Block Alternating Bregman Majorization Minimization with Extrapolation","authors":"L. Hien, D. Phan, Nicolas Gillis, Masoud Ahookhosh, Panagiotis Patrinos","doi":"10.1137/21M1432661","DOIUrl":"https://doi.org/10.1137/21M1432661","url":null,"abstract":"In this paper, we consider a class of nonsmooth nonconvex optimization problems whose objective is the sum of a block relative smooth function and a proper and lower semicontinuous block separable function. Although the analysis of block proximal gradient (BPG) methods for the class of block $L$-smooth functions have been successfully extended to Bregman BPG methods that deal with the class of block relative smooth functions, accelerated Bregman BPG methods are scarce and challenging to design. Taking our inspiration from Nesterov-type acceleration and the majorization-minimization scheme, we propose a block alternating Bregman Majorization-Minimization framework with Extrapolation (BMME). We prove subsequential convergence of BMME to a first-order stationary point under mild assumptions, and study its global convergence under stronger conditions. We illustrate the effectiveness of BMME on the penalized orthogonal nonnegative matrix factorization problem.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"33 1","pages":"1-25"},"PeriodicalIF":0.0,"publicationDate":"2021-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85062665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a generalized CUR (GCUR) decomposition for matrix pairs (A, B). Given matrices A and B with the same number of columns, such a decomposition provides low-rank approximations of both matrices simultaneously, in terms of some of their rows and columns. We obtain the indices for selecting the subset of rows and columns of the original matrices using the discrete empirical interpolation method (DEIM) on the generalized singular vectors. When B is square and nonsingular, there are close connections between the GCUR of (A, B) and the DEIM-induced CUR of AB^{-1}. When B is the identity, the GCUR decomposition of A coincides with the DEIM-induced CUR decomposition of A. We also show a similar connection between the GCUR of (A, B) and the CUR of AB^+ for a nonsquare but full-rank matrix B, where B^+ denotes the Moore–Penrose pseudoinverse of B. While a CUR decomposition acts on one data set, a GCUR factorization jointly decomposes two data sets. The algorithm may be suitable for applications where one is interested in extracting the most discriminative features from one data set relative to another data set. In numerical experiments, we demonstrate the advantages of the new method over the standard CUR approximation for recovering data perturbed with colored noise and for subgroup discovery.
{"title":"A Generalized CUR decomposition for matrix pairs","authors":"Perfect Y. Gidisu, M. Hochstenbach","doi":"10.1137/21m1432119","DOIUrl":"https://doi.org/10.1137/21m1432119","url":null,"abstract":"We propose a generalized CUR (GCUR) decomposition for matrix pairs (A,B). Given matrices A and B with the same number of columns, such a decomposition provides low-rank approximations of both matrices simultaneously, in terms of some of their rows and columns. We obtain the indices for selecting the subset of rows and columns of the original matrices using the discrete empirical interpolation method (DEIM) on the generalized singular vectors. When B is square and nonsingular, there are close connections between the GCUR of (A,B) and the DEIM-induced CUR of AB−1. When B is the identity, the GCUR decomposition of A coincides with the DEIM-induced CUR decomposition of A. We also show similar connection between the GCUR of (A,B) and the CUR of AB for a nonsquare but full-rank matrix B, where B denotes the Moore–Penrose pseudoinverse of B. While a CUR decomposition acts on one data set, a GCUR factorization jointly decomposes two data sets. The algorithm may be suitable for applications where one is interested in extracting the most discriminative features from one data set relative to another data set. In numerical experiments, we demonstrate the advantages of the new method over the standard CUR approximation; for recovering data perturbed with colored noise and subgroup discovery.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"40 1","pages":"386-409"},"PeriodicalIF":0.0,"publicationDate":"2021-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81598226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper is concerned with the development, analysis and numerical realization of a novel variational model for the regularization of inverse problems in imaging. The proposed model is inspired by the architecture of generative convolutional neural networks; it aims to generate the unknown from variables in a latent space via multi-layer convolutions and non-linear penalties, and penalizes an associated cost. In contrast to conventional neural-network-based approaches, however, the convolution kernels are learned directly from the measured data such that no training is required. The present work provides a mathematical analysis of the proposed model in a function space setting, including proofs for regularity and existence/stability of solutions, and convergence for vanishing noise. Moreover, in a discretized setting, a numerical algorithm for solving various types of inverse problems with the proposed model is derived. Numerical results are provided for applications in inpainting, denoising, deblurring under noise, super-resolution and JPEG decompression with multiple test images.
{"title":"A Generative Variational Model for Inverse Problems in Imaging","authors":"Andreas Habring, M. Holler","doi":"10.1137/21m1414978","DOIUrl":"https://doi.org/10.1137/21m1414978","url":null,"abstract":"This paper is concerned with the development, analysis and numerical realization of a novel variational model for the regularization of inverse problems in imaging. The proposed model is inspired by the architecture of generative convolutional neural networks; it aims to generate the unknown from variables in a latent space via multi-layer convolutions and non-linear penalties, and penalizes an associated cost. In contrast to conventional neural-network-based approaches, however, the convolution kernels are learned directly from the measured data such that no training is required. The present work provides a mathematical analysis of the proposed model in a function space setting, including proofs for regularity and existence/stability of solutions, and convergence for vanishing noise. Moreover, in a discretized setting, a numerical algorithm for solving various types of inverse problems with the proposed model is derived. Numerical results are provided for applications in inpainting, denoising, deblurring under noise, super-resolution and JPEG decompression with multiple test images.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"40 1","pages":"306-335"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85104625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the computational sciences, one must often estimate model parameters from data subject to noise and uncertainty, leading to inaccurate results. In order to improve the accuracy of models with noisy parameters, we consider the problem of reducing error in a linear system with the operator corrupted by noise. Our contribution in this paper is to extend the elliptic operator shifting framework from Etter, Ying ’20 to the general nonsymmetric matrix case. Roughly, the operator shifting technique is a matrix analogue of the James-Stein estimator. The key insight is that a shift of the matrix inverse estimate in an appropriately chosen direction will reduce average error. In our extension, we interrogate a number of questions, namely, whether or not shifting towards the origin for general matrix inverses always reduces error as it does in the elliptic case. We show that this is usually the case, but that there are three key features of general nonsingular matrices that allow for adversarial examples not possible in the symmetric case. We prove that when these adversarial possibilities are eliminated by the assumption of noise symmetry and the use of the residual norm as the error metric, the optimal shift is always towards the origin, mirroring results from Etter, Ying ’20. We also investigate behavior in the small noise regime and other scenarios. We conclude by presenting numerical experiments (with accompanying source code) inspired by reinforcement learning to demonstrate that operator shifting can yield substantial reductions in error.
{"title":"Operator Shifting for General Noisy Matrix Systems","authors":"Philip A. Etter, Lexing Ying","doi":"10.1137/21m1416849","DOIUrl":"https://doi.org/10.1137/21m1416849","url":null,"abstract":". In the computational sciences, one must often estimate model parameters from data subject to noise and uncertainty, leading to inaccurate results. In order to improve the accuracy of models with noisy parameters, we consider the problem of reducing error in a linear system with the operator corrupted by noise. Our contribution in this paper is to extend the elliptic operator shifting framework from Etter, Ying ’20 to the general nonsymmetric matrix case. Roughly, the operator shifting technique is a matrix analogue of the James-Stein estimator. The key insight is that a shift of the matrix inverse estimate in an appropriately chosen direction will reduce average error. In our extension, we interrogate a number of questions — namely, whether or not shifting towards the origin for general matrix inverses always reduces error as it does in the elliptic case. We show that this is usually the case, but that there are three key features of the general nonsingular matrices that allow for adversarial examples not possible in the symmetric case. We prove that when these adversarial possibilities are eliminated by the assumption of noise symmetry and the use of the residual norm as the error metric, the optimal shift is always towards the origin, mirroring results from Etter, Ying ’20. We also investigate behavior in the small noise regime and other scenarios. We conclude by presenting numerical experiments (with accompanying source code) inspired by Reinforcement Learning to demonstrate that operator shifting can yield substantial reductions in error.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42222215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}