Tensors in computations
Lek-Heng Lim
Pub Date: 2021-05-01 · DOI: 10.1017/S0962492921000076 · Acta Numerica 30, pp. 555–764
The notion of a tensor captures three great ideas: equivariance, multilinearity, separability. But trying to be three things at once makes the notion difficult to understand. We will explain tensors in an accessible and elementary way through the lens of linear algebra and numerical linear algebra, elucidated with examples from computational and applied mathematics.
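As a minimal illustration of two of these ideas (standard notation, not taken from the article): a separable tensor is one of rank one, and multilinearity means linearity in each argument separately,

\[
T = u \otimes v \otimes w, \qquad T_{ijk} = u_i\, v_j\, w_k,
\]
\[
\Phi(\alpha x + \beta x',\, y,\, z) = \alpha\, \Phi(x, y, z) + \beta\, \Phi(x', y, z),
\]

with the same linearity holding in each of the remaining arguments; a general tensor is a finite sum of separable ones.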
{"title":"Tensors in computations","authors":"Lek-Heng Lim","doi":"10.1017/S0962492921000076","DOIUrl":"https://doi.org/10.1017/S0962492921000076","url":null,"abstract":"The notion of a tensor captures three great ideas: equivariance, multilinearity, separability. But trying to be three things at once makes the notion difficult to understand. We will explain tensors in an accessible and elementary way through the lens of linear algebra and numerical linear algebra, elucidated with examples from computational and applied mathematics.","PeriodicalId":48863,"journal":{"name":"Acta Numerica","volume":"30 1","pages":"555 - 764"},"PeriodicalIF":14.2,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44930203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning physics-based models from data: perspectives from inverse problems and model reduction
O. Ghattas, K. Willcox
Pub Date: 2021-05-01 · DOI: 10.1017/S0962492921000064 · Acta Numerica 30, pp. 445–554
This article addresses the inference of physics models from data, from the perspectives of inverse problems and model reduction. These fields develop formulations that integrate data into physics-based models while exploiting the fact that many mathematical models of natural and engineered systems exhibit an intrinsically low-dimensional solution manifold. In inverse problems, we seek to infer uncertain components of the inputs from observations of the outputs, while in model reduction we seek low-dimensional models that explicitly capture the salient features of the input–output map through approximation in a low-dimensional subspace. In both cases, the result is a predictive model that reflects data-driven learning yet deeply embeds the underlying physics, and thus can be used for design, control and decision-making, often with quantified uncertainties. We highlight recent developments in scalable and efficient algorithms for inverse problems and model reduction governed by large-scale models in the form of partial differential equations. Several illustrative applications to large-scale complex problems across different domains of science and engineering are provided.
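A minimal sketch of the "approximation in a low-dimensional subspace" idea via proper orthogonal decomposition (POD) and Galerkin projection; the parametrized system, basis size r and all numerical values below are illustrative placeholders, not quantities from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 10                       # full dimension and reduced dimension (illustrative)

# Made-up parametrized full-order model: (A0 + mu * A1) x(mu) = b.
A0 = np.diag(np.linspace(1.0, 5.0, n))
A1 = 0.1 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
solve_full = lambda mu: np.linalg.solve(A0 + mu * A1, b)

# Snapshots at training parameters; POD basis = leading left singular vectors.
snapshots = np.column_stack([solve_full(mu) for mu in np.linspace(0.0, 1.0, 20)])
V = np.linalg.svd(snapshots, full_matrices=False)[0][:, :r]

# Galerkin-projected reduced model at a new parameter value, then lift back.
mu_test = 0.37
A = A0 + mu_test * A1
x_reduced = V @ np.linalg.solve(V.T @ A @ V, V.T @ b)

x_full = solve_full(mu_test)
print("relative error:", np.linalg.norm(x_reduced - x_full) / np.linalg.norm(x_full))
```

The same projection pattern, a small system V^T A V in place of the large one, underlies most projection-based reduced-order models.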
{"title":"Learning physics-based models from data: perspectives from inverse problems and model reduction","authors":"O. Ghattas, K. Willcox","doi":"10.1017/S0962492921000064","DOIUrl":"https://doi.org/10.1017/S0962492921000064","url":null,"abstract":"This article addresses the inference of physics models from data, from the perspectives of inverse problems and model reduction. These fields develop formulations that integrate data into physics-based models while exploiting the fact that many mathematical models of natural and engineered systems exhibit an intrinsically low-dimensional solution manifold. In inverse problems, we seek to infer uncertain components of the inputs from observations of the outputs, while in model reduction we seek low-dimensional models that explicitly capture the salient features of the input–output map through approximation in a low-dimensional subspace. In both cases, the result is a predictive model that reflects data-driven learning yet deeply embeds the underlying physics, and thus can be used for design, control and decision-making, often with quantified uncertainties. We highlight recent developments in scalable and efficient algorithms for inverse problems and model reduction governed by large-scale models in the form of partial differential equations. Several illustrative applications to large-scale complex problems across different domains of science and engineering are provided.","PeriodicalId":48863,"journal":{"name":"Acta Numerica","volume":"30 1","pages":"445 - 554"},"PeriodicalIF":14.2,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47260397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
M. Belkin
Pub Date: 2021-05-01 · DOI: 10.1017/S0962492921000039 · Acta Numerica 30, pp. 203–248
In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation and its sibling over-parametrization. Interpolation corresponds to fitting data, even noisy data, exactly. Over-parametrization enables interpolation and provides flexibility to select a suitable interpolating model. As we will see, just as a physical prism separates colours mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern machine learning. This article is written in the belief and hope that clearer understanding of these issues will bring us a step closer towards a general theory of deep learning and machine learning.
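A small sketch of interpolation through over-parametrization (the toy data and random ReLU features below are assumptions made for illustration, not the article's setting): with many more features than samples, the minimum-norm least-squares solution fits noisy labels exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 200                                      # n data points, d >> n features
x = np.sort(rng.uniform(-1, 1, n))
y = np.sin(3 * x) + 0.1 * rng.standard_normal(n)    # noisy targets

# Random ReLU features: phi_j(x) = max(0, w_j * x + b_j).
w = rng.standard_normal(d)
b_feat = rng.uniform(-1, 1, d)
Phi = np.maximum(0.0, np.outer(x, w) + b_feat)      # n x d feature matrix

# For an underdetermined system, lstsq returns the minimum-norm solution.
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print("max training residual:", np.abs(Phi @ theta - y).max())  # ~0: exact interpolation
```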
{"title":"Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation","authors":"M. Belkin","doi":"10.1017/S0962492921000039","DOIUrl":"https://doi.org/10.1017/S0962492921000039","url":null,"abstract":"In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation and its sibling over-parametrization. Interpolation corresponds to fitting data, even noisy data, exactly. Over-parametrization enables interpolation and provides flexibility to select a suitable interpolating model. As we will see, just as a physical prism separates colours mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern machine learning. This article is written in the belief and hope that clearer understanding of these issues will bring us a step closer towards a general theory of deep learning and machine learning.","PeriodicalId":48863,"journal":{"name":"Acta Numerica","volume":"30 1","pages":"203 - 248"},"PeriodicalIF":14.2,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46793783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modelling and computation of liquid crystals
Wen Wang, Lei Zhang, Pingwen Zhang
Pub Date: 2021-04-06 · DOI: 10.1017/S0962492921000088 · Acta Numerica 30, pp. 765–851
Liquid crystals are a type of soft matter that is intermediate between crystalline solids and isotropic fluids. The study of liquid crystals has made tremendous progress over the past four decades, which is of great importance for fundamental scientific research and has widespread applications in industry. In this paper we review the mathematical models and their connections to liquid crystals, and survey the developments of numerical methods for finding rich configurations of liquid crystals.
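As one concrete example of the continuum models such a survey covers (an illustrative choice in standard notation, not a formula quoted from the abstract), the Oseen–Frank elastic energy of a unit-length director field n is

\[
E[\mathbf{n}] = \int_\Omega \tfrac{1}{2}\Bigl( K_1 (\nabla\cdot\mathbf{n})^2 + K_2 (\mathbf{n}\cdot\nabla\times\mathbf{n})^2 + K_3 \,|\mathbf{n}\times(\nabla\times\mathbf{n})|^2 \Bigr)\, \mathrm{d}x,
\]

whose constrained minimizers describe equilibrium configurations; the constants K_1, K_2 and K_3 weight the splay, twist and bend modes.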
{"title":"Modelling and computation of liquid crystals","authors":"Wen Wang, Lei Zhang, Pingwen Zhang","doi":"10.1017/S0962492921000088","DOIUrl":"https://doi.org/10.1017/S0962492921000088","url":null,"abstract":"Liquid crystals are a type of soft matter that is intermediate between crystalline solids and isotropic fluids. The study of liquid crystals has made tremendous progress over the past four decades, which is of great importance for fundamental scientific research and has widespread applications in industry. In this paper we review the mathematical models and their connections to liquid crystals, and survey the developments of numerical methods for finding rich configurations of liquid crystals.","PeriodicalId":48863,"journal":{"name":"Acta Numerica","volume":"30 1","pages":"765 - 851"},"PeriodicalIF":14.2,"publicationDate":"2021-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49450682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning: a statistical viewpoint
P. Bartlett, A. Montanari, A. Rakhlin
Pub Date: 2021-03-16 · DOI: 10.1017/S0962492921000027 · Acta Numerica 30, pp. 87–201
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting, that is, accurate predictions despite overfitting training data. In this article, we survey recent progress in statistical learning theory that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behaviour of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favourable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
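To make the "minimum-norm interpolating solution" concrete (standard notation, not a formula quoted from the article): for linear regression with n samples, d > n features and a full-row-rank design matrix X, gradient descent started from zero converges to

\[
\hat\theta = \arg\min_{\theta}\bigl\{\, \|\theta\|_2 : X\theta = y \,\bigr\} = X^\top (X X^\top)^{-1} y,
\]

and benign overfitting asks when the predictor \(x \mapsto x^\top \hat\theta\) remains accurate even though it fits the noisy labels exactly.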
{"title":"Deep learning: a statistical viewpoint","authors":"P. Bartlett, A. Montanari, A. Rakhlin","doi":"10.1017/S0962492921000027","DOIUrl":"https://doi.org/10.1017/S0962492921000027","url":null,"abstract":"The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting, that is, accurate predictions despite overfitting training data. In this article, we survey recent progress in statistical learning theory that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behaviour of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favourable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.","PeriodicalId":48863,"journal":{"name":"Acta Numerica","volume":"30 1","pages":"87 - 201"},"PeriodicalIF":14.2,"publicationDate":"2021-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1017/S0962492921000027","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41948779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural network approximation
R. DeVore, B. Hanin, G. Petrova
Pub Date: 2020-12-28 · DOI: 10.1017/S0962492921000052 · Acta Numerica 30, pp. 327–444
Neural networks (NNs) are the method of choice for building learning algorithms. They are now being investigated for other numerical tasks such as solving high-dimensional partial differential equations. Their popularity stems from their empirical success on several challenging learning problems (computer chess/Go, autonomous navigation, face recognition). However, most scholars agree that a convincing theoretical explanation for this success is still lacking. Since these applications revolve around approximating an unknown function from data observations, part of the answer must involve the ability of NNs to produce accurate approximations. This article surveys the known approximation properties of the outputs of NNs with the aim of uncovering the properties that are not present in the more traditional methods of approximation used in numerical analysis, such as approximations using polynomials, wavelets, rational functions and splines. Comparisons are made with traditional approximation methods from the viewpoint of rate distortion, i.e. error versus the number of parameters used to create the approximant. Another major component in the analysis of numerical approximation is the computational time needed to construct the approximation, and this in turn is intimately connected with the stability of the approximation algorithm. So the stability of numerical approximation using NNs is a large part of the analysis put forward. The survey, for the most part, is concerned with NNs using the popular ReLU activation function. In this case the outputs of the NNs are piecewise linear functions on rather complicated partitions of the domain of the target function f into cells that are convex polytopes. When the architecture of the NN is fixed and the parameters are allowed to vary, the set of output functions of the NN is a parametrized nonlinear manifold. It is shown that this manifold has certain space-filling properties leading to an increased ability to approximate (better rate distortion) but at the expense of numerical stability. This space-filling property creates a challenge for numerical methods seeking best or good parameter choices when constructing approximations.
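A small numerical check of the piecewise-linearity statement (a toy example constructed for illustration, not code from the article): a two-layer ReLU network of width m on the real line has at most m kinks and is exactly affine between consecutive kinks.

```python
import numpy as np

rng = np.random.default_rng(2)
width = 15                                    # hidden units in a toy two-layer ReLU net
w, b, a = rng.standard_normal(width), rng.standard_normal(width), rng.standard_normal(width)

def f(x):
    # f(x) = sum_j a_j * ReLU(w_j * x + b_j): piecewise linear in x.
    return np.maximum(0.0, np.outer(x, w) + b) @ a

# Unit j switches on/off at x = -b_j / w_j, so these are the only possible kinks.
kinks = np.sort(-b / w)
print("at most", kinks.size + 1, "linear pieces on the real line")

# Sanity check: between two consecutive kinks, f coincides with its secant line.
lo, hi = kinks[3], kinks[4]
xs = np.linspace(lo, hi, 101)[1:-1]
f_lo, f_hi = f(np.array([lo]))[0], f(np.array([hi]))[0]
line = f_lo + (xs - lo) * (f_hi - f_lo) / (hi - lo)
print("max deviation from the secant line:", np.abs(f(xs) - line).max())  # ~1e-15
```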
{"title":"Neural network approximation","authors":"R. DeVore, B. Hanin, G. Petrova","doi":"10.1017/S0962492921000052","DOIUrl":"https://doi.org/10.1017/S0962492921000052","url":null,"abstract":"Neural networks (NNs) are the method of choice for building learning algorithms. They are now being investigated for other numerical tasks such as solving high-dimensional partial differential equations. Their popularity stems from their empirical success on several challenging learning problems (computer chess/Go, autonomous navigation, face recognition). However, most scholars agree that a convincing theoretical explanation for this success is still lacking. Since these applications revolve around approximating an unknown function from data observations, part of the answer must involve the ability of NNs to produce accurate approximations. This article surveys the known approximation properties of the outputs of NNs with the aim of uncovering the properties that are not present in the more traditional methods of approximation used in numerical analysis, such as approximations using polynomials, wavelets, rational functions and splines. Comparisons are made with traditional approximation methods from the viewpoint of rate distortion, i.e. error versus the number of parameters used to create the approximant. Another major component in the analysis of numerical approximation is the computational time needed to construct the approximation, and this in turn is intimately connected with the stability of the approximation algorithm. So the stability of numerical approximation using NNs is a large part of the analysis put forward. The survey, for the most part, is concerned with NNs using the popular ReLU activation function. In this case the outputs of the NNs are piecewise linear functions on rather complicated partitions of the domain of f into cells that are convex polytopes. When the architecture of the NN is fixed and the parameters are allowed to vary, the set of output functions of the NN is a parametrized nonlinear manifold. It is shown that this manifold has certain space-filling properties leading to an increased ability to approximate (better rate distortion) but at the expense of numerical stability. The space filling creates the challenge to the numerical method of finding best or good parameter choices when trying to approximate.","PeriodicalId":48863,"journal":{"name":"Acta Numerica","volume":"30 1","pages":"327 - 444"},"PeriodicalIF":14.2,"publicationDate":"2020-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47549585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Numerical methods for nonlocal and fractional models
Marta D’Elia, Qiang Du, Christian Glusa, Max Gunzburger, Xiaochuan Tian, Zhi Zhou
Pub Date: 2020-11-30 · DOI: 10.1017/s096249292000001x · Acta Numerica
Partial differential equations (PDEs) are used with huge success to model phenomena across all scientific and engineering disciplines. However, across an equally wide swath, there exist situations in which PDEs fail to adequately model observed phenomena, or are not the best available model for that purpose. On the other hand, in many situations, nonlocal models that account for interaction occurring at a distance have been shown to more faithfully and effectively model observed phenomena that involve possible singularities and other anomalies. In this article we consider a generic nonlocal model, beginning with a short review of its definition, the properties of its solution, its mathematical analysis and of specific concrete examples. We then provide extensive discussions about numerical methods, including finite element, finite difference and spectral methods, for determining approximate solutions of the nonlocal models considered. In that discussion, we pay particular attention to a special class of nonlocal models that are the most widely studied in the literature, namely those involving fractional derivatives. The article ends with brief considerations of several modelling and algorithmic extensions, which serve to show the wide applicability of nonlocal modelling.
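For concreteness (standard definitions, an illustrative presentation rather than one quoted from the article): one common form of a nonlocal diffusion operator with interaction horizon δ, and the fractional Laplacian as the most widely studied special case, are

\[
\mathcal{L}_\delta u(x) = \int_{B_\delta(x)} \bigl(u(y) - u(x)\bigr)\, \gamma(x, y)\, \mathrm{d}y,
\qquad
(-\Delta)^s u(x) = C_{d,s}\, \mathrm{p.v.} \int_{\mathbb{R}^d} \frac{u(x) - u(y)}{|x - y|^{d + 2s}}\, \mathrm{d}y,
\]

where γ ≥ 0 is an interaction kernel and 0 < s < 1. Because every point interacts with a whole region, discretizations of such operators produce dense or hierarchically structured matrices rather than the sparse ones arising from local PDEs.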
{"title":"Numerical methods for nonlocal and fractional models","authors":"Marta D’Elia, Qiang Du, Christian Glusa, Max Gunzburger, Xiaochuan Tian, Zhi Zhou","doi":"10.1017/s096249292000001x","DOIUrl":"https://doi.org/10.1017/s096249292000001x","url":null,"abstract":"Partial differential equations (PDEs) are used with huge success to model phenomena across all scientific and engineering disciplines. However, across an equally wide swath, there exist situations in which PDEs fail to adequately model observed phenomena, or are not the best available model for that purpose. On the other hand, in many situations,<jats:italic>nonlocal models</jats:italic>that account for interaction occurring at a distance have been shown to more faithfully and effectively model observed phenomena that involve possible singularities and other anomalies. In this article we consider a generic nonlocal model, beginning with a short review of its definition, the properties of its solution, its mathematical analysis and of specific concrete examples. We then provide extensive discussions about numerical methods, including finite element, finite difference and spectral methods, for determining approximate solutions of the nonlocal models considered. In that discussion, we pay particular attention to a special class of nonlocal models that are the most widely studied in the literature, namely those involving fractional derivatives. The article ends with brief considerations of several modelling and algorithmic extensions, which serve to show the wide applicability of nonlocal modelling.","PeriodicalId":48863,"journal":{"name":"Acta Numerica","volume":"26 1","pages":""},"PeriodicalIF":14.2,"publicationDate":"2020-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138530497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast algorithms using orthogonal polynomials
S. Olver, R. Slevinsky, Alex Townsend
Pub Date: 2020-05-01 · DOI: 10.1017/S0962492920000045 · Acta Numerica 29, pp. 573–699
We review recent advances in algorithms for quadrature, transforms, differential equations and singular integral equations using orthogonal polynomials. Quadrature based on asymptotics has facilitated optimal complexity quadrature rules, allowing for efficient computation of quadrature rules with millions of nodes. Transforms based on rank structures in change-of-basis operators allow for quasi-optimal complexity, including in multivariate settings such as on triangles and for spherical harmonics. Ordinary and partial differential equations can be solved via sparse linear algebra when set up using orthogonal polynomials as a basis, provided that care is taken with the weights of orthogonality. A similar idea, together with low-rank approximation, gives an efficient method for solving singular integral equations. These techniques can be combined to produce high-performance codes for a wide range of problems that appear in applications.
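As a small baseline illustration of quadrature built on orthogonal polynomials (this uses NumPy's classical Gauss–Legendre routine, not the asymptotics-based fast algorithms the survey describes):

```python
import numpy as np

# Gauss–Legendre rule with n nodes: exact for polynomials of degree <= 2n - 1.
n = 50
nodes, weights = np.polynomial.legendre.leggauss(n)

# Integrate a smooth function on [-1, 1]; spectral accuracy with few nodes.
f = lambda x: np.cos(4 * x)
approx = weights @ f(nodes)

exact = np.sin(4.0) / 2.0          # closed form of the integral of cos(4x) over [-1, 1]
print("quadrature error:", abs(approx - exact))
```

The point of the algorithms surveyed is that rules of this kind can be generated in optimal complexity, even with millions of nodes.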
{"title":"Fast algorithms using orthogonal polynomials","authors":"S. Olver, R. Slevinsky, Alex Townsend","doi":"10.1017/S0962492920000045","DOIUrl":"https://doi.org/10.1017/S0962492920000045","url":null,"abstract":"We review recent advances in algorithms for quadrature, transforms, differential equations and singular integral equations using orthogonal polynomials. Quadrature based on asymptotics has facilitated optimal complexity quadrature rules, allowing for efficient computation of quadrature rules with millions of nodes. Transforms based on rank structures in change-of-basis operators allow for quasi-optimal complexity, including in multivariate settings such as on triangles and for spherical harmonics. Ordinary and partial differential equations can be solved via sparse linear algebra when set up using orthogonal polynomials as a basis, provided that care is taken with the weights of orthogonality. A similar idea, together with low-rank approximation, gives an efficient method for solving singular integral equations. These techniques can be combined to produce high-performance codes for a wide range of problems that appear in applications.","PeriodicalId":48863,"journal":{"name":"Acta Numerica","volume":"29 1","pages":"573 - 699"},"PeriodicalIF":14.2,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1017/S0962492920000045","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49221314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ANU volume 29 Cover and Back matter
Pub Date: 2020-05-01 · DOI: 10.1017/s0962492920000082 · Acta Numerica 29, pp. b1–b2
{"title":"ANU volume 29 Cover and Back matter","authors":"M. D'Elia, Q. Du, Christian A. Glusa, M. Gunzburger, X. Tian","doi":"10.1017/s0962492920000082","DOIUrl":"https://doi.org/10.1017/s0962492920000082","url":null,"abstract":"","PeriodicalId":48863,"journal":{"name":"Acta Numerica","volume":"29 1","pages":"b1 - b2"},"PeriodicalIF":14.2,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1017/s0962492920000082","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42777579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}