Optimal learning
Peter Binev, Andrea Bonito, Ronald DeVore, Guergana Petrova
Calcolo, published 2024-02-19
DOI: https://doi.org/10.1007/s10092-023-00564-y
Citations: 0
Abstract
This paper studies the problem of learning an unknown function f from given data about f. The learning problem is to give an approximation \({\hat{f}}\) to f that predicts the values of f away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about f (known as a model class assumption), (ii) how we measure the accuracy with which \({\hat{f}}\) predicts f, (iii) what is known about the data and data sites, and (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near optimal \({\hat{f}}\) can be found by solving a certain finite-dimensional over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization, which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation \({\hat{f}}\) of the function f from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of f. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.
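To make the recipe in the abstract concrete, the following is a minimal, purely illustrative sketch in Python: an over-parameterized random-feature model, with many more parameters than data points, is fit to noiseless samples of an unknown function by minimizing a data-fit loss plus a scaled penalty term. The target function, the ReLU feature construction, the ridge form of the penalty, and the scaling parameter lam are all assumptions made for illustration; the paper's actual model class assumptions, loss function, and quantitative penalty scaling are more specific than what is shown here.

```python
# Illustrative sketch only: over-parameterized fitting with a penalized loss.
# The model class, loss, and penalty scaling below are placeholders and do NOT
# reproduce the constructions analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)

# Data: m noiseless samples of an unknown function f at data sites in [0, 1].
f = lambda x: np.sin(2 * np.pi * x) + 0.3 * np.cos(6 * np.pi * x)
m = 40
x = rng.uniform(0.0, 1.0, size=m)
y = f(x)

# Over-parameterized model: n >> m random ReLU features,
# g(t) = sum_j c_j * relu(w_j * t + b_j).
n = 2000
w = rng.normal(size=n)
b = rng.uniform(-1.0, 1.0, size=n)

def features(t):
    # (len(t), n) ReLU feature matrix for the sample points t.
    return np.maximum(w * t[:, None] + b, 0.0)

# Penalized least-squares surrogate:  min_c ||A c - y||^2 + lam * ||c||^2.
# A ridge penalty stands in for the paper's penalty term; how lam should be
# scaled with the amount of data and of over-parameterization is exactly the
# kind of question the paper answers quantitatively.
A = features(x)
lam = 1e-6
c = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Evaluate the learned approximation \hat{f} away from the data sites.
t = np.linspace(0.0, 1.0, 400)
f_hat = features(t) @ c
print("max prediction error on [0, 1]:", np.max(np.abs(f_hat - f(t))))
```

The design point the sketch tries to reflect is the one emphasized in the abstract: the parameter count n deliberately exceeds the number of observations m, and it is the penalty term, not the parameter count, that selects which of the many data-consistent fits is returned.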
About the journal:
Calcolo is a quarterly of the Italian National Research Council, under the direction of the Institute for Informatics and Telematics in Pisa. Calcolo publishes original contributions in English on Numerical Analysis and its Applications, and on the Theory of Computation.
The main focus of the journal is on Numerical Linear Algebra, Approximation Theory and its Applications, Numerical Solution of Differential and Integral Equations, Computational Complexity, Algorithmics, Mathematical Aspects of Computer Science, Optimization Theory.
Expository papers will also appear from time to time as an introduction to emerging topics in one of the above-mentioned fields. There will be a "Report" section, with abstracts of PhD theses, news and reports from conferences, and book reviews. All submissions will be carefully refereed.