Optimal learning

IF 1.4 · CAS Tier 2, Mathematics · JCR Q1, MATHEMATICS · Calcolo · Pub Date: 2024-02-19 · DOI: 10.1007/s10092-023-00564-y
Peter Binev, Andrea Bonito, Ronald DeVore, Guergana Petrova
{"title":"Optimal learning","authors":"Peter Binev, Andrea Bonito, Ronald DeVore, Guergana Petrova","doi":"10.1007/s10092-023-00564-y","DOIUrl":null,"url":null,"abstract":"<p>This paper studies the problem of learning an unknown function <i>f</i> from given data about <i>f</i>. The learning problem is to give an approximation <span>\\({\\hat{f}}\\)</span> to <i>f</i> that predicts the values of <i>f</i> away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about <i>f</i> (known as a model class assumption), (ii) how we measure the accuracy of how well <span>\\({\\hat{f}}\\)</span> predicts <i>f</i>, (iii) what is known about the data and data sites, (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near optimal <span>\\({\\hat{f}}\\)</span> can be found by solving a certain finite-dimensional over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation <span>\\({\\hat{f}}\\)</span> of the function <i>f</i> from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of <i>f</i>. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.</p>","PeriodicalId":9522,"journal":{"name":"Calcolo","volume":"24 1","pages":""},"PeriodicalIF":1.4000,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Calcolo","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1007/s10092-023-00564-y","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS","Score":null,"Total":0}
Citations: 0

Abstract

This paper studies the problem of learning an unknown function f from given data about f. The learning problem is to give an approximation \({\hat{f}}\) to f that predicts the values of f away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about f (known as a model class assumption), (ii) how we measure the accuracy with which \({\hat{f}}\) predicts f, (iii) what is known about the data and the data sites, and (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near optimal \({\hat{f}}\) can be found by solving a certain finite-dimensional over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization, which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation \({\hat{f}}\) of the function f from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of f. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.
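To make the setting concrete, the following is a minimal illustrative sketch, not the authors' construction: an unknown f is observed at m data sites, a model with n ≫ m parameters is fit by minimizing a data-fit loss plus a scaled penalty term, and the recovered \({\hat{f}}\) is then evaluated away from the data. The random ReLU feature model, the parameter count n, and the penalty scale lam are hypothetical choices standing in for the paper's quantitative prescriptions.

```python
# Minimal illustrative sketch (not the authors' construction): recover an
# unknown function f from m point samples by fitting an over-parameterized
# model (n >> m random ReLU features, a hypothetical stand-in) with a
# penalized least-squares loss.
import numpy as np

rng = np.random.default_rng(0)

# Data: m noiseless observations y_i = f(x_i) at data sites x_i in [0, 1].
f = lambda x: np.sin(2.0 * np.pi * x)
m = 20
x = np.sort(rng.uniform(0.0, 1.0, m))
y = f(x)

# Over-parameterized model: n >> m random ReLU features (hypothetical choice).
n = 500
w = rng.normal(size=n)
b = rng.uniform(-1.0, 1.0, n)
Phi = np.maximum(0.0, np.outer(x, w) + b)  # m-by-n feature matrix

# Penalized least squares: minimize ||Phi c - y||^2 + lam * ||c||^2.
# lam is an illustrative penalty scale; the paper gives quantitative rules
# for how the penalization must be scaled, which this sketch does not reproduce.
lam = 1e-6
c = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n), Phi.T @ y)

# Evaluate the recovered approximation f_hat away from the data sites.
x_test = np.linspace(0.0, 1.0, 200)
f_hat = np.maximum(0.0, np.outer(x_test, w) + b) @ c
print("max |f - f_hat| on a test grid:", np.max(np.abs(f_hat - f(x_test))))
```

In the paper's terminology, near optimality would mean that the error of \({\hat{f}}\) is at most a fixed constant times the smallest error achievable under the model class assumption; the sketch above illustrates the optimization template but does not certify such a bound.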

Source journal: Calcolo (Mathematics)
CiteScore: 2.40
Self-citation rate: 11.80%
Articles per year: 36
Review time: >12 weeks
Journal description: Calcolo is a quarterly of the Italian National Research Council, under the direction of the Institute for Informatics and Telematics in Pisa. Calcolo publishes original contributions in English on Numerical Analysis and its Applications, and on the Theory of Computation. The main focus of the journal is on Numerical Linear Algebra, Approximation Theory and its Applications, Numerical Solution of Differential and Integral Equations, Computational Complexity, Algorithmics, Mathematical Aspects of Computer Science, and Optimization Theory. Expository papers will also appear from time to time as an introduction to emerging topics in one of the above-mentioned fields. There is a "Report" section, with abstracts of PhD theses, news and reports from conferences, and book reviews. All submissions are carefully refereed.
Latest articles from this journal:
Pressure-improved Scott-Vogelius type elements
Adaptive finite element approximation of bilinear optimal control problem with fractional Laplacian
An explicit two-grid spectral deferred correction method for nonlinear fractional pantograph differential equations
Fast algebraic multigrid for block-structured dense systems arising from nonlocal diffusion problems
A modification of the periodic nonuniform sampling involving derivatives with a Gaussian multiplier