A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

Impact Factor: 7.4 · CAS Tier 1 (Mathematics) · JCR Q1 (Mathematics, Interdisciplinary Applications)
Annual Review of Statistics and Its Application · Publication date: 2024-11-21 · DOI: 10.1146/annurev-statistics-040522-013920
Namjoon Suh, Guang Cheng
{"title":"A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models","authors":"Namjoon Suh, Guang Cheng","doi":"10.1146/annurev-statistics-040522-013920","DOIUrl":null,"url":null,"abstract":"In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly nonconvex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review articles that attempt to answer the question of how a neural network trained via gradient-based methods finds a solution that can generalize well on unseen data. In particular, two well-known paradigms are reviewed: the neural tangent kernel and mean-field paradigms. Last, we review the most recent theoretical advancements in generative models, including generative adversarial networks, diffusion models, and in-context learning in large language models from two of the same perspectives, approximation and training dynamics.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"111 1","pages":""},"PeriodicalIF":7.4000,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annual Review of Statistics and Its Application","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1146/annurev-statistics-040522-013920","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis applies only to the global minimizer in the highly nonconvex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review articles that attempt to answer the question of how a neural network trained via gradient-based methods finds a solution that can generalize well on unseen data. In particular, two well-known paradigms are reviewed: the neural tangent kernel and mean-field paradigms. Finally, we review the most recent theoretical advancements in generative models, including generative adversarial networks, diffusion models, and in-context learning in large language models, from the first two of these perspectives: approximation and training dynamics.
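For readers less familiar with the terminology, the excess risk studied in the first part is usually formalized as below. This is a standard textbook convention, not notation drawn from the survey itself.

```latex
% Nonparametric regression: i.i.d. pairs (X_i, Y_i) with
% Y_i = f_0(X_i) + \varepsilon_i, \mathbb{E}[\varepsilon_i \mid X_i] = 0,
% for an unknown regression function f_0.
% The excess risk of an estimator \hat{f} is its squared-error risk
% minus the risk of the optimal predictor f_0:
\[
  \mathcal{R}(f) = \mathbb{E}\bigl[(Y - f(X))^2\bigr],
  \qquad
  \mathcal{E}(\hat{f})
    = \mathcal{R}(\hat{f}) - \mathcal{R}(f_0)
    = \mathbb{E}\bigl[(\hat{f}(X) - f_0(X))^2\bigr].
\]
% A "fast convergence rate" result bounds \mathcal{E}(\hat{f}) by, e.g.,
% n^{-2\beta/(2\beta + d)} up to logarithmic factors when f_0 is
% \beta-smooth on a d-dimensional domain.
```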
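The neural tangent kernel (NTK) mentioned in the second part also admits a concrete illustration. The sketch below is ours, not from the survey: it uses plain NumPy with finite-difference gradients (function and parameter names are illustrative) to compute the empirical NTK Θ(x, x′) = ⟨∇θ f(x; θ), ∇θ f(x′; θ)⟩ of a small two-layer network. The NTK literature shows that, as the width grows, this kernel stays nearly constant along the gradient-descent trajectory, so training behaves like kernel regression with Θ.

```python
import numpy as np

D_IN, WIDTH = 2, 256  # input dimension and hidden width (illustrative)

def init_params(rng):
    # 1/sqrt(fan-in) scaling at initialization (equivalent at init
    # to the NTK parameterization).
    W1 = rng.normal(size=(WIDTH, D_IN)) / np.sqrt(D_IN)
    w2 = rng.normal(size=WIDTH) / np.sqrt(WIDTH)
    return np.concatenate([W1.ravel(), w2])

def mlp(theta, x):
    # Scalar-output two-layer network f(x; theta) = w2^T tanh(W1 x).
    W1 = theta[: WIDTH * D_IN].reshape(WIDTH, D_IN)
    w2 = theta[WIDTH * D_IN :]
    return w2 @ np.tanh(W1 @ x)

def grad_theta(theta, x, eps=1e-5):
    # Central finite differences of f(x; .) in every parameter.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += eps
        tm[i] -= eps
        g[i] = (mlp(tp, x) - mlp(tm, x)) / (2.0 * eps)
    return g

def empirical_ntk(theta, xs):
    # K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>.
    grads = np.stack([grad_theta(theta, x) for x in xs])
    return grads @ grads.T

rng = np.random.default_rng(0)
theta = init_params(rng)
xs = rng.normal(size=(4, D_IN))
print(np.round(empirical_ntk(theta, xs), 3))
```

Recomputing the kernel after a few gradient-descent steps, and repeating with a larger WIDTH, is a quick way to observe the near-constancy ("lazy training") that the NTK analyses formalize.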
Source Journal

Annual Review of Statistics and Its Application
Categories: Mathematics, Interdisciplinary Applications; Statistics & Probability
CiteScore: 13.40
Self-citation rate: 1.30%
Articles per year: 29

Journal introduction: The Annual Review of Statistics and Its Application publishes comprehensive review articles focusing on methodological advancements in statistics and the utilization of computational tools facilitating these advancements. It is abstracted and indexed in Scopus, Science Citation Index Expanded, and Inspec.
Latest Articles from This Journal

A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
Models and Rating Systems for Head-to-Head Competition
A Review of Reinforcement Learning in Financial Applications
Joint Modeling of Longitudinal and Survival Data
Infectious Disease Modeling