Principled deep neural network training through linear programming

IF 0.9 | Zone 4, Mathematics | JCR Q3, MATHEMATICS, APPLIED | Discrete Optimization | Pub Date: 2023-08-01 | DOI: 10.1016/j.disopt.2023.100795
Daniel Bienstock, Gonzalo Muñoz, Sebastian Pokutta
Publisher page: https://www.sciencedirect.com/science/article/pii/S1572528623000373
Citations: 17

Abstract

Deep learning has received much attention lately due to the impressive empirical performance achieved by training algorithms. Consequently, a need for a better theoretical understanding of these problems has become more evident and multiple works in recent years have focused on this task. In this work, using a unified framework, we show that there exists a polyhedron that simultaneously encodes, in its facial structure, all possible deep neural network training problems that can arise from a given architecture, activation functions, loss function, and sample size. Notably, the size of the polyhedral representation depends only linearly on the sample size, and a better dependency on several other network parameters is unlikely. Using this general result, we compute the size of the polyhedral encoding for commonly used neural network architectures. Our results provide a new perspective on training problems through the lens of polyhedral theory and reveal strong structure arising from these problems.

Source journal
Discrete Optimization (Management Science, Applied Mathematics)
CiteScore: 2.10
Self-citation rate: 9.10%
Articles published: 30
Review time: >12 weeks
About the journal: Discrete Optimization publishes research papers on the mathematical, computational and applied aspects of all areas of integer programming and combinatorial optimization. In addition to reports on mathematical results pertinent to discrete optimization, the journal welcomes submissions on algorithmic developments, computational experiments, and novel applications (in particular, large-scale and real-time applications). The journal also publishes clearly labelled surveys, reviews, short notes, and open problems. Manuscripts submitted for possible publication to Discrete Optimization should report on original research, should not have been previously published, and should not be under consideration for publication by any other journal.
Latest articles from the journal
Approximation schemes for Min-Sum k-Clustering
Easy and hard separation of sparse and dense odd-set constraints in matching
Mostar index and bounded maximum degree
Two-set inequalities for the binary knapsack polyhedra
Revisiting some classical linearizations of the quadratic binary optimization problem and linkages with constraint aggregations