Principled deep neural network training through linear programming

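IF 1.6 | CAS Zone 4 (Mathematics) | JCR Q3, MATHEMATICS, APPLIED
Discrete Optimization, Vol. 49, Article 100795 | Pub Date: 2023-08-01 | DOI: 10.1016/j.disopt.2023.100795
Article page: https://www.sciencedirect.com/science/article/pii/S1572528623000373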

Daniel Bienstock, Gonzalo Muñoz, Sebastian Pokutta


Citations: 17

Abstract


Deep learning has received much attention lately due to the impressive empirical performance achieved by training algorithms. Consequently, a need for a better theoretical understanding of these problems has become more evident and multiple works in recent years have focused on this task. In this work, using a unified framework, we show that there exists a polyhedron that simultaneously encodes, in its facial structure, all possible deep neural network training problems that can arise from a given architecture, activation functions, loss function, and sample size. Notably, the size of the polyhedral representation depends only linearly on the sample size, and a better dependency on several other network parameters is unlikely. Using this general result, we compute the size of the polyhedral encoding for commonly used neural network architectures. Our results provide a new perspective on training problems through the lens of polyhedral theory and reveal strong structure arising from these problems.


Source journal

Discrete Optimization (Management Science: Applied Mathematics)

CiteScore: 2.10

Self-citation rate: 9.10%

Review time: >12 weeks

About the journal: Discrete Optimization publishes research papers on the mathematical, computational and applied aspects of all areas of integer programming and combinatorial optimization. In addition to reports on mathematical results pertinent to discrete optimization, the journal welcomes submissions on algorithmic developments, computational experiments, and novel applications (in particular, large-scale and real-time applications). The journal also publishes clearly labelled surveys, reviews, short notes, and open problems. Manuscripts submitted for possible publication to Discrete Optimization should report on original research, should not have been previously published, and should not be under consideration for publication by any other journal.

Latest articles in this journal

Inverse of the Gomory corner relaxation of integer programs
The length polyhedron of an interval order
Lower bounds on the performance of online algorithms for relaxed packing problems
Saturation numbers of balanced double stars
An optimization approach to degree deviation and spectral radius