Principled deep neural network training through linear programming
Daniel Bienstock, Gonzalo Muñoz, Sebastian Pokutta
Journal: Discrete Optimization, Volume 49, Article 100795, August 2023
DOI: 10.1016/j.disopt.2023.100795
URL: https://www.sciencedirect.com/science/article/pii/S1572528623000373
Citations: 17
Abstract
Deep learning has received much attention lately due to the impressive empirical performance achieved by training algorithms. Consequently, a need for a better theoretical understanding of these problems has become more evident, and multiple works in recent years have focused on this task. In this work, using a unified framework, we show that there exists a polyhedron that simultaneously encodes, in its facial structure, all possible deep neural network training problems that can arise from a given architecture, activation functions, loss function, and sample size. Notably, the size of the polyhedral representation depends only linearly on the sample size, and a better dependency on several other network parameters is unlikely. Using this general result, we compute the size of the polyhedral encoding for commonly used neural network architectures. Our results provide a new perspective on training problems through the lens of polyhedral theory and reveal strong structure arising from these problems.
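For context, the training problems referred to above take the standard empirical risk minimization form. The following is a minimal sketch of that formulation; the notation (\(f\), \(\theta\), \(\ell\), \(D\), \((\hat{x}^{i}, \hat{y}^{i})\)) is chosen here for illustration and is not taken from the paper:

\[
  \min_{\theta \in \mathbb{R}^{p}} \;\; \frac{1}{D} \sum_{i=1}^{D} \ell\bigl( f(\hat{x}^{i}; \theta),\, \hat{y}^{i} \bigr)
\]

Here \(f\) is the network defined by the fixed architecture and activation functions, \(\theta\) collects its trainable parameters, \(\ell\) is the loss function, and \((\hat{x}^{i}, \hat{y}^{i})_{i=1}^{D}\) is a training set of size \(D\). The abstract's claim is that, once the architecture, activations, loss, and sample size are fixed, every such problem corresponds to a face of a single polyhedron whose size grows only linearly in \(D\).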
About the journal
Discrete Optimization publishes research papers on the mathematical, computational and applied aspects of all areas of integer programming and combinatorial optimization. In addition to reports on mathematical results pertinent to discrete optimization, the journal welcomes submissions on algorithmic developments, computational experiments, and novel applications (in particular, large-scale and real-time applications). The journal also publishes clearly labelled surveys, reviews, short notes, and open problems. Manuscripts submitted for possible publication to Discrete Optimization should report on original research, should not have been previously published, and should not be under consideration for publication by any other journal.