Sebastian Neumayer, Alexis Goujon, Pakshal Bohra, Michael Unser
{"title":"Approximation of Lipschitz Functions Using Deep Spline Neural Networks","authors":"Sebastian Neumayer, Alexis Goujon, Pakshal Bohra, Michael Unser","doi":"10.1137/22m1504573","DOIUrl":null,"url":null,"abstract":"Although Lipschitz-constrained neural networks have many applications in machine learning, the design and training of expressive Lipschitz-constrained networks is very challenging. Since the popular rectified linear-unit networks have provable disadvantages in this setting, we propose using learnable spline activation functions with at least three linear regions instead. We prove that our choice is universal among all componentwise 1-Lipschitz activation functions in the sense that no other weight-constrained architecture can approximate a larger class of functions. Additionally, our choice is at least as expressive as the recently introduced non-componentwise Groupsort activation function for spectral-norm-constrained weights. The theoretical findings of this paper are consistent with previously published numerical results.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"7 1","pages":"0"},"PeriodicalIF":1.9000,"publicationDate":"2023-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SIAM journal on mathematics of data science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1137/22m1504573","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}
引用次数: 3
Abstract
Although Lipschitz-constrained neural networks have many applications in machine learning, the design and training of expressive Lipschitz-constrained networks is very challenging. Since the popular rectified linear-unit networks have provable disadvantages in this setting, we propose using learnable spline activation functions with at least three linear regions instead. We prove that our choice is universal among all componentwise 1-Lipschitz activation functions in the sense that no other weight-constrained architecture can approximate a larger class of functions. Additionally, our choice is at least as expressive as the recently introduced non-componentwise Groupsort activation function for spectral-norm-constrained weights. The theoretical findings of this paper are consistent with previously published numerical results.