Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks
E. M. Achour, François Malgouyres, Franck Mamalet
J. Mach. Learn. Res., 347:1-347:56. DOI: 10.48550/arXiv.2108.05623
Citations: 6
Abstract
Imposing orthogonality on the layers of neural networks is known to facilitate learning by limiting the exploding/vanishing of gradients, decorrelating features, and improving robustness. This paper studies the theoretical properties of orthogonal convolutional layers. We establish necessary and sufficient conditions on the layer architecture that guarantee the existence of an orthogonal convolutional transform. The conditions show that orthogonal convolutional transforms exist for almost all architectures used in practice with 'circular' padding. We also exhibit limitations with 'valid' boundary conditions and with 'same' boundary conditions using zero-padding. Recently, a regularization term imposing the orthogonality of convolutional layers has been proposed, and impressive empirical results have been obtained in different applications (Wang et al. 2020). The second motivation of the present paper is to specify the theory behind it. We make the link between this regularization term and orthogonality measures. In doing so, we show that this regularization strategy is stable with respect to numerical and optimization errors and that, in the presence of small errors and when the size of the signal/image is large, the convolutional layers remain close to isometric. The theoretical results are confirmed by experiments, and the landscape of the regularization term is studied. Experiments on real data sets show that when orthogonality is used to enforce robustness, the parameter multiplying the regularization term can be used to tune a tradeoff between accuracy and orthogonality, to the benefit of both accuracy and robustness. Altogether, the study guarantees that the regularization proposed in Wang et al. (2020) is an efficient, flexible and stable numerical strategy to learn orthogonal convolutional layers.
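To make the regularization term concrete, the sketch below implements a kernel-orthogonality penalty in the spirit of Wang et al. (2020): the kernel is correlated with itself at all relative shifts and compared to an "identity" target. The function name `orth_penalty` and the PyTorch details are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F


def orth_penalty(kernel: torch.Tensor) -> torch.Tensor:
    """Orthogonality penalty for a conv2d kernel of shape (c_out, c_in, k, k).

    Computes ||K *corr K - I0||_F^2, where K *corr K is the correlation of the
    kernel with itself (all relative shifts) and I0 is zero everywhere except
    an identity matrix over output channels at the zero shift. Sketch in the
    spirit of the regularizer of Wang et al. (2020).
    """
    c_out, c_in, k, _ = kernel.shape
    # All relative shifts of the kernel against itself:
    # output has shape (c_out, c_out, 2k - 1, 2k - 1).
    self_corr = F.conv2d(kernel, kernel, padding=k - 1)
    # Target: identity over output channels at the central (zero) shift.
    target = torch.zeros_like(self_corr)
    target[:, :, k - 1, k - 1] = torch.eye(c_out)
    return ((self_corr - target) ** 2).sum()


# Hypothetical usage: lam multiplies the penalty and tunes the
# accuracy/orthogonality tradeoff mentioned in the abstract.
# loss = task_loss + lam * sum(orth_penalty(m.weight)
#                              for m in model.modules()
#                              if isinstance(m, torch.nn.Conv2d))
```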
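The near-isometry property can also be checked empirically: with 'circular' padding (the boundary condition for which the paper shows orthogonal transforms exist for almost all practical architectures), an approximately orthogonal kernel should roughly preserve the norm of its inputs. The helper below, `isometry_ratios`, is an illustrative check under that assumption, not a measure defined in the paper.

```python
import torch
import torch.nn.functional as F


def isometry_ratios(kernel: torch.Tensor, size: int = 64, n_samples: int = 20):
    """Return ||conv(x)|| / ||x|| for random inputs x, using circular padding.

    Assumes an odd kernel size; ratios close to 1.0 indicate that the layer is
    close to isometric.
    """
    c_out, c_in, k, _ = kernel.shape
    ratios = []
    for _ in range(n_samples):
        x = torch.randn(1, c_in, size, size)
        x_pad = F.pad(x, (k // 2,) * 4, mode="circular")  # periodic boundary
        y = F.conv2d(x_pad, kernel)  # output spatial size equals input size
        ratios.append((y.norm() / x.norm()).item())
    return ratios
```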