{"title":"Predict globally, correct locally: Parallel-in-time optimization of neural networks","authors":"Panos Parpas, Corey Muir","doi":"10.1016/j.automatica.2024.111976","DOIUrl":null,"url":null,"abstract":"<div><div>The training of neural networks can be formulated as an optimal control problem of a dynamical system. The initial conditions of the dynamical system are given by the data. The objective of the control problem is to transform the initial conditions in a form that can be easily classified or regressed using linear methods. This link between optimal control of dynamical systems and neural networks has proved beneficial both from a theoretical and from a practical point of view. Several researchers have exploited this link to investigate the stability of different neural network architectures and develop memory efficient training algorithms. In this paper, we also adopt the dynamical systems view of neural networks, but our aim is different from earlier works. Instead, we develop a novel distributed optimization algorithm. The proposed algorithm addresses the most significant obstacle for distributed algorithms for neural network optimization: the network weights cannot be updated until the forward propagation of the data, and backward propagation of the gradients are complete. Using the dynamical systems point of view, we interpret the layers of a (residual) neural network as the discretized dynamics of a dynamical system and exploit the relationship between the co-states (adjoints) of the optimal control problem and backpropagation. We then develop a parallel-in-time method that updates the parameters of the network without waiting for the forward or back propagation algorithms to complete in full. We establish the convergence of the proposed algorithm. Preliminary numerical results suggest that the algorithm is competitive and more efficient than the state-of-the-art.</div></div>","PeriodicalId":55413,"journal":{"name":"Automatica","volume":null,"pages":null},"PeriodicalIF":4.8000,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Automatica","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0005109824004709","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
The training of neural networks can be formulated as an optimal control problem of a dynamical system. The initial conditions of the dynamical system are given by the data. The objective of the control problem is to transform the initial conditions into a form that can be easily classified or regressed using linear methods. This link between the optimal control of dynamical systems and neural networks has proved beneficial from both a theoretical and a practical point of view. Several researchers have exploited this link to investigate the stability of different neural network architectures and to develop memory-efficient training algorithms. In this paper, we also adopt the dynamical-systems view of neural networks, but our aim differs from that of earlier works: we develop a novel distributed optimization algorithm. The proposed algorithm addresses the most significant obstacle for distributed algorithms for neural network optimization: the network weights cannot be updated until the forward propagation of the data and the backward propagation of the gradients are complete. Using the dynamical-systems point of view, we interpret the layers of a (residual) neural network as the discretized dynamics of a dynamical system and exploit the relationship between the co-states (adjoints) of the optimal control problem and backpropagation. We then develop a parallel-in-time method that updates the parameters of the network without waiting for the forward or backward propagation algorithms to complete in full. We establish the convergence of the proposed algorithm. Preliminary numerical results suggest that the algorithm is competitive with, and more efficient than, the state of the art.
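To make the dynamical-systems correspondence in the abstract concrete, the sketch below illustrates the standard identification of a residual block x_{k+1} = x_k + h f(x_k, W_k) with a forward-Euler step of an ODE, and of the co-state (adjoint) recursion with backpropagation. This is a minimal illustrative example, not the paper's parallel-in-time algorithm; the vector field f, the step size h, the weights W_k, and the terminal cost are assumptions chosen for the sketch.

```python
# Minimal sketch (not the authors' algorithm): residual layers as forward-Euler
# steps of an ODE, and the adjoint (co-state) recursion as backpropagation.
import numpy as np

def f(x, W):
    """Illustrative residual-block vector field: tanh of a linear map."""
    return np.tanh(W @ x)

def df_dx(x, W):
    """Jacobian of f with respect to the state x."""
    return np.diag(1.0 - np.tanh(W @ x) ** 2) @ W

h, num_layers, dim = 0.1, 5, 3
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((dim, dim)) for _ in range(num_layers)]

# Forward propagation = forward-Euler integration of the state equation,
# with the initial condition given by a data point.
x = rng.standard_normal(dim)
states = [x]
for W in Ws:
    x = x + h * f(x, W)
    states.append(x)

# Terminal cost 0.5 * ||x_N||^2, so the terminal co-state equals x_N.
p = states[-1].copy()

# Backward propagation = co-state recursion p_k = p_{k+1} + h * (df/dx)^T p_{k+1},
# which is exactly backpropagation through the residual layers.
for k in reversed(range(num_layers)):
    p = p + h * df_dx(states[k], Ws[k]).T @ p

print("Gradient of the loss w.r.t. the input (co-state at layer 0):", p)
```

The paper's contribution, as described above, is to update the weights without waiting for these forward and backward sweeps to finish in full; the sketch only shows the sequential baseline correspondence that the parallel-in-time method builds on.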
About the journal:
Automatica is a leading archival publication in the field of systems and control. Today the field encompasses a broad set of areas and topics and is thriving not only within itself but also through its impact on other fields, such as communications, computing, biology, energy, and economics. Since its inception in 1963, Automatica has kept abreast of the evolution of the field and has emerged as a leading publication driving its trends.
After its founding in 1963, Automatica became a journal of the International Federation of Automatic Control (IFAC) in 1969. It features a characteristic blend of theoretical and applied papers of archival, lasting value, reporting cutting-edge research results by authors across the globe. It publishes articles in distinct categories, including regular, brief, and survey papers, technical communiqués, correspondence items, and reviews of published books of interest to the readership. It occasionally publishes special issues on emerging new topics or on established, mature topics of interest to a broad audience.
Automatica solicits original, high-quality contributions in all the categories listed above and in all areas of systems and control, interpreted in a broad and constantly evolving sense. Papers may be submitted directly to a subject editor, or to the Editor-in-Chief if the author is unsure of the subject area. Editorial procedures in place ensure careful, fair, and prompt handling of all submitted articles. Accepted papers appear in the journal in the shortest time feasible given production constraints.